Database
Ch-ch-changes!
"Time is what keeps everything from happening at once."
Ray Cummings
- At TigerBeetle, we apply defense-in-depth to our testing infrastructure. Our Deterministic Simulation Testing (DST) and fuzzers test a simulated TigerBeetle cluster where time can be sped up, while our non-deterministic generative test suite (Vörtex) tests a real TigerBeetle cluster using various client language drivers. Both assert cluster liveness and safety in the presence of process, network, and storage faults. The Continuous Fuzzing Orchestrator (CFO) runs DST 24x7 on 1024 cores, testing the latest version of the database. For anything that slips past, we have ~10K assertions enabled in production, as we would rather crash than corrupt!
- Last month, we reduced the performance impact of assertions in release builds by guarding a few expensive assertions in our performance-critical data plane behind a flag. These assertions remain enabled in our DST and fuzzers, so they lose none of their efficacy there.
- We prepared Vörtex to be run via the CFO. One discovery along the way was that both the CFO and Vörtex rely on unshare, which doesn’t support nesting. Consequently, we removed unshare usage from the CFO, instead relying on explicitly trapping the EXIT/INT/TERM signals.
- Additionally, we reduced the time Vörtex takes to detect a crashed replica from ~20s to ~2s, enabling faster detection of safety violations.
- TigerBeetle’s MessageBus component handles connection management, and interfaces with the physical IO implementation (kqueue/io_uring) for sending and receiving messages between replicas and clients. We expanded the coverage of the MessageBus fuzzer to include suspension and resumption of messages, which is used by replicas to temporarily suspend received messages (as opposed to dropping them), if they don’t have enough IO capacity to write them to disk.
- TigerBeetle replicas perform online verification to detect and recover from latent sector errors (LSEs) – a phenomenon wherein the disk as a whole continues to function, but a small section of data is unavailable. Last month, we introduced offline verification via the tigerbeetle inspect integrity command, which allows the user to verify that a data file has no corruption.
- We added a new validation for all client libraries to ensure their latest published release matches the latest published TigerBeetle release on GitHub. This validation helps detect supply chain attacks wherein an attacker attempts to push a new, malware-infected version of the TigerBeetle client to a package manager (for example npm).
- Each TigerBeetle replica maintains an on-disk ring buffer called the write-ahead log (WAL), to which it writes user requests (called prepares). The WAL is kept consistent across replicas even in the face of process and network faults, using Viewstamped Replication, and storage faults, using Protocol-Aware Recovery. The WAL is finite, so committed prepares are flushed to a write-optimized data structure called the Log-Structured Merge Tree, which is composed of an in-memory mutable table and on-disk sorted tables.
- TigerBeetle’s LSM forest consists of ~30 LSM trees. Last month, we reduced memory utilization across the LSM forest by 300MiB by introducing a single buffer for our radix sort, shared across all in-memory mutable tables. Previously, each LSM tree’s mutable table held its own buffer.
- In the presence of faults, the WAL repair subprotocol detects missing or corrupted prepares and fetches them over the network from other TigerBeetle replicas. We made WAL repair more robust to changing network topology and latencies by implementing adaptive repair (addressing two fallacies of distributed computing). Specifically, replicas now maintain an exponentially weighted moving average of the repair latency they observe from each remote replica, and prioritize remote replicas that have previously responded with low repair latency.
- We effectively doubled the WAL size — without increasing storage utilization. Specifically, we garbage-collect the WAL more proactively, allowing replicas to accept prepares from the next log wrap if they replace already-committed ones.
- We also implemented some improvements and fixes to our client libraries and state machine business logic.
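To make the adaptive repair item above more concrete, here is a minimal Python sketch of the general idea: keep an exponentially weighted moving average (EWMA) of repair latency per remote replica, and prefer the replicas with the lowest average. This is an illustration only, not TigerBeetle's implementation (which is written in Zig). The class name, the smoothing factor `alpha`, and the choice to try unmeasured replicas first are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RepairLatencyTracker:
    """Hypothetical helper: per-replica EWMA of observed repair latency."""

    alpha: float = 0.2  # assumed smoothing factor; higher reacts faster to change
    ewma_ms: dict = field(default_factory=dict)  # replica index -> EWMA in ms

    def observe(self, replica: int, latency_ms: float) -> None:
        """Fold one observed repair latency into the replica's moving average."""
        previous = self.ewma_ms.get(replica)
        if previous is None:
            # The first sample seeds the average.
            self.ewma_ms[replica] = latency_ms
        else:
            self.ewma_ms[replica] = (
                self.alpha * latency_ms + (1.0 - self.alpha) * previous
            )

    def ranked(self, replicas: list) -> list:
        """Order replicas by preference for the next repair request.

        Replicas without any sample sort first (an assumption of this sketch,
        so that every replica gets measured at least once), followed by the
        measured replicas from lowest to highest average latency.
        """
        return sorted(
            replicas,
            key=lambda r: (r in self.ewma_ms, self.ewma_ms.get(r, 0.0)),
        )
```

For example, after observing 100 ms and then 50 ms from replica 1 with `alpha=0.5`, its average is 75 ms. A replica that has answered in 10 ms ranks ahead of it, and a replica with no samples is tried first.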
Community Contributions
Mohamed fixed the command to build TigerBeetle clients in HACKING.md, and also fixed some typos on our documentation’s concepts page. Thanks for your keen eye, Mohamed!
"Database television" (DTV)
Last month on IronBeetle, we live-coded an improvement to the coverage of the fuzzer for MessageBus, the component that encapsulates the message-passing logic between replicas and clients. Specifically, the fuzzer now tests suspension and resumption of messages. Additionally, we dove into TigerBeetle’s queries! First, we discussed the query engine interface: how query data flows through the TigerBeetle state machine. Then, along with a special guest, we discussed query storage access: how query data is read from the in-memory and on-disk components of the LSM forest, for both point and range queries.
Join us live every week on Twitch or catch up on the TigerTube!
Looking back
Fixing five "two-year" bugs per day, Bug Bash Podcast (Oct 1)
Some bugs are so rare that they can take years to track down and fix. What if you could find and fix five of them per day? Joran was on the Bug Bash Podcast to share how TigerBeetle makes finding and fixing "impossible" bugs a normal, day-to-day activity.
Table Mountain Database Management Seminar (Oct 9)
An overnight hike — DBMS philosopher walk and talks, card games, South African braai under starry skies — with TigerBeetle, Prof. Philippe Bonnet, Prof. Peter Boncz, and Dominik Tornow. We had a wonderful time in Cape Town for the inaugural Table Mountain Database Management Seminar. Days we’ll remember!
Tracking Time Without Clock (Oct 21)
The problem with time is that it looks deceptively simple (just a syscall away), but is actually tricky to handle correctly in a reliable system! Matklad’s latest for our blog, on time.
P99 CONF: The Tale of Taming TigerBeetle's Tail Latency (Oct 22)
TigerBeetle’s Tobi gave one of our most alliterative talks (to date) at p99 CONF. Also this month, Tobi took to the skies, learning to kitesurf alongside Prof. Peter Boncz, Georg, Peter, and Joran.
A Million Transactions per Second: Building TigerBeetle (Oct 24)
Sharing the same major in accounting plus a love for databases, Aaron Francis had a delightful moment explaining debit/credit in this conversation with Joran on building TigerBeetle, why the world needs OLTP, and un(row)locking a million transactions per second. Thanks, Aaron!
Synadia and TigerBeetle Pledge $512,000 to the Zig Software Foundation (Oct 25)
Together with Synadia, we made a pledge to the Zig Software Foundation, over the next two years, in support of the language, leadership, and communities building the future of simpler systems software.
Looking ahead
Around the… in… (December)
Keep an eye out for an announcement on Monday...
Thank you!
Till next time… “Você pega o trem azul, o Sol na cabeça”!
The TigerBeetle Team