April in TigerLand

Dear friends,

We hope your April was ace. Last month, we moved from ring replication to star replication, reduced network bandwidth utilization, and wrote about dependency-toolchain compatibility. We started live-streaming the Pragmatics Of Consensus, hosted TigerHoops in the Bay, and announced Systems Distributed in Boston, July 27-28!

Let’s go!

Database


Ch-ch-changes!


"The best code is no code at all."
Jeff Atwood

Per TigerStyle, we design systems in terms of the four primary colors (network, storage, memory, compute) and their two textures (bandwidth, latency). For example, we chose ring replication to replicate user requests across multiple TigerBeetle replicas, so that the primary’s egress network bandwidth doesn’t become a bottleneck. However, textures may shift under new configurations…

Last month, we measured network latency in multi-region TigerBeetle deployments and found enough jitter that the primary needed to forward requests to multiple backups to keep tail latencies in check. We also discovered that replicas require more storage bandwidth than network bandwidth, by virtue of multiple secondary indexes. Consequently, we switched from ring replication to star replication, wherein the primary broadcasts requests to all backups. This reduced latencies across the board in multi-region deployments, with an order of magnitude improvement in P100 latency (from ~3s to ~250ms).
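
For intuition, here is a back-of-the-envelope sketch of the primary's egress bandwidth under the two topologies (illustrative Zig, not TigerBeetle code; the replica count and prepare size are made up): with a ring, the primary forwards each prepare to a single backup, which relays it onward; with a star, the primary sends it to every backup directly.

    const std = @import("std");
    const assert = std.debug.assert;

    const Topology = enum { ring, star };

    /// Bytes the primary must send per prepare, in a cluster of `replica_count`
    /// replicas (one primary plus `replica_count - 1` backups).
    fn primary_egress(topology: Topology, replica_count: u8, prepare_size: u64) u64 {
        assert(replica_count >= 2);
        return switch (topology) {
            // Ring: the primary sends to exactly one backup; the backups relay onward.
            .ring => prepare_size,
            // Star: the primary broadcasts to every backup directly.
            .star => prepare_size * (replica_count - 1),
        };
    }

    pub fn main() void {
        const replica_count: u8 = 6; // Hypothetical multi-region cluster.
        const prepare_size: u64 = 1 << 20; // A 1 MiB batch of transfers (made up).
        std.debug.print("ring: {} bytes, star: {} bytes\n", .{
            primary_egress(.ring, replica_count, prepare_size),
            primary_egress(.star, replica_count, prepare_size),
        });
    }

Star costs the primary more egress per prepare, but since storage bandwidth rather than network bandwidth turned out to be the limiting texture, trading some network bandwidth for lower and more predictable latency is a win.
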
  • We observed that write-ahead-log repair was too eager, with backups trying to repair prepares before receiving them via replication. This was fixed and validated using deterministic performance testing: we now send ~30% fewer prepare messages and ~90% fewer request_prepare messages over the network, significantly reducing network bandwidth utilization.
  • The liveness mode in our deterministic simulator (the VOPR) tests whether replicas in a TigerBeetle cluster convert durability into availability as efficiently as they can. Put simply, liveness mode tests whether the repair protocol on each replica repairs missing and corrupted data using the intact data on other replicas. This protocol-aware testing goes beyond traditional system-level liveness testing, which typically involves checking whether the system as a whole is responsive. VOPR found some subtle livelocks in repair, which were fixed by introducing jitter into both journal and block repair protocols (a minimal sketch of the jitter follows this list).
  • We renamed Viewstamped Replication’s messages for the view change subprotocol, which isolates a failed primary and establishes a new one. The new names better follow the view change flow: a backup exits the old view (StartViewChange → ExitView), joins the new one (DoViewChange → JoinView) under the new primary, who then announces the new view (StartView → View).
  • A latent bug was fixed in the flow where a replica sends a message to itself: the checksum of the request body was computed before the body itself was populated. For defense-in-depth, we now assert checksum validity when a replica receives a message.
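
To illustrate the livelock fix, here is a minimal sketch (illustrative Zig, not the actual repair code; the tick constants are made up) of adding jitter to a repair retry delay, so that replicas that keep colliding stop retrying in lockstep:

    const std = @import("std");
    const assert = std.debug.assert;

    const repair_timeout_ticks: u64 = 50; // Base retry interval (made-up constant).
    const repair_jitter_ticks: u64 = 20; // Maximum extra random delay (made-up constant).

    /// Next retry delay: a fixed base plus a random offset in [0, repair_jitter_ticks).
    /// Without the random offset, two replicas that are each missing data the other
    /// has can keep requesting and timing out in lockstep; jitter breaks the symmetry.
    fn repair_backoff(random: std.Random) u64 {
        const delay = repair_timeout_ticks + random.uintLessThan(u64, repair_jitter_ticks);
        assert(delay >= repair_timeout_ticks);
        assert(delay < repair_timeout_ticks + repair_jitter_ticks);
        return delay;
    }

    pub fn main() void {
        var prng = std.Random.DefaultPrng.init(42); // Deterministic seed, as in simulation.
        const random = prng.random();
        for (0..5) |_| std.debug.print("retry in {} ticks\n", .{repair_backoff(random)});
    }
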
TigerBeetle’s on-disk data is organized as a Log-Structured Merge (LSM) tree. LSM trees are write-optimized data structures that buffer inserted entries in memory and flush the buffer as a sorted run to disk. On disk, the LSM contains multiple levels of exponentially increasing capacity, where each level is composed of tables. TigerBeetle maintains an LSM forest, with separate trees for each of the primary indexes and 20+ secondary indexes.
  • The number of objects in each LSM tree is now tracked in the metrics.
  • When a level La is at capacity, LSM trees perform compaction: selecting a table from level La and merging it into the next level Lb. Last month, we improved the table-selection algorithm complexity from log-linear O(|La| * log|Lb|) to linear O(|La| + |Lb|). The optimization pays off in large deployments, where we observed a latency reduction from ~65ms to ~45ms.
  • Accounts & Transfers queries use the zig-zag merge join under the hood to efficiently find the intersection between multiple secondary indexes. In the common case, this algorithm skips large ranges of non-matching keys, processing far fewer results than the sum of input index sizes. Last month, the control flow of the zig-zag merge was simplified: we now avoid rebuilding the indexes multiple times, and exit fast if all indexes are waiting on IO.
TigerBeetle Time-Based Identifiers (TBIDs) are lexicographically sortable IDs that allow applications to unlock LSM-level optimizations. A TBID consists of 48 bits of timestamp and 80 bits of randomness. Previously, when the random bits overflowed, the client would panic. We improved this: the timestamp is now incremented when the random bits overflow.
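
A minimal sketch of the new overflow behavior (illustrative Zig, not the client's actual id() implementation; the bit layout follows the description above, and the sample timestamp is made up):

    const std = @import("std");
    const assert = std.debug.assert;

    /// A TBID packs a 48-bit millisecond timestamp (high bits) above 80 bits of
    /// randomness (low bits), so IDs generated later sort lexicographically higher.
    fn tbid(timestamp_ms: u48, random_bits: u80) u128 {
        return (@as(u128, timestamp_ms) << 80) | random_bits;
    }

    /// Derive the next monotonic ID by incrementing the random bits.
    /// Previously an overflow of the random bits would panic; now the overflow
    /// is carried into the timestamp instead.
    fn tbid_next(previous: u128) u128 {
        const random_bits: u80 = @truncate(previous);
        const incremented = @addWithOverflow(random_bits, 1);

        var timestamp: u48 = @truncate(previous >> 80);
        if (incremented[1] == 1) timestamp += 1; // Carry into the timestamp on overflow.

        const id = tbid(timestamp, incremented[0]);
        assert(id > previous); // IDs must remain strictly monotonic.
        return id;
    }

    pub fn main() void {
        // Worst case: the random bits are all ones, so the increment overflows.
        const id = tbid(1_700_000_000_000, std.math.maxInt(u80));
        const next = tbid_next(id);
        std.debug.print("{x} -> {x}\n", .{ id, next });
    }

Carrying the overflow into the timestamp keeps IDs strictly monotonic instead of crashing the client.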

The recent surge in supply-chain attacks via GitHub Actions prompted us to pin our actions to fixed commit SHAs.

"Database television" (DTV)

Last month on IronBeetle, we live-refactored CLI parsing: giving it its own arena allocator and improving the trailing-argument API. We also discussed snapshot testing, a technique where tests auto-generate their expected output and self-update on failure. Finally, we piloted Pragmatics Of Consensus, a new mini-series focused on our implementation of Viewstamped Replication, featuring matklad and Tobi! We discussed sequencing, replication, write-ahead-log repair, commit, and fault detection. There’s still lots of ground to cover… so hop on for the ride!
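
If you missed the snapshot testing discussion, here is a toy illustration of the idea (illustrative Zig, not the code from the stream): the expected value lives inline in the test, and on mismatch the test reports the actual output so the snapshot can be updated; a real framework rewrites the source file for you.

    const std = @import("std");

    /// A toy snapshot: the expected text is embedded in the test itself.
    const Snapshot = struct {
        expected: []const u8,

        /// On mismatch, print the actual output, ready to paste back into the test.
        /// A real snapshot-testing framework would rewrite the source file instead.
        fn diff(snapshot: Snapshot, actual: []const u8) !void {
            if (std.mem.eql(u8, snapshot.expected, actual)) return;
            std.debug.print("snapshot mismatch, update the test to:\n{s}\n", .{actual});
            return error.SnapshotMismatch;
        }
    };

    test "balance rendering" {
        var buffer: [64]u8 = undefined;
        const actual = try std.fmt.bufPrint(&buffer, "debits={} credits={}", .{ 10, 7 });
        const snapshot = Snapshot{ .expected = "debits=10 credits=7" };
        try snapshot.diff(actual);
    }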

Join us live every Thursday at 5pm UTC on Twitch, YouTube and X!

Looking back

Triple-Win at Sonoma for Jens in the TigerBeetle SR3, Apr 10-12
Jens Axboe made a clean sweep at Sonoma Raceway in the Pro 1500 class, winning all three races! In a tough weekend with a lot of rain, Jens kept his cool, staying fast and consistent at every turn. TigerBeetle is proud to partner. Next up: Atlanta in July!

Inaugural TU Munich Database Industry Retreat, Apr 13
Georg attended the first TU Munich Database Industry Retreat! Great insights into cutting-edge database research from the groups of Prof. Neumann, Prof. Giceva, and Prof. Leis.

Automation That Sparks Joy, Apr 14
When working on larger software projects, one often needs to run a custom piece of automation software somewhere. We wrote about what to do (and what to avoid) when implementing custom automation that doesn’t fit CI.

Bugbash, Apr 23
Chaitanya presented on "Protocol-Aware Deterministic Simulation Testing", highlighting how TigerBeetle uses it to deeply test safety and liveness invariants across both the consensus protocol and storage engine. The talk will be published soon!

Exploring Rust Dependency-Toolchain Compatibility, Apr 24
brson's work on the TigerBeetle Rust client was interrupted when a point release broke our build; this post describes his ensuing dependency-toolchain compatibility experiment.

TigerBeetle Would Rather Crash Than Guess, Apr 27
Gyan Prakash Karn went digging through TigerBeetle expecting clever tricks but ‘what (he) found was stranger: a database engineered to die on contact with ambiguity’. Here’s his take on why TigerBeetle would rather crash than guess (it’s true).

TigerHoops, Bay Area, Apr 29
Organized by Peter, the 4th TigerHoops took place in the Bay Area last month: friendly runs, refreshments afterwards, midweek joy. Follow along on X for future announcements!

Looking ahead


Systems Distributed, Jul 27-28
Cape Town, New York, Amsterdam... Boston! The 4th Systems Distributed has been announced, and we're looking forward to time well spent together.

Speakers this year include Jens Axboe (Creator, io_uring), Andrew Kelley (Creator, Zig), and Heidi Howard (Author, Flexible Paxos). Presented by TigerBeetle, the conference teaches systems programming and systems thinking, with all proceeds after costs donated to the Zig Software Foundation.

Purchase your ticket and we'll see you in Boston!

The TigerVerse

Thank you!


‘Til next time… an interstellar burst!

The TigerBeetle Team
 
Copyright © 2026 TigerBeetle, All rights reserved.


If ever you wish, unsubscribe here