Fast B-Trees

130 points | 20 comments | 9 hours ago
BeeOnRope

To answer a question implied in the article, per-lookup timing with rdtscp hurts the hash more than the btree for the same reason the hash is hurt by the data-dependent chaining: rdtscp is an execution barrier which prevents successive lookups from overlapping. rdtsc (no p) isn't, and would probably produce quite different timings.
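For reference, the difference looks roughly like this (a sketch, assuming x86-64 and Rust's std::arch intrinsics; `lookup` is a placeholder for a single probe):

```rust
use std::arch::x86_64::{__rdtscp, _rdtsc};
use std::hint::black_box;

// rdtscp waits for all prior instructions to finish executing, so two
// back-to-back timed lookups cannot overlap in the out-of-order window.
fn time_serialized(lookup: impl FnOnce() -> u64) -> u64 {
    let mut aux = 0u32;
    let start = unsafe { __rdtscp(&mut aux) };
    black_box(lookup());
    unsafe { __rdtscp(&mut aux) - start }
}

// rdtsc is not an execution barrier, so the core can keep overlapping
// work from adjacent lookups across the timestamp reads.
fn time_overlapping(lookup: impl FnOnce() -> u64) -> u64 {
    let start = unsafe { _rdtsc() };
    black_box(lookup());
    unsafe { _rdtsc() - start }
}
```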

That the btree doesn't benefit from overlapping adjacent lookup/inserts is interesting.

I suppose it is because btree access (here) involves data-dependent branches, so with random access you'll get about L mispredicts per lookup in an L-deep tree, meaning adjacent lookups are separated by at least one mispredict: adjacent lookups can overlap in execution, but the overlap is wasted since everything beyond the next mispredict is on the bad path and gets thrown away.

That's probably at least true for the small map regime. For larger maps, the next iteration is actually very useful, even on a mispredicted path, because the data accesses hit the right locations and so serve to bring in all the nodes for the next iteration. This matters a lot outside of L2. At 5 instructions per comparison and 32-element nodes, however, there are just so many instructions in the window for one lookup that it's hard to make it to the next iteration.

So b-trees benefit a lot from a tight linear search (e.g. 2 instructions per check, macro-fused to 1 op), or a branch-free linear search, or, far better than either of those for big nodes, a vectorized branch-free search.
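For example, a branch-free in-node search can be as simple as counting keys below the needle, so there is no data-dependent branch to mispredict (a sketch, assuming sorted 32-key u32 nodes padded with a u32::MAX sentinel, not the article's actual layout):

```rust
// Rank of `needle` in a full, sorted 32-key node: the comparison result
// is added as 0/1, so the compiler emits compare+add (or auto-vectorizes)
// instead of a predicted branch per key.
fn node_rank(keys: &[u32; 32], needle: u32) -> usize {
    let mut rank = 0usize;
    for &k in keys {
        rank += (k < needle) as usize;
    }
    rank // index of the slot / child to descend into
}
```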

aidenn0

It's been a while since I last tried things, but I found crit-bit trees[1] to be much faster than b-trees. Hash array-mapped tries are also good if you don't need the things that trees give you (e.g. in-order traversal, getting all values in a certain range).

1: https://cr.yp.to/critbit.html

orlp

Why was Rust's hashmap only tested with SipHash? It's known to be pretty bad for performance.

I'm biased as the author of course, but try adding a benchmark with the Rust hasher + foldhash as well: https://github.com/orlp/foldhash.
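For what it's worth, swapping the hasher is a one-line change in the benchmark (a sketch, assuming the foldhash crate's `fast::RandomState` BuildHasher as its README describes):

```rust
use std::collections::HashMap;

// Same std HashMap (hashbrown underneath), different BuildHasher:
// std defaults to SipHash for HashDoS resistance; foldhash trades
// that guarantee for much cheaper hashing of small keys.
type FastMap<K, V> = HashMap<K, V, foldhash::fast::RandomState>;

fn main() {
    let mut m: FastMap<u64, &str> = FastMap::default();
    m.insert(42, "forty-two");
    assert_eq!(m.get(&42), Some(&"forty-two"));
}
```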

josephg

I'd be curious to see how performance would change from storing b-tree entries in a semi-sorted array, and applying various other optimizations from here:

https://en.algorithmica.org/hpc/data-structures/b-tree/

The aggregate performance improvements Sergey Slotin gets from applying various "tricks" is insane.
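For a flavour of what those tricks look like, here is a rough sketch of a vectorized in-node search (SSE2, 16 i32 keys per node padded with an i32::MAX sentinel; Algorithmica's version uses wider AVX2 comparisons and a different node layout):

```rust
#[cfg(target_arch = "x86_64")]
fn node_rank_simd(keys: &[i32; 16], needle: i32) -> usize {
    use std::arch::x86_64::*;
    unsafe {
        let n = _mm_set1_epi32(needle);
        let mut mask = 0u32;
        for chunk in 0..4 {
            // Compare 4 keys at a time; a lane becomes all-ones where key < needle.
            let k = _mm_loadu_si128(keys.as_ptr().add(chunk * 4) as *const __m128i);
            let lt = _mm_cmplt_epi32(k, n);
            mask |= (_mm_movemask_ps(_mm_castsi128_ps(lt)) as u32) << (chunk * 4);
        }
        // Rank = number of keys strictly less than the needle.
        mask.count_ones() as usize
    }
}
```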

kibo_money

Very interesting! You mentioned memory usage at the end; BTreeMaps are actually better than HashMaps most of the time, at least in Rust.

Here's a good breakdown: https://ntietz.com/blog/rust-hashmap-overhead/

ww520

The adaptive radix tree is pretty good as well, with support for in-order listing and range queries. It can beat a b-tree and comes in close behind a hashmap.

ur-whale

Unless I'm missing something, the title of the article doesn't really match its conclusion.

tekknolagi

I thought a lot of the b(+)tree advantage was in bigger-than-RAM workloads for large databases, and these benchmarks seem relatively small by comparison.

nialv7

I feel I missed the point of this article. I thought the author was trying to prove that b-trees aren't that bad compared to hashmaps, but taking 2-3x longer looks pretty bad.

If I need predictable ordering (but don't actually need the keys sorted), I'd use something like indexmap, not a b-tree.
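Something like this (a sketch using the indexmap crate; lookups are hash-based but iteration follows insertion order):

```rust
use indexmap::IndexMap;

fn main() {
    let mut m = IndexMap::new();
    m.insert("banana", 2);
    m.insert("apple", 1);
    m.insert("cherry", 3);

    // Iteration order is the insertion order, not sorted key order.
    let keys: Vec<_> = m.keys().copied().collect();
    assert_eq!(keys, ["banana", "apple", "cherry"]);
}
```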

lsb

Clojure, for example, uses hash array-mapped tries as its associative data structure, and those work well.