Hacker News

B-trees and database indexes (2024)

71 points by tosh - 27 comments

bddicken [3 hidden]5 mins ago

Oh hey, I wrote this! Happy to chat more about the article here. Databases are kinda my thing.

amarant [3 hidden]5 mins ago

Thanks for writing this! The visualisations really drive a better understanding than pure text does, and it's quite clear that you have a better understanding of what database do under the hood than I do.

As such, I have a question for you: contrary to your article, I've always been taught that random primary keys are better than sequential ones. The reason for this, I was told, was to avoid "hotspots". I guess it only really applies once sharding comes into play, and perhaps also only if your primary key is your sharding key, but I think that's a pretty common setup.

I'm not really sure how to formulate a concrete question here, I guess I would like to hear your thoughts on any tradeoffs on sequential Vs random keys in sharded setups? Is there a case there random keys are valid, or have I been taught nonsense?

bddicken [3 hidden]5 mins ago

B+trees combined with sequential IDs are great for writes. This is because we are essentially just appending new rows to the "linked list" at the bottom level of the tree. We can also keep a high fill % if we know there isn't a lot of data churn.

If you're sharding based purely on sequential ID ranges, then yes this is a problem. Its better practice to shard based on a hash of your ID, so sequential id assignments turn into non-sequential shard keys, keeping things evenly distributed.

amarant [3 hidden]5 mins ago

Oh wow, that's a super simple solution, and I can immediately see how this gets you the best of both worlds!

And since it's only used for speedy lookup we can even use a fast, cheap and non-secure hashing algorithm, so it's really a low-cost operation!

Thanks! This was really one of those aha-moments where I feel kinda stupid to not have thought of it myself!

bddicken [3 hidden]5 mins ago

I've also written about sharding.

https://planetscale.com/blog/database-sharding

amarant [3 hidden]5 mins ago

Thanks! Another great article! It strikes me that modulo sharding on a sequential id would probably work rather well, but it was not mentioned in this article. Is there a reason I'm not seeing that this is bad? I guess resharding might be problematic, as you can't easily split a shard in two without rewriting every shard if you do that...

cogman10 [3 hidden]5 mins ago

For our DBs (which are often unsharded), we've found the best performance using the user account ID as the first part of the cluster key and then a sequential id for whatever the record is as the second.

It's not as good as just a sequential ID at keeping the fragmentation and data movement down. However, it does ultimately lead to the best write performance for us because the user data ends up likely still appending to an empty page. It allows for more concurrent writes to the same table because they aren't all fighting over that end page.

UUIDv4 is madness.

traderj0e [3 hidden]5 mins ago

Spanner in particular wants random primary keys. But there are sharded DBMSes that still use sequential PKs, like Citus. There are also some use cases for semi-sequential PKs like uuid7.

bddicken [3 hidden]5 mins ago

What about spanner specifically benefits from random ids over sequential ones?

mamcx [3 hidden]5 mins ago

I remember this article for when I was researching for https://spacetimedb.com/. The interactivity is very cool, BTW!

One neat realization is that a database is in fact more about indexes than the actual raw tables (all things interesting work under this assumption), to the point that implementing the engine you get the impression that everything start with "CREATE INDEX" than "CREATE TABLE". This includes sequential scans, where as visualized in your article show that lay the data sequentially is in fact a form of index.

Now, I have the dream of make a engine more into this vision...

game_the0ry [3 hidden]5 mins ago

This has been post before, but planetscale also has a great sql for developers course:

https://planetscale.com/learn/courses/mysql-for-developers

traderj0e [3 hidden]5 mins ago

I've known for a long time that you usually want b-tree in Postgres/MySQL, but never understood too well how those actually work. This is the best explanation so far.

Also, for some reason there have been lots of HN articles incorrectly advising people to use uuid4 or v7 PKs with Postgres. Somehow this is the first time I've seen one say to just use serial.

bddicken [3 hidden]5 mins ago

Simple sequential IDs are great. If you want UUID, v7 is the way to go since it maintains sequential ordering.

jwpapi [3 hidden]5 mins ago

Does all of that apply to Postgresql as well or only Mysql?

daneel_w [3 hidden]5 mins ago

"The deeper the tree, the slower it is to look up elements. Thus, we want shallow trees for our databases!"

With composite indices in InnoDB it's even more important to keep the tree streamlined and let it fan out according to data cardinality: https://news.ycombinator.com/item?id=34404641

whartung [3 hidden]5 mins ago

I keep hearing about the downside of B(+)-Trees for DBs, that they have issues for certain scenarios, but I've never seen a simple, detailed list about them, what they are, and the scenarios they perform badly in.

bddicken [3 hidden]5 mins ago

It's really just a matter of tradeoffs. B-trees are great, but are better suited for high read % and medium/low write volume. In the opposite case, things like LSMs are typically better suited.

If you want a comprehensive resource, I'd recommend reading either Designing Data Intensive Applications (Kleppman) or Database Internals (Petrov). Both have chapters on B-trees and LSMs.

daneel_w [3 hidden]5 mins ago

See my comment in the main thread for an example. In a worst case scenario, some data is simply too "frizzy" to index/search efficiently and with good performance in a B-tree.

Retr0id [3 hidden]5 mins ago

For pure write throughput, LSM trees tend to beat btrees.

bddicken [3 hidden]5 mins ago

threatofrain [3 hidden]5 mins ago

Also curious to hear what people think of Bf-tree.

  https://vldb.org/pvldb/vol17/p3442-hao.pdf
  https://github.com/microsoft/bf-tree

bddicken [3 hidden]5 mins ago

I've read this paper and it's a neat idea. It hasn't been introduced into popular oss databases like postgres and mysql, and my understanding is it has some drawbacks for real prod use vs ths simplistic benchmarks presented in the paper.

Would love to know if anyones built something using it outside of academic testing.

jiveturkey [3 hidden]5 mins ago

> MySQL, arguably the world's most popular database management system,

bddicken [3 hidden]5 mins ago

It may not have the popularity it once did, but MySQL still powers a huge % of the internet.

traderj0e [3 hidden]5 mins ago

Is there a problem with that?

shawn_w [3 hidden]5 mins ago

Not the original commenter, but I thought sqlite had that title.

Retr0id [3 hidden]5 mins ago

sqlite is arguably not really a DBMS, just a DB