B+ Trees: The Data Structure Powering Your Database Indexes

Jun 11, 2026Venkata Lokesh P
DBMSB+ TreeIndexingData StructuresDatabase Internals

Every time I used a database index, I took it for granted. Then I dug into how indexes actually work internally — and it all comes back to one data structure: the B+ Tree.

It sounds intimidating. It's not. Let me walk through it.


Why Not Just Use a Regular Sorted Array?

A sorted array lets you do binary search — fast lookups in O(log n). But inserting into a sorted array is slow — you have to shift elements. And if the data is on disk (not in memory), random reads are expensive.

Databases need something that:

  • Stays sorted for fast lookups.
  • Handles inserts/deletes efficiently without massive reorganization.
  • Works well with disk reads (reads in chunks, not one value at a time).

That's exactly what B+ trees are designed for.


What Is a B+ Tree?

A B+ Tree is a self-balancing tree where:

  • Every path from the root to a leaf is the same length (always balanced).
  • Each node holds multiple keys — so the tree stays shallow even with millions of entries.
  • All actual data (or row pointers) are stored only at the leaf nodes.
  • The internal (non-leaf) nodes only hold keys as guides to navigate down the tree.
  • Leaf nodes are linked together in a doubly-linked list — making range scans fast.

The Structure

Let's say we have user IDs: 1, 5, 9, 14, 17, 21, 30 indexed in a B+ tree with order 3 (max 2 keys per node):

CODE
                  [ 14 ]
                 /       \
          [ 5, 9 ]       [ 17, 21 ]
          /   |   \       /    |    \
        [1,5] [9] [14] [17] [21] [30]
        ←→    ←→   ←→   ←→   ←→   ←→   (leaf nodes linked)

Internal Nodes (top levels)

  • Act as a routing guide: "go left if value < 14, right if ≥ 14."
  • Do NOT store actual data — just keys for navigation.

Leaf Nodes (bottom level)

  • Store the actual values (or pointers to rows in the table).
  • Are linked to each other left-to-right — this is the key feature for range queries.

How a Lookup Works

Find user with ID = 17:

  1. Start at root: 17 ≥ 14 → go right.
  2. At node [17, 21]: 17 = 17 → go to left child.
  3. At leaf [17]: found it. Return the row pointer.

That's it — 3 steps for a 7-element tree. For millions of rows, it's still just ~30 steps (since the tree height grows as log of the number of entries).


How a Range Query Works

This is where B+ trees absolutely shine over regular B-trees.

Query: WHERE user_id BETWEEN 9 AND 21

  1. Traverse tree to find the leaf containing 9.
  2. From there, follow the linked list across leaf nodes: [9] → [14] → [17] → [21].
  3. Stop when you exceed the upper bound.

Because leaf nodes are linked, you never have to go back up the tree. You just scan horizontally across the bottom — super efficient.

This is why B+ trees are better than B-trees for databases: in a regular B-tree, data lives at every level (not just leaves), so range queries require traversing up and down the tree. B+ trees keep all data at the leaves and link them — range scans become a simple linear walk.


How Inserts Work

When you insert a new value, the tree might need to split a node to stay balanced:

  1. Find the correct leaf node for the new value.
  2. Insert it there.
  3. If the leaf is full (exceeds max keys), split it into two nodes and push the middle key up to the parent.
  4. If the parent is also full, split that too — this can ripple all the way up to the root.
  5. If even the root splits, a new root is created — this is the only way the tree grows taller.

This is what keeps the tree balanced — every leaf is always at the same depth.


B-Tree vs B+ Tree

B-TreeB+ Tree
Data stored atEvery node (internal + leaf)Only leaf nodes
Leaf nodes linked?NoYes (doubly linked list)
Range queriesSlower (traverse up/down)Fast (scan leaf list)
Space efficiencySlightly better per lookupBetter overall for databases
Used in databases?RarelyYes — PostgreSQL, MySQL, SQLite, etc.

Why Databases Love B+ Trees

  • Shallow height — even billions of rows fit in a tree that's only 4-5 levels deep. That means at most 4-5 disk reads per lookup.
  • Disk-friendly — nodes are sized to match disk pages (usually 4KB or 8KB), so each node read is one I/O operation.
  • Range-scan optimized — the linked leaf list makes BETWEEN, >, <, and ORDER BY extremely efficient.
  • Always balanced — inserts/deletes never leave the tree unbalanced, so worst-case lookup is always predictable.

Next time you add an index to a column and your query goes from 5 seconds to 5 milliseconds — that's a B+ tree doing its job. Understanding this structure gives you intuition for why indexes work the way they do.


B+ trees are one of those foundational ideas where once you understand them, a lot of other database concepts (clustered indexes, range scans, why primary keys should be sequential) just start making sense naturally.