B+ Trees: The Engine of Modern Database Storage

Atomic Answer: A B+ Tree is an advanced, self-balancing tree data structure that maintains sorted data. By storing data exclusively in linked leaf nodes and retaining only routing keys in internal nodes, it achieves a massive fan-out. This design ensures minimal disk I/O, making it the industry standard for modern database and file system indexing.

The B+ Tree (B-Plus Tree) is a self-balancing tree data structure designed to maintain sorted data. It supports insertions, deletions, and searches in time logarithmic in the number of keys.

It is an evolution of the traditional B-Tree, specifically optimized for environments where data is stored on disk rather than in memory.

Today, the B+ Tree and its close variants are the de facto standard for database indexing. They back the default index structures of most major relational database management systems (RDBMS) and also appear in file systems, including:

PostgreSQL
MySQL
SQL Server
Oracle
File systems such as NTFS and XFS

1. Core Structural Properties

Atomic Answer: The B+ Tree achieves its exceptional performance through three structural pillars: data is stored exclusively at the leaf nodes, leaf nodes are linked sequentially, and internal nodes maintain a high fan-out. This wide, shallow architecture guarantees rapid, sequential access and keeps tree height minimal, optimizing disk reads.

To understand why the B+ Tree is so ubiquitous, it is crucial to examine its structural properties and how it differentiates itself from its predecessor, the B-Tree.

A B+ Tree is defined by three critical design choices:

Data Exclusively at Leaf Nodes: Internal (non-leaf) nodes only store "routing keys" used strictly to direct the search path. The actual data records, or pointers to the physical data rows (Record IDs or RIDs), are stored entirely within the leaf nodes.
Linked Leaf Nodes: All leaf nodes are linked together, forming a singly or doubly linked list. The result is a sequential chain of all the data in the tree, sorted by the index key.
High Fan-out: Because internal nodes do not need to reserve space for bulky data payloads, they can fit many more keys and child pointers per node. This results in a massive "fan-out" factor, producing a very wide and extremely shallow tree. A B+ Tree indexing billions of rows may be only 3 or 4 levels deep.
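The "3 or 4 levels" figure can be checked with rough arithmetic. The sketch below is only an estimate under stated assumptions: 8 KB pages, 8-byte keys, 8-byte child pointers, 100 entries per leaf, completely full nodes, and no page header overhead. The function names are illustrative and do not come from any particular database.

```python
import math

def estimate_fanout(page_size: int, key_size: int, pointer_size: int) -> int:
    """Approximate number of routing entries that fit in one internal page."""
    # Each routing entry costs one key plus one child pointer; headers are ignored.
    return page_size // (key_size + pointer_size)

def estimate_height(num_rows: int, fanout: int, leaf_capacity: int) -> int:
    """Levels needed (leaf level included) when every node is completely full."""
    nodes = math.ceil(num_rows / leaf_capacity)  # number of leaf pages
    levels = 1                                   # the leaf level itself
    while nodes > 1:
        nodes = math.ceil(nodes / fanout)        # parents needed for this level
        levels += 1
    return levels

fanout = estimate_fanout(page_size=8192, key_size=8, pointer_size=8)
height = estimate_height(num_rows=1_000_000_000, fanout=fanout, leaf_capacity=100)
print(fanout, height)  # 512 4
```

Real nodes are rarely full and real keys are often larger than 8 bytes, so actual trees can be somewhat deeper than this estimate.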

2. Anatomy of a B+ Tree

Atomic Answer: A B+ Tree consists of internal nodes and leaf nodes aligned with disk pages. Internal nodes act as routing hubs, directing searches using keys and child pointers. Leaf nodes, all situated at the same depth, store the actual data or record pointers and facilitate fast sequential scanning.

A B+ tree is built from fixed-size blocks called "pages" or "nodes." The page size usually matches, or is a multiple of, the underlying disk block size (e.g., 4KB or 8KB) so that each node can be read or written in a single I/O operation.

Internal Nodes

Internal nodes act as traffic cops.
Each internal node contains an array of keys and an array of pointers to child nodes.
For any given key $K_i$ in an internal node, all keys in the left child subtree are less than $K_i$, and all keys in the right child subtree are greater than or equal to $K_i$.
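The routing rule above maps directly onto a binary search over the node's key array. This is a minimal sketch, assuming integer keys held in a sorted Python list; `child_index` is a hypothetical helper name.

```python
from bisect import bisect_right

def child_index(routing_keys: list, target: int) -> int:
    """Return the index of the child pointer to follow for `target`."""
    # bisect_right counts the routing keys that are <= target, so a target
    # equal to K_i is sent to the child on the right of K_i.
    return bisect_right(routing_keys, target)

keys = [20, 40, 60]            # three routing keys, therefore four children
print(child_index(keys, 10))   # 0: smaller than every routing key
print(child_index(keys, 20))   # 1: equal to K_0, so go right of K_0
print(child_index(keys, 39))   # 1
print(child_index(keys, 99))   # 3: greater than every routing key
```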

Leaf Nodes

Leaf nodes sit at the very bottom of the tree.
Every leaf node is at the exact same depth, guaranteeing that any exact-match search requires the exact same number of steps.
A leaf node contains key-value pairs where the value is either the actual row data (in a clustered index) or a pointer to the row on disk (in an unclustered index).
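As an in-memory illustration of the two node kinds, the sketch below models them as small Python classes and checks that every leaf sits at the same depth. The class and field names are assumptions made for this example. A real storage engine would serialize these fields into fixed-size pages rather than use Python objects.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Leaf:
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)   # row data or record IDs
    next: Optional["Leaf"] = None                # link to the right sibling

@dataclass
class Internal:
    keys: list = field(default_factory=list)      # routing keys only, no row data
    children: list = field(default_factory=list)  # len(children) == len(keys) + 1

def leaf_depths(node, depth: int = 0) -> set:
    """Collect the depth of every leaf reachable from `node`."""
    if isinstance(node, Leaf):
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths

right = Leaf([30, 40], ["row-30", "row-40"])
left = Leaf([10, 20], ["row-10", "row-20"], next=right)
root = Internal(keys=[30], children=[left, right])
print(leaf_depths(root))  # {1}: a single depth means the tree is balanced
```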

3. Operations and Algorithms

Atomic Answer: B+ Trees execute efficient search, insertion, and deletion by balancing the tree recursively. Searches rely on internal routing to reach leaves, while range queries simply scan the linked leaves. Insertions split full nodes upward, and deletions merge or borrow from siblings to maintain node occupancy invariants.

Search and Range Queries

The search operation in a B+ Tree begins at the root and traverses down by comparing the target key against the routing keys in each internal node. Once it reaches the correct leaf node, it scans the leaf to find the exact match.
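A point lookup along these lines can be sketched as follows. It assumes unique integer keys and the convention that keys equal to a routing key live in the right subtree. The `Leaf` and `Internal` classes are minimal stand-ins, not a real engine's page format.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values = keys, values

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def search(node, target):
    """Return the value stored under `target`, or None if it is absent."""
    # Descend through internal nodes, following one child pointer per level.
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, target)]
    # Binary-search the leaf for an exact match.
    i = bisect_left(node.keys, target)
    if i < len(node.keys) and node.keys[i] == target:
        return node.values[i]
    return None

root = Internal([30], [Leaf([10, 20], ["a", "b"]), Leaf([30, 40], ["c", "d"])])
print(search(root, 30))  # c
print(search(root, 25))  # None
```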

The true power of the B+ Tree shines during Range Queries (e.g., SELECT * FROM employees WHERE salary BETWEEN 50000 AND 80000).

A traditional B-Tree requires a complex, zig-zagging in-order traversal up and down the tree branches.
A B+ Tree simply descends once to the leaf that contains the start of the range (50000), or the first key above it if that exact value is absent.
It finds the corresponding leaf node and walks the linked list of leaves sequentially until it hits the end of the range (80000).
This turns a random-access nightmare into a smooth, sequential disk read.
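The range scan described above can be sketched as one descent followed by a walk along the leaf chain. The example assumes unique keys and a singly linked leaf list. The salary values and names are made-up sample data, and `range_scan` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys, values, next=None):
        self.keys, self.values, self.next = keys, values, next

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def range_scan(root, low, high):
    """Yield (key, value) pairs with low <= key <= high, in key order."""
    node = root
    while isinstance(node, Internal):            # one descent to the start leaf
        node = node.children[bisect_right(node.keys, low)]
    i = bisect_left(node.keys, low)              # first key >= low in that leaf
    while node is not None:
        while i < len(node.keys):
            if node.keys[i] > high:
                return                           # passed the end of the range
            yield node.keys[i], node.values[i]
            i += 1
        node, i = node.next, 0                   # follow the sibling link

l3 = Leaf([80000, 90000], ["dee", "eve"])
l2 = Leaf([50000, 65000], ["bob", "cy"], l3)
l1 = Leaf([40000, 45000], ["al", "ann"], l2)
root = Internal([50000, 80000], [l1, l2, l3])
print(list(range_scan(root, 50000, 80000)))
# [(50000, 'bob'), (65000, 'cy'), (80000, 'dee')]
```

After the initial descent, the scan never revisits an internal node, which is what keeps the access pattern sequential.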

Insertion (Splitting)

When inserting a new key, the tree must maintain its balanced structure and node capacity constraints (usually a minimum 50% occupancy).

Locate Leaf: The algorithm traverses down to find the appropriate leaf node.
Insert: If the leaf has available space, the key is inserted in sorted order.
Split (Overflow): If the leaf is full, it splits into two separate leaf nodes. The entries are divided evenly between the two.
Copy-up: To ensure the parent node can route to the newly created leaf, the smallest key of the new right-hand leaf is "copied up" to the parent node.
Recursive Split: If the parent node is also full, it splits as well. When internal nodes split, the median key is "pushed up" (moved, rather than copied) to its parent. This split can propagate all the way up to the root, which is the only way a B+ Tree grows in height.
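The difference between "copy-up" and "push-up" is easiest to see side by side. The sketch below works on plain key lists that have already overflowed. It is a simplified illustration that ignores values, sibling links, and the parent update; both function names are hypothetical.

```python
def split_leaf(keys):
    """Split an overflowing leaf. The separator is COPIED up: it stays in the right leaf."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right[0], right

def split_internal(keys, children):
    """Split an overflowing internal node. The median key is MOVED up: it leaves the node."""
    mid = len(keys) // 2
    separator = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, separator, right

print(split_leaf([10, 20, 30, 40, 50]))
# ([10, 20], 30, [30, 40, 50])
print(split_internal([10, 20, 30, 40, 50], ["c0", "c1", "c2", "c3", "c4", "c5"]))
# (([10, 20], ['c0', 'c1', 'c2']), 30, ([40, 50], ['c3', 'c4', 'c5']))
```

In the leaf case, key 30 appears both in the parent and in the right leaf because the leaf level must keep every data entry. In the internal case, key 30 appears only in the parent because routing keys do not need to be duplicated.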

Deletion (Merging and Redistribution)

Deletion is slightly more complex, as removing keys might cause a node to become too empty (underflow), violating the tree's invariants.

Locate and Remove: Find the leaf node and delete the key-value pair.
Borrow (Redistribution): If the leaf falls below its minimum capacity, it first looks to its immediate left or right sibling. If a sibling has extra keys, it "borrows" a key to restore balance, updating the parent's routing key accordingly.
Merge: If neither sibling can spare a key, the node merges with one of its siblings.
Pull-down: Merging two nodes removes the need for the routing key that separated them in the parent node. When two leaves merge, that routing key is simply deleted from the parent; when two internal nodes merge, it is pulled down into the merged node so that it still separates the two groups of children. If this causes the parent to underflow, the borrow-or-merge process propagates recursively upwards. If the root node loses its last key, its only child becomes the new root, decreasing the tree's height.
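The borrow-then-merge decision at the leaf level can be sketched as below. This toy version assumes a minimum of two keys per leaf, models each leaf as a plain sorted list, and assumes the parent has at least two children. It leaves out sibling links and upward propagation, and `fix_underflow` is a hypothetical name.

```python
MIN_KEYS = 2  # assumed minimum occupancy for this toy example

def fix_underflow(parent_keys, leaves, i):
    """Repair leaves[i] after a deletion left it with fewer than MIN_KEYS keys.

    parent_keys[j] is the routing key between leaves[j] and leaves[j + 1].
    Both lists are edited in place; the chosen action is returned.
    """
    node = leaves[i]
    if i > 0 and len(leaves[i - 1]) > MIN_KEYS:
        node.insert(0, leaves[i - 1].pop())    # take the left sibling's largest key
        parent_keys[i - 1] = node[0]           # routing key = new smallest key here
        return "borrow-left"
    if i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
        node.append(leaves[i + 1].pop(0))      # take the right sibling's smallest key
        parent_keys[i] = leaves[i + 1][0]      # routing key = sibling's new smallest
        return "borrow-right"
    left = i - 1 if i > 0 else i               # merge with the left sibling if any
    leaves[left].extend(leaves[left + 1])
    del leaves[left + 1]
    del parent_keys[left]                      # separator is no longer needed
    return "merge"

parent, leaves = [30, 50], [[10, 20, 25], [30], [50, 60]]
print(fix_underflow(parent, leaves, 1), parent, leaves)
# borrow-left [25, 50] [[10, 20], [25, 30], [50, 60]]

parent, leaves = [30, 50], [[10, 20], [30], [50, 60]]
print(fix_underflow(parent, leaves, 1), parent, leaves)
# merge [50] [[10, 20, 30], [50, 60]]
```

A full implementation would then check whether the parent itself dropped below its minimum and repeat the same decision one level up.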

4. Why B+ Trees Win on Disk

Atomic Answer: B+ Trees overcome disk I/O bottlenecks by maximizing cache locality and minimizing disk reads. Their high fan-out keeps the tree shallow, allowing upper levels to reside in RAM. Consequently, finding a specific record among billions typically requires only one or two actual disk fetch operations.

For datasets larger than memory, the primary bottleneck in a database is usually disk I/O. Accessing data on a spinning hard drive, or even an SSD, is orders of magnitude slower than accessing RAM.

Data structures optimized for in-memory operations (like Red-Black Trees or Hash Tables) often perform poorly on disk because they scatter data randomly, leading to frequent, expensive disk reads.

B+ Trees conquer the I/O bottleneck by maximizing Cache Locality:

Node sizes are aligned with disk page sizes, so every "hop" down a B+ Tree touches exactly one page, which costs at most one disk read.
The tree has a massive fan-out, meaning a table with a billion rows can be indexed by a B+ Tree only 4 levels deep.
The root and upper-level internal nodes are accessed on nearly every lookup, so they typically stay resident in the RAM buffer cache.
Finding any specific record out of a billion usually requires only 1 or 2 actual disk I/O operations.
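To see why "1 or 2" reads is plausible, it helps to estimate how much RAM is needed to keep the top levels cached. The sketch below assumes a fan-out of 512, 8 KB pages, a 4-level tree, and completely full nodes, which makes the RAM figures upper bounds. `cache_cost` is a hypothetical helper.

```python
def cache_cost(fanout: int, page_size: int, height: int):
    """For each count of cached top levels, return (levels, RAM bytes, uncached reads)."""
    rows = []
    pages = 0                                  # pages in the levels cached so far
    for cached in range(height + 1):
        rows.append((cached, pages * page_size, height - cached))
        pages += fanout ** cached              # nodes on the next level down
    return rows

for cached, ram_bytes, reads in cache_cost(fanout=512, page_size=8192, height=4)[:4]:
    print(f"top {cached} levels cached: {ram_bytes:>13,} bytes, {reads} disk reads per lookup")
```

Under these assumptions, caching the top two levels costs about 4 MB and leaves two disk reads per lookup, while caching the top three costs about 2 GB and leaves one.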

5. B+ Tree vs. B-Tree: A Summary

Atomic Answer: Unlike traditional B-Trees, B+ Trees store data strictly in leaf nodes and link them sequentially. This architectural shift significantly accelerates range queries through sequential scanning, increases fan-out for a shallower tree, and provides consistent search speeds by always routing to the leaf level.

Feature	B-Tree	B+ Tree
Data Location	Internal nodes and Leaf nodes	Leaf nodes exclusively
Internal Nodes	Keys and Data	Keys only (Routing logic)
Leaf Linking	Not linked	Linked via list
Range Queries	Slow (requires tree traversal)	Extremely Fast (sequential scan)
Fan-out/Height	Lower fan-out, deeper tree	Higher fan-out, shallower tree
Search Speed	Variable (can end early)	Consistent (always reaches leaf)

6. Emerging Contexts: Vector Databases

Atomic Answer: In modern AI applications, B+ Trees continue to index scalar metadata such as identifiers or timestamps within vector databases. However, they do not handle high-dimensional vector similarities, which are instead managed by specialized Approximate Nearest Neighbor (ANN) indexes like HNSW for semantic search capabilities.

While B+ Trees are the undisputed kings of scalar data (integers, strings, timestamps), they are not suitable for high-dimensional vector data used in modern AI applications.

In a vector-capable system such as PostgreSQL with the pgvector extension:

B+ Trees are still heavily utilized.
They are relegated to indexing the metadata (e.g., document_id or creation_date).
Specialized Approximate Nearest Neighbor (ANN) indexes like HNSW handle the semantic vector similarity search.