Balanced Search Trees

Binary search trees give O(log n) lookup, insert, delete — when balanced. Without rebalancing, a BST can degenerate to a linked list (O(n)).

Balanced search trees maintain logarithmic height through rebalancing operations. They're the workhorses of databases, language libraries, and ordered-data structures.

Why balance matters

A BST built by inserting sorted data degenerates:

All operations become O(n). Rebalancing keeps height O(log n).

AVL trees

The first balanced BST (1962). Strict balance: heights of subtrees differ by at most 1.

Operations

Insert: standard BST insert; rebalance up the path
Delete: standard BST delete; rebalance up the path
Lookup: O(log n)

Rebalancing uses rotations: single (LL, RR) or double (LR, RL).

Properties

Height: ≤ 1.44 × log₂(n)
Stricter balance than red-black
More rotations on insert/delete
Faster lookups than red-black (slightly shorter)

Use cases

Read-heavy workloads (frequent lookups)
Memory databases

Red-black trees

More relaxed balance. Each node is red or black; properties enforce approximate balance.

Properties

Root is black
Red nodes can't have red children
All paths from root to null have same number of black nodes

These rules guarantee height ≤ 2 × log₂(n+1).

Operations

Same as AVL: insert, delete, lookup. Rebalancing uses rotations and color flips.

Generally fewer rotations than AVL on writes.

Use cases

Java's TreeMap, TreeSet
C++'s std::map, std::set
Linux kernel (process scheduling, mmap)

The default ordered-map implementation in many standard libraries.

B-trees

Multi-way trees. Each node holds k keys and has k+1 children.

Designed for disk-based storage; adapted for in-memory use too.

Properties

All leaves at same depth
Each internal node has between m/2 and m children
m typically 100-1000 for disk-based; smaller for memory

Operations

Lookup: traverse, comparing keys at each node
Insert: insert at leaf; split if too full; propagate up
Delete: more complex; merge or borrow from siblings

B+ trees

All data in leaves; internal nodes have only keys. Leaves linked for efficient range scans.

Standard for relational databases (MySQL, PostgreSQL indices).

Use cases

Database indices
Filesystems (ext4, NTFS, btrfs)
Key-value stores (LevelDB-derived)

Splay trees

Self-adjusting tree. Recently accessed elements move to root.

Properties

No explicit balance
Amortized O(log n) operations
Specific access patterns can be O(log n) per operation

Use cases

Niche. Useful when access patterns are skewed (some keys much hotter).

Less common in production than AVL or red-black.

Treaps

Randomized BST. Each node has a key and a random priority. Tree is BST on keys, heap on priorities.

Properties

Expected O(log n) operations
Simple implementation
Good for concurrent operations (less coordination)

Skip lists

Probabilistic alternative. Multiple levels of linked lists; higher levels are "express lanes."

Properties

Expected O(log n) operations
Simpler than balanced BSTs
Good for concurrent access

Use cases

Redis (sorted sets)
LevelDB memtable
Some concurrent data structures

Choice in practice

For most language standard libraries: red-black trees.

For database indices on disk: B+ trees.

For sorted sets with frequent updates: skip lists or red-black.

Rarely should you implement these yourself. Use the standard library.

Comparison

	AVL	Red-Black	B-tree	Skip List
Lookup	Fast	Slightly slower	Slow per node, few nodes	Fast
Insert	Slower	Fast	Fast	Fast
Memory	Compact	Compact	Compact	More overhead
Cache friendly	Poor	Poor	Excellent	Poor
Concurrent	Hard	Hard	Hard	Easier

Key insight: cache locality

Modern CPUs make cache miss expensive. B-trees with high fan-out are cache-friendly: more comparisons per cache line.

Binary trees have poor cache behavior. This is why B-trees beat binary trees for in-memory ordered data too, despite higher complexity.

For pure performance on modern hardware, B-tree variants often beat AVL/red-black even in memory.

Operations performance

For all balanced BSTs:

Lookup: O(log n)
Insert: O(log n)
Delete: O(log n)
Min/max: O(log n)
Range query: O(log n + k) where k is range size
In-order traversal: O(n)

These are usually 2-3x slower than hash tables for individual lookups, but support ordered operations hash tables can't.

When to use a balanced BST vs hash table

Hash table wins

Order doesn't matter
Maximum lookup speed needed
No range queries

Balanced BST wins

Order matters
Range queries
Predecessor/successor queries
Worst-case bounds matter (hash tables have O(n) worst case)
Iterator stability across modifications

Common operations

Range queries

"Find all values in [a, b]"

In balanced BST: O(log n + k). Easy.

In hash table: O(n). Painful.

Predecessor / successor

"Largest value less than x"

In balanced BST: O(log n).

In hash table: O(n).

Order statistics

"What's the k-th smallest value?"

With augmentation (size info per node): O(log n).

Common failure patterns

Implementing your own

Bugs are subtle. Use the standard library.

Hash table when ordering matters

Realize too late that you need ordered iteration.

Balanced BST when hash table sufficient

Slower than hash table for simple lookups.

Ignoring cache effects

Choosing for asymptotic complexity when constant factors dominate at your scale.

Using the wrong balanced BST variant

Almost always the standard library default is fine.

Practical advice

For application code:

Use the standard library's ordered map (TreeMap, std::map, BTreeMap in Rust)
Don't worry about which balanced tree it uses
Benchmark if performance matters

For systems / database code:

B-tree variants are usually right
Concurrency considerations matter
Profile actual workloads

For learning:

Implement red-black or AVL once for understanding
Then use the standard library

Balanced Search Trees

Why balance matters

AVL trees

Operations

Properties

Use cases

Red-black trees

Properties

Operations

Use cases

B-trees

Properties

Operations

B+ trees

Use cases

Splay trees

Properties

Use cases

Treaps

Properties

Skip lists

Properties

Use cases

Choice in practice

Comparison

Key insight: cache locality

Operations performance

When to use a balanced BST vs hash table

Hash table wins

Balanced BST wins

Common operations

Range queries

Predecessor / successor

Order statistics

Common failure patterns

Implementing your own

Hash table when ordering matters

Balanced BST when hash table sufficient

Ignoring cache effects

Using the wrong balanced BST variant

Practical advice

Further Reading