Balanced Search Trees
Binary search trees give O(log n) lookup, insert, delete — when balanced. Without rebalancing, a BST can degenerate to a linked list (O(n)).
Balanced search trees maintain logarithmic height through rebalancing operations. They're the workhorses of databases, language libraries, and ordered-data structures.
Why balance matters
A BST built by inserting sorted data degenerates:
```
1
\
2
\
3
\
4
```
All operations become O(n). Rebalancing keeps height O(log n).
AVL trees
The first balanced BST (1962). Strict balance: heights of subtrees differ by at most 1.
Operations
- Insert: standard BST insert; rebalance up the path
- Delete: standard BST delete; rebalance up the path
- Lookup: O(log n)
Rebalancing uses rotations: single (LL, RR) or double (LR, RL).
Properties
- Height: ≤ 1.44 × log₂(n)
- Stricter balance than red-black
- More rotations on insert/delete
- Faster lookups than red-black (slightly shorter)
Use cases
- Read-heavy workloads (frequent lookups)
- Memory databases
Red-black trees
More relaxed balance. Each node is red or black; properties enforce approximate balance.
Properties
- Root is black
- Red nodes can't have red children
- All paths from root to null have same number of black nodes
These rules guarantee height ≤ 2 × log₂(n+1).
Operations
Same as AVL: insert, delete, lookup. Rebalancing uses rotations and color flips.
Generally fewer rotations than AVL on writes.
Use cases
- Java's TreeMap, TreeSet
- C++'s std::map, std::set
- Linux kernel (process scheduling, mmap)
The default ordered-map implementation in many standard libraries.
B-trees
Multi-way trees. Each node holds k keys and has k+1 children.
Designed for disk-based storage; adapted for in-memory use too.
Properties
- All leaves at same depth
- Each internal node has between m/2 and m children
- m typically 100-1000 for disk-based; smaller for memory
Operations
- Lookup: traverse, comparing keys at each node
- Insert: insert at leaf; split if too full; propagate up
- Delete: more complex; merge or borrow from siblings
B+ trees
All data in leaves; internal nodes have only keys. Leaves linked for efficient range scans.
Standard for relational databases (MySQL, PostgreSQL indices).
Use cases
- Database indices
- Filesystems (ext4, NTFS, btrfs)
- Key-value stores (LevelDB-derived)
Splay trees
Self-adjusting tree. Recently accessed elements move to root.
Properties
- No explicit balance
- Amortized O(log n) operations
- Specific access patterns can be O(log n) per operation
Use cases
Niche. Useful when access patterns are skewed (some keys much hotter).
Less common in production than AVL or red-black.
Treaps
Randomized BST. Each node has a key and a random priority. Tree is BST on keys, heap on priorities.
Properties
- Expected O(log n) operations
- Simple implementation
- Good for concurrent operations (less coordination)
Skip lists
Probabilistic alternative. Multiple levels of linked lists; higher levels are "express lanes."
Properties
- Expected O(log n) operations
- Simpler than balanced BSTs
- Good for concurrent access
Use cases
- Redis (sorted sets)
- LevelDB memtable
- Some concurrent data structures
Choice in practice
For most language standard libraries: red-black trees.
For database indices on disk: B+ trees.
For sorted sets with frequent updates: skip lists or red-black.
Rarely should you implement these yourself. Use the standard library.
Comparison
| | AVL | Red-Black | B-tree | Skip List |
|---|---|---|---|---|
| Lookup | Fast | Slightly slower | Slow per node, few nodes | Fast |
| Insert | Slower | Fast | Fast | Fast |
| Memory | Compact | Compact | Compact | More overhead |
| Cache friendly | Poor | Poor | Excellent | Poor |
| Concurrent | Hard | Hard | Hard | Easier |
Key insight: cache locality
Modern CPUs make cache miss expensive. B-trees with high fan-out are cache-friendly: more comparisons per cache line.
Binary trees have poor cache behavior. This is why B-trees beat binary trees for in-memory ordered data too, despite higher complexity.
For pure performance on modern hardware, B-tree variants often beat AVL/red-black even in memory.
Operations performance
For all balanced BSTs:
- **Lookup**: O(log n)
- **Insert**: O(log n)
- **Delete**: O(log n)
- **Min/max**: O(log n)
- **Range query**: O(log n + k) where k is range size
- **In-order traversal**: O(n)
These are usually 2-3x slower than hash tables for individual lookups, but support ordered operations hash tables can't.
When to use a balanced BST vs hash table
Hash table wins
- Order doesn't matter
- Maximum lookup speed needed
- No range queries
Balanced BST wins
- Order matters
- Range queries
- Predecessor/successor queries
- Worst-case bounds matter (hash tables have O(n) worst case)
- Iterator stability across modifications
Common operations
Range queries
"Find all values in [a, b]"
In balanced BST: O(log n + k). Easy.
In hash table: O(n). Painful.
Predecessor / successor
"Largest value less than x"
In balanced BST: O(log n).
In hash table: O(n).
Order statistics
"What's the k-th smallest value?"
With augmentation (size info per node): O(log n).
Common failure patterns
Implementing your own
Bugs are subtle. Use the standard library.
Hash table when ordering matters
Realize too late that you need ordered iteration.
Balanced BST when hash table sufficient
Slower than hash table for simple lookups.
Ignoring cache effects
Choosing for asymptotic complexity when constant factors dominate at your scale.
Using the wrong balanced BST variant
Almost always the standard library default is fine.
Practical advice
For application code:
- Use the standard library's ordered map (TreeMap, std::map, BTreeMap in Rust)
- Don't worry about which balanced tree it uses
- Benchmark if performance matters
For systems / database code:
- B-tree variants are usually right
- Concurrency considerations matter
- Profile actual workloads
For learning:
- Implement red-black or AVL once for understanding
- Then use the standard library
Further Reading
- [HashTableDesign](HashTableDesign) — Alternative for unordered data
- [HeapAndPriorityQueues](HeapAndPriorityQueues) — Different tree structure
- [Data Structures Hub](DataStructuresHub) — Cluster index