Trie Data Structure

A trie (pronounced "try" or "tree" depending on which side of the religious war you're on) is a tree where each node represents a character / byte / element of a key, and the path from root to node spells the key. It's specifically good at problems where you want to do prefix queries — autocomplete, longest-prefix match, dictionary scans. Where it shines, nothing else comes close.

The structure, briefly

```

Insert "cat", "car", "dog":

(root)

/ \

c d

| |

a o

/ \ |

t r g

```

Each node holds (a) a marker indicating whether a key ends here, (b) child pointers / map per next character. Leaves can hold associated values (key → value mapping).

Two costs:

- **Memory** — each character is a node; for `n` keys of average length `L`, the trie has up to `n * L` nodes. Many implementations use child maps (HashMap, sparse array) to keep memory bounded.

- **Time** — insert / lookup / delete is `O(L)` where `L` is the key length. Independent of `n` (the number of keys). Worth contrasting with hash table's `O(L)` for hashing the key plus `O(1)` for the table lookup — for very long keys, hash table loses; for short keys, hash table wins.

When tries earn their keep

Prefix queries (autocomplete)

Given prefix "ap", find all keys starting with "ap." Trivial in a trie: descend to the "ap" node; all keys in its subtree are matches. `O(prefix length + matches)`.

In a hash table: impossible without scanning everything. In a sorted array: `O(log n + matches)` via binary search on prefix bounds. Trie wins on the worst case.

Production examples: search suggestions, command-line completion, IDE code completion (the simple kind, not the LLM kind).

Longest-prefix match (IP routing)

Given an IP address, find the most specific prefix in a routing table that matches. Routers do this on every packet.

A trie (specifically a "patricia trie" or "radix trie") solves this in `O(prefix length)`. Hash tables can't do "longest match"; they can only do exact match.

Production examples: IP routing tables (Linux's FIB), CIDR-based access control, geographic routing.

Spell check / dictionary lookup

For a misspelled word, find candidate corrections by exploring the trie within edit distance ≤ 2. The trie structure makes this efficient because branches that diverge from your word can be pruned early.

Production examples: spell checkers (aspell, hunspell), search query corrections.

String matching variants

- **Aho-Corasick** — given a set of patterns, find all occurrences in a text. Built on a trie + failure links. Linear time. Used in network intrusion detection, bioinformatics motif matching.

- **Suffix trie / suffix tree** — all suffixes of a string. Substring queries in `O(query length)` regardless of text length. Used in genome alignment, log analysis.

Variants

Radix trie / Patricia trie

Compresses chains of single-child nodes into one. Saves memory; insert / lookup is faster.

Linux uses a variant of this for the routing table (`fib_trie`).

Ternary search trie

Each node has three children (less, equal, greater). Cache-friendlier than a wide trie when the alphabet is large. Used in Apache Commons.

HAT-trie

Cache-conscious; combines tries with hash tables at the leaves. Faster than plain tries on real-world string sets.

DAFSA (Directed Acyclic Finite State Automaton)

Tries with shared subtree structure. Enormously space-efficient for natural-language dictionaries (English wordlist fits in ~100KB).

When tries lose

- **Random / non-prefix queries on equal-length keys.** Hash tables are simpler and faster.

- **Memory-constrained environments where keys are long.** Each character potentially adds a node; bytes per key is high.

- **Workloads where insert/delete dominate over lookup.** Maintaining tree structure has overhead.

- **Sparse data with very long keys.** Bloom filter or hash set is more compact for membership tests.

Memory layout matters

The naive C++/Java trie uses `Map<Character, Node>` per node. For ASCII text, prefer `Node[256]` arrays — direct indexing, no hash overhead. For Unicode strings, the array would be huge; use a sparse representation.

Cache-aware layouts pack frequently-accessed nodes together. Patricia tries / radix tries help; HAT-tries help further; DAFSAs maximally so.

In hot paths (e.g., per-packet routing), a radix-trie implementation laid out cache-friendly is dramatically faster than a generic hashmap-of-hashmap trie.

A simple Python implementation

```python

class Trie:

def __init__(self):

self.children = {}

self.value = None

self.is_end = False

def insert(self, key, value=None):

node = self

for c in key:

if c not in node.children:

node.children[c] = Trie()

node = node.children[c]

node.is_end = True

node.value = value

def search(self, key):

node = self._descend(key)

return node.value if node and node.is_end else None

def starts_with(self, prefix):

node = self._descend(prefix)

if node is None: return []

results = []

self._collect(node, prefix, results)

return results

def _descend(self, key):

node = self

for c in key:

if c not in node.children: return None

node = node.children[c]

return node

def _collect(self, node, prefix, results):

if node.is_end:

results.append((prefix, node.value))

for c, child in node.children.items():

self._collect(child, prefix + c, results)

```

Adequate for small data, autocomplete prototypes. Not optimised for memory or cache; use a library for production scale.

Production libraries

- **`pygtrie`** (Python) — pure Python; flexible.

- **`marisa-trie`** (Python wrapper around C++ MARISA) — read-only after construction; very compact.

- **`datrie`** (Python wrapper around libdatrie) — double-array trie; fast.

- **`hat-trie`** (C++) — cache-conscious; high-performance.

- **`patricia_trie`** (Rust crate) — radix trie.

- **Java's `TreeMap` with prefix range queries** — not exactly a trie, but covers the prefix-query case for sorted strings.

For production: pick a library that implements the variant matching your access pattern. Don't write your own unless you have a specific reason.

Concrete real-world examples

- **Linux kernel `fib_trie`** — IPv4/IPv6 routing decisions on every packet.

- **DNS resolvers** — caching results keyed by domain name.

- **Auto-complete in search engines** — typeahead suggestions.

- **Suffix arrays / FM-index in bioinformatics** — genome alignment.

- **Trie-based log indexing** — variant of prefix matching for distributed log analysis.

- **Aho-Corasick in Snort / Suricata** — pattern matching across many attack signatures.

If you're working on networking, search, bioinformatics, or text-processing, you'll meet tries in production.

Further reading

- [DataStructures]() — broader data structure context

- [BloomFilters]() — alternative for membership tests

- [StringMatchingAlgorithms]() — Aho-Corasick and friends

- [DatabaseIndexingStrategies]() — when to reach for trie-flavour indexes