Trie Data Structure

A trie (pronounced "try" or "tree" depending on which side of the religious war you're on) is a tree where each node represents a character / byte / element of a key, and the path from root to node spells the key. It's specifically good at problems where you want to do prefix queries — autocomplete, longest-prefix match, dictionary scans. Where it shines, nothing else comes close.

The structure, briefly

Insert "cat", "car", "dog":

    (root)
    /    \
   c      d
   |      |
   a      o
  / \     |
 t   r    g

Each node holds (a) a marker indicating whether a key ends here, (b) child pointers / map per next character. Leaves can hold associated values (key → value mapping).

Two costs:

Memory — each character is a node; for n keys of average length L, the trie has up to n * L nodes. Many implementations use child maps (HashMap, sparse array) to keep memory bounded.
Time — insert / lookup / delete is O(L) where L is the key length. Independent of n (the number of keys). Worth contrasting with hash table's O(L) for hashing the key plus O(1) for the table lookup — for very long keys, hash table loses; for short keys, hash table wins.

When tries earn their keep

Prefix queries (autocomplete)

Given prefix "ap", find all keys starting with "ap." Trivial in a trie: descend to the "ap" node; all keys in its subtree are matches. O(prefix length + matches).

In a hash table: impossible without scanning everything. In a sorted array: O(log n + matches) via binary search on prefix bounds. Trie wins on the worst case.

Production examples: search suggestions, command-line completion, IDE code completion (the simple kind, not the LLM kind).

Longest-prefix match (IP routing)

Given an IP address, find the most specific prefix in a routing table that matches. Routers do this on every packet.

A trie (specifically a "patricia trie" or "radix trie") solves this in O(prefix length). Hash tables can't do "longest match"; they can only do exact match.

Production examples: IP routing tables (Linux's FIB), CIDR-based access control, geographic routing.

Spell check / dictionary lookup

For a misspelled word, find candidate corrections by exploring the trie within edit distance ≤ 2. The trie structure makes this efficient because branches that diverge from your word can be pruned early.

Production examples: spell checkers (aspell, hunspell), search query corrections.

String matching variants

Aho-Corasick — given a set of patterns, find all occurrences in a text. Built on a trie + failure links. Linear time. Used in network intrusion detection, bioinformatics motif matching.
Suffix trie / suffix tree — all suffixes of a string. Substring queries in O(query length) regardless of text length. Used in genome alignment, log analysis.

Variants

Radix trie / Patricia trie

Compresses chains of single-child nodes into one. Saves memory; insert / lookup is faster.

Linux uses a variant of this for the routing table (fib_trie).

Ternary search trie

Each node has three children (less, equal, greater). Cache-friendlier than a wide trie when the alphabet is large. Used in Apache Commons.

HAT-trie

Cache-conscious; combines tries with hash tables at the leaves. Faster than plain tries on real-world string sets.

DAFSA (Directed Acyclic Finite State Automaton)

Tries with shared subtree structure. Enormously space-efficient for natural-language dictionaries (English wordlist fits in ~100KB).

When tries lose

Random / non-prefix queries on equal-length keys. Hash tables are simpler and faster.
Memory-constrained environments where keys are long. Each character potentially adds a node; bytes per key is high.
Workloads where insert/delete dominate over lookup. Maintaining tree structure has overhead.
Sparse data with very long keys. Bloom filter or hash set is more compact for membership tests.

Memory layout matters

The naive C++/Java trie uses Map<Character, Node> per node. For ASCII text, prefer Node[256] arrays — direct indexing, no hash overhead. For Unicode strings, the array would be huge; use a sparse representation.

Cache-aware layouts pack frequently-accessed nodes together. Patricia tries / radix tries help; HAT-tries help further; DAFSAs maximally so.

In hot paths (e.g., per-packet routing), a radix-trie implementation laid out cache-friendly is dramatically faster than a generic hashmap-of-hashmap trie.

A simple Python implementation

class Trie:
    def __init__(self):
        self.children = {}
        self.value = None
        self.is_end = False

    def insert(self, key, value=None):
        node = self
        for c in key:
            if c not in node.children:
                node.children[c] = Trie()
            node = node.children[c]
        node.is_end = True
        node.value = value

    def search(self, key):
        node = self._descend(key)
        return node.value if node and node.is_end else None

    def starts_with(self, prefix):
        node = self._descend(prefix)
        if node is None: return []
        results = []
        self._collect(node, prefix, results)
        return results

    def _descend(self, key):
        node = self
        for c in key:
            if c not in node.children: return None
            node = node.children[c]
        return node

    def _collect(self, node, prefix, results):
        if node.is_end:
            results.append((prefix, node.value))
        for c, child in node.children.items():
            self._collect(child, prefix + c, results)

Adequate for small data, autocomplete prototypes. Not optimised for memory or cache; use a library for production scale.

Production libraries

pygtrie (Python) — pure Python; flexible.
marisa-trie (Python wrapper around C++ MARISA) — read-only after construction; very compact.
datrie (Python wrapper around libdatrie) — double-array trie; fast.
hat-trie (C++) — cache-conscious; high-performance.
patricia_trie (Rust crate) — radix trie.
Java's TreeMap with prefix range queries — not exactly a trie, but covers the prefix-query case for sorted strings.

For production: pick a library that implements the variant matching your access pattern. Don't write your own unless you have a specific reason.

Concrete real-world examples

Linux kernel fib_trie — IPv4/IPv6 routing decisions on every packet.
DNS resolvers — caching results keyed by domain name.
Auto-complete in search engines — typeahead suggestions.
Suffix arrays / FM-index in bioinformatics — genome alignment.
Trie-based log indexing — variant of prefix matching for distributed log analysis.
Aho-Corasick in Snort / Suricata — pattern matching across many attack signatures.

If you're working on networking, search, bioinformatics, or text-processing, you'll meet tries in production.