Vector Indexing Internals: HNSW and IVF-PQ
Approximate Nearest Neighbor (ANN) search is the core engine of RAG systems. This article dissects the mathematical mechanics of the two dominant indexing strategies: **HNSW** (for performance) and **IVF-PQ** (for memory efficiency).
1. HNSW (Hierarchical Navigable Small World)
HNSW is a graph-based index. It organizes vectors into a hierarchy of layers where each layer is a "Small World" graph. Navigation starts at the sparse top layer and descends into denser layers to refine the search.
- **Pros**: Sub-millisecond latency, very high recall.
- **Cons**: High memory overhead (stores the full graph structure + raw vectors in RAM).
2. IVF-PQ (Inverted File with Product Quantization)
IVF-PQ is a hybrid approach that uses clustering (IVF) to narrow the search space and lossy compression (PQ) to reduce the memory footprint.
Inverted File (IVF)
The vector space is partitioned into $N$ clusters (Voronoi cells) using $k$-means.
- During search, only the most relevant $nprobe$ clusters are scanned.
Product Quantization (PQ)
PQ reduces the size of vectors (e.g., 1536 dimensions) by 10x–64x.
1. **Split**: The vector is divided into $m$ sub-vectors.
2. **Quantize**: Each sub-vector is replaced by the index of its closest centroid from a pre-trained codebook (usually 256 centroids per sub-space).
3. **Distance Computation**: Distances are approximated using a pre-calculated lookup table, avoiding expensive floating-point math.
Concrete Example: IVF-PQ with FAISS
```python
import faiss
import numpy as np
d = 128 # dimension
n_vectors = 100000
n_clusters = 100 # nlist (number of Voronoi cells)
m = 8 # number of sub-quantizers (each sub-vector is 128/8 = 16-dim)
nbits = 8 # each sub-vector is reduced to 8 bits (1 byte)
Training data for centroids
train_data = np.random.random((2048, d)).astype('float32')
data = np.random.random((n_vectors, d)).astype('float32')
Create the quantizer (using L2 distance)
quantizer = faiss.IndexFlatL2(d)
Create the IVF-PQ index
index = faiss.IndexIVFPQ(quantizer, d, n_clusters, m, nbits)
Train the index (k-means for centroids and PQ codebooks)
index.train(train_data)
index.add(data)
Tuning Search
index.nprobe = 10 # search 10 clusters instead of 1
Query
query = np.random.random((1, d)).astype('float32')
D, I = index.search(query, 5)
```
3. Comparison of Internals
| Feature | HNSW | IVF-PQ |
|---|---|---|
| **Mechanism** | Graph Traversal | Clustering + Quantization |
| **Memory usage** | High ($O(d \cdot N + edges)$) | Low ($O(m \cdot N)$) |
| **Accuracy (Recall)** | Very High | Moderate to High |
| **Latency** | Extremely Low | Low (increases with $nprobe$) |
| **Hardware** | Optimized for CPU | Optimized for GPU (massive parallelism) |
4. Distance Metrics
The choice of distance metric influences the internal search logic:
- **L2 (Euclidean)**: Measures straight-line distance.
- **Cosine Similarity**: Measures the angle between vectors (normalized dot product).
- **Inner Product**: Standard dot product. Used when vector magnitudes are significant.
Summary of Technical implementation added
- Dissected the **HNSW** and **IVF-PQ** mechanics.
- Explained the **Product Quantization (PQ)** workflow (Split, Quantize, Lookup).
- Provided a concrete **FAISS implementation of IndexIVFPQ**.
- Detailed the role of **nprobe** in balancing search speed vs. accuracy.
- Compared memory and latency trade-offs.