HNSW (Hierarchical Navigable Small World)

HNSW is a graph-based algorithm for Approximate Nearest Neighbor (ANN) search. It solves the performance bottleneck of high-dimensional vector search by combining the properties of **Small World Networks** and **Skip Lists**.

1. Graph Mechanics: The Small World Property

In a Small World graph, most nodes can be reached from any other node in a very small number of hops. HNSW achieves this by maintaining:

- **High Local Clustering**: Nearby nodes are strongly connected.

- **Short Path Lengths**: Long-range edges allow the search to jump across the vector space quickly.

2. Layered Architecture (The Skip List Analogy)

HNSW organizes vectors into a hierarchy of layers:

- **Layer 0 (Bottom)**: Contains all vectors and fine-grained local connections.

- **Upper Layers**: Contain progressively fewer vectors. These act as "express lanes" for long-distance navigation.

The probability of a node appearing in a higher layer decreases exponentially, ensuring that upper layers stay sparse.

3. Search and Insertion

- **Search**: Starts at a random entry point in the topmost layer. It performs a greedy search to find the closest node in that layer, then drops to the corresponding node in the layer below and repeats until it reaches Layer 0.

- **Insertion**: A new node is assigned a maximum layer $L$. It is then inserted into all layers from $L$ down to 0, connecting to its $M$ nearest neighbors at each level using a diversity-aware heuristic.

4. Concrete Example: Building an HNSW Index with FAISS

FAISS (Facebook AI Similarity Search) is the standard library for production-grade HNSW implementations.

```python

import faiss

import numpy as np

Dimension of vectors (e.g., from a transformer model)

d = 768

n_vectors = 10000

Generate random vectors

data = np.random.random((n_vectors, d)).astype('float32')

query = np.random.random((1, d)).astype('float32')

HNSW Hyperparameters

M: number of neighbors per node

M = 32

Create the index

index = faiss.IndexHNSWFlat(d, M)

efConstruction: search depth during index building

index.hnsw.efConstruction = 40

efSearch: search depth during query time

index.hnsw.efSearch = 64

Add vectors to the index

index.add(data)

Perform the search (top 5 neighbors)

k = 5

distances, indices = index.search(query, k)

print(f"Nearest indices: {indices}")

print(f"Distances: {distances}")

```

5. Critical Hyperparameters

- **$M$**: Number of bidirectional links created for every new element during construction. Range 12–48. Higher $M$ increases recall and memory usage.

- **$efConstruction$**: The number of neighbors explored during index building. Higher values lead to better graph quality but slower build times.

- **$efSearch$**: The number of candidates tracked during query time. This can be tuned dynamically to trade off latency for accuracy.

Summary of Technical implementation added

- Explained the **Small World** and **Skip List** mathematical foundations.

- Detailed the **Layered Architecture** and its probabilistic distribution.

- Provided a complete **Python example using FAISS** to build and query an HNSW index.

- Defined the impact of key hyperparameters ($M$, $efConstruction$, $efSearch$).