Vector Databases: Indexing for Speed

Vector databases solve the "Nearest Neighbor" problem in high-dimensional space. While a brute-force search is $O(N)$, production systems use Approximate Nearest Neighbor (ANN) algorithms to achieve sub-millisecond latency.

1. Indexing Strategies: HNSW vs. IVF

Choosing an index determines the trade-off between memory, search speed, and accuracy (recall).

HNSW (Hierarchical Navigable Small World)

- **Mechanism:** Builds a multi-layered graph where the top layer has few connections (long jumps) and bottom layers have many (local refinement).

- **Pros:** Fast search, high recall, no training phase required.

- **Cons:** High memory consumption (stores the full graph); slow index builds.

- **Best For:** Real-time applications where accuracy is paramount.

IVF (Inverted File Index)

- **Mechanism:** Partitions the vector space into clusters (Voronoi cells). The search only explores the closest clusters.

- **Pros:** Low memory overhead; faster index builds than HNSW.

- **Cons:** Requires a "training" phase to define cluster centroids; lower recall if the dataset shifts.

- **Best For:** Massive datasets (billions of vectors) where memory is the bottleneck.

2. pgvector: Vector Search in Postgres

`pgvector` is the standard for adding vector search to existing relational databases. It supports two main index types:

Concrete Example: pgvector HNSW Index

```sql

-- 1. Enable the extension

CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Create a table with a vector column (1536 dimensions)

CREATE TABLE documents (

id serial PRIMARY KEY,

content text,

embedding vector(1536)

);

-- 3. Create an HNSW index

-- m: max connections per node (default 16)

-- ef_construction: dynamic candidate list size (default 64)

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)

WITH (m = 16, ef_construction = 64);

-- 4. Perform a similarity search

SELECT content FROM documents

ORDER BY embedding <=> '[0.1, 0.2, ...]'

LIMIT 5;

```

*Note: `<=>` is the cosine distance operator in pgvector.*

3. Dedicated Vector Stores vs. SQL Extensions

| Feature | pgvector (Postgres) | Pinecone / Qdrant |

|---|---|---|

| **Data Consistency** | High (ACID compliant) | Variable (Eventual consistency) |

| **Complexity** | Low (Single DB) | High (Separate service) |

| **Hybrid Search** | Native (Join with SQL) | Requires complex orchestration |

| **Scale** | Millions of vectors | Billions of vectors |

4. Optimization: Product Quantization (PQ)

In dedicated stores like Qdrant or Milvus, you can apply **Product Quantization** during indexing to further reduce memory.

**Concrete Configuration (Qdrant):**

```yaml

indexing_threshold: 20000

optimizers_config:

memmap_threshold: 10000

quantization_config:

scalar:

type: int8

quantile: 0.99

always_ram: true

```

This configuration converts vectors to `int8` once the collection reaches 20,000 vectors, reducing RAM usage by 4x.

5. Reciprocal Rank Fusion (RRF)

For the best search results, combine vector search with keyword search (BM25) using RRF.

**Algorithm:**$$\text{Score}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}(d, r)}$$Where$k$is a constant (typically 60) and$R$ is the set of rankings from different search methods. This ensures that a document appearing in the top 10 for *both* keyword and vector search is boosted to the absolute top.