Vector Databases: Indexing for Speed

Vector databases solve the "Nearest Neighbor" problem in high-dimensional space. While a brute-force search is $O(N)$ , production systems use Approximate Nearest Neighbor (ANN) algorithms to achieve sub-millisecond latency.

1. Indexing Strategies: HNSW vs. IVF

Choosing an index determines the trade-off between memory, search speed, and accuracy (recall).

HNSW (Hierarchical Navigable Small World)

Mechanism: Builds a multi-layered graph where the top layer has few connections (long jumps) and bottom layers have many (local refinement).
Pros: Fast search, high recall, no training phase required.
Cons: High memory consumption (stores the full graph); slow index builds.
Best For: Real-time applications where accuracy is paramount.

IVF (Inverted File Index)

Mechanism: Partitions the vector space into clusters (Voronoi cells). The search only explores the closest clusters.
Pros: Low memory overhead; faster index builds than HNSW.
Cons: Requires a "training" phase to define cluster centroids; lower recall if the dataset shifts.
Best For: Massive datasets (billions of vectors) where memory is the bottleneck.

2. pgvector: Vector Search in Postgres

pgvector is the standard for adding vector search to existing relational databases. It supports two main index types:

Concrete Example: pgvector HNSW Index

-- 1. Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Create a table with a vector column (1536 dimensions)
CREATE TABLE documents (
    id serial PRIMARY KEY,
    content text,
    embedding vector(1536)
);

-- 3. Create an HNSW index
-- m: max connections per node (default 16)
-- ef_construction: dynamic candidate list size (default 64)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- 4. Perform a similarity search
SELECT content FROM documents 
ORDER BY embedding <=> '[0.1, 0.2, ...]' 
LIMIT 5;

Note: <=> is the cosine distance operator in pgvector.

3. Dedicated Vector Stores vs. SQL Extensions

Feature	pgvector (Postgres)	Pinecone / Qdrant
Data Consistency	High (ACID compliant)	Variable (Eventual consistency)
Complexity	Low (Single DB)	High (Separate service)
Hybrid Search	Native (Join with SQL)	Requires complex orchestration
Scale	Millions of vectors	Billions of vectors

4. Optimization: Product Quantization (PQ)

In dedicated stores like Qdrant or Milvus, you can apply Product Quantization during indexing to further reduce memory.

Concrete Configuration (Qdrant):

indexing_threshold: 20000
optimizers_config:
  memmap_threshold: 10000
quantization_config:
  scalar:
    type: int8
    quantile: 0.99
    always_ram: true

This configuration converts vectors to int8 once the collection reaches 20,000 vectors, reducing RAM usage by 4x.

5. Reciprocal Rank Fusion (RRF)

For the best search results, combine vector search with keyword search (BM25) using RRF.

Algorithm:

\text{Score}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}(d, r)}

Where $k$ is a constant (typically 60) and $R$ is the set of rankings from different search methods. This ensures that a document appearing in the top 10 for both keyword and vector search is boosted to the absolute top.