Handling Embedding Service Outages

Hybrid retrieval is designed fail-closed: when the embedding service

goes away, the system reverts to BM25 with no user-visible error and no

service interruption. The trade-off is a quality drop that agents need

to detect.

When to use this runbook

When retrieval is misbehaving and you suspect (or want to confirm) it's

the embedding side.

Context

`HybridSearchService.rerank()` is the integration point. It calls the

embedding service for each query, fuses dense scores with BM25 via

Reciprocal Rank Fusion (k=60), and returns the merged result list. When

the embedding call throws, the dense list is treated as empty and RRF

reduces to BM25 ranks.

The local default embedding service is Ollama running `nomic-embed-v1.5`

on port 11434. Misbehaviour usually traces to (a) the Ollama process

crashed, (b) the model wasn't loaded, or (c) GPU/memory pressure

delaying inference.

Walkthrough

The frontmatter `steps` are the canonical diagnostic sequence: confirm

Ollama, restart if needed, look at logs, broaden the query while in

fallback, check metrics.

Pitfalls

The frontmatter `pitfalls` capture the failure modes. The "BM25

fallback is a defect" misreading is the most common — it's a feature,

and chasing it as a bug wastes time.