Handling Embedding Service Outages
Hybrid retrieval is designed fail-closed: when the embedding service
goes away, the system reverts to BM25 with no user-visible error and no
service interruption. The trade-off is a quality drop that agents need
to detect.
When to use this runbook
When retrieval is misbehaving and you suspect (or want to confirm) it's
the embedding side.
Context
`HybridSearchService.rerank()` is the integration point. It calls the
embedding service for each query, fuses dense scores with BM25 via
Reciprocal Rank Fusion (k=60), and returns the merged result list. When
the embedding call throws, the dense list is treated as empty and RRF
reduces to BM25 ranks.
The local default embedding service is Ollama running `nomic-embed-v1.5`
on port 11434. Misbehaviour usually traces to (a) the Ollama process
crashed, (b) the model wasn't loaded, or (c) GPU/memory pressure
delaying inference.
Walkthrough
The frontmatter `steps` are the canonical diagnostic sequence: confirm
Ollama, restart if needed, look at logs, broaden the query while in
fallback, check metrics.
Pitfalls
The frontmatter `pitfalls` capture the failure modes. The "BM25
fallback is a defect" misreading is the most common — it's a feature,
and chasing it as a bug wastes time.