Graph-Aware Reranking and Semantic Proximity
The final stage of the Wikantik search pipeline (Phase 3) is a **Graph-Aware Reranker** (`GraphRerankStep`). While Phase 1 and 2 focus on textual and vector similarity, Phase 3 utilizes the **topology** of the Knowledge Graph to surface related content that might lack direct keyword or vector overlap.
1. The Proximity Heuristic
The reranker operates on a simple but powerful intuition: **"If a page mentions entities that are closely connected in the Knowledge Graph to the user's intent, that page is likely highly relevant."**
A. Graph Traversal (Multi-Source BFS)
The process begins by resolving query terms into a set of seed entity IDs ($Q$). The system then performs a multi-source Breadth-First Search (BFS) through the `kg_edges` adjacency map up to a maximum radius ($H_{max}$, typically 2).
* **Distance Calculation**: Every reachable entity $e$ is assigned a distance $d(Q, e)$ equal to the shortest undirected hop count from any seed entity.
B. Scoring Function
The proximity score $S_{prox}$ for a candidate page is determined by the **maximum proximity** of its mentioned entities:
$$ S_{prox}(p) = \max_{m \in \text{Mentions}(p)} \left( \frac{1}{1 + d(Q, m)} \right) $$
* **Why Max?**: Using the maximum (rather than the mean or sum) ensures that a single high-quality match is enough to boost a page, preventing relevant content from being diluted by "noisy" co-mentions of unrelated entities.
2. Implementation: The GraphRerankStep
The implementation is designed for high-performance and **Graceful Degradation**.
1. **Candidate Anchoring**: The input to the step is the fused list from Phase 2 (RRF). No pages are added or removed; the candidate set is fixed.
2. **Bulk Loading**: The `PageMentionsLoader` fetches all entity mentions for the entire candidate set (e.g., top 100 pages) in a **single SQL round-trip** using the `ANY(?)` operator.
3. **Base Rank Scaling**: To ensure the boost is proportional to the initial relevance, each page is assigned a base score derived from its fused rank: $B(p) = 1.0 - (\text{rank} / N)$.
4. **The Boost Calculation**:
$$ \text{FinalScore}(p) = B(p) + (\text{boost\_weight} \times S_{prox}(p)) $$
5. **Stable Reordering**: The list is re-sorted by the final score. Because the sort is stable, pages with equal proximity scores retain their relative RRF ordering.
3. Fail-Safe Mechanics
The reranker is a "non-critical" enhancement. The `GraphRerankStep` is wrapped in a fail-closed logic:
* **Disabled**: If the graph feature is off, returns input list verbatim.
* **Index Not Ready**: If the `kg_edges` table is being rebuilt or exceeds memory caps, returns input list.
* **Zero Matches**: If no query entities are resolved or no candidates mention the graph neighborhood, the proximity score is $0.0$ for all, and the RRF order is preserved bit-identically.
---
**See Also:**
- [Wikantik Search and Retrieval](WikantikSearchAndRetrieval) — The 4-phase overview.
- [Spectral Graph Theory](SpectralGraphTheoryConceptual) — The theory of graph shape.
- [Knowledge Graph Extraction](WikantikKnowledgeGraph) — Generating the co-mention edges.