Wikantik Development News

A narrative of recent development on Wikantik — the six weeks from mid-April to

the end of May 2026. In that span the project crossed from `1.1.x` to its `2.0`

line (six tagged releases, `2.0.0` through `2.0.5`), went live on real

infrastructure, and shifted its center of gravity from "a wiki that an AI can

read" to "a wiki built, operated, and consumed with AI agents as first-class

citizens."

---

The through-line: an agent-grade knowledge base

The dominant story of these weeks is making the corpus genuinely useful to

retrieval-augmented agents — not as an afterthought, but as a designed surface.

**Hybrid retrieval (mid-to-late April).** The window opens with hybrid search

landing in earnest: BM25 lexical scoring fused with dense vector similarity via

weighted reciprocal-rank fusion. A `QueryEmbedder` wraps the embedding client

with a cache, a timeout, and a hand-rolled circuit breaker so a slow or failing

embedding backend degrades to lexical search instead of taking the page down. An

in-memory chunk-vector index served the first dense top-k, and a retrieval

**experiment harness** with reproducible eval reports let us pick fusion weights

from evidence rather than intuition. This shipped as `v1.1.6`.

**Agent-grade content and the structural spine (late April).** From there the

work turned structural. A `ContextRetrievalService` was extracted as the single

seam through which search, the REST API, and the tool servers all retrieve —

collapsing several near-duplicate read paths into one. On top of it came the

**structural spine**: rename-stable canonical IDs, cluster/hub membership, and a

machine-queryable `/api/structure/*` index with matching MCP tools, all enforced

at save time. Pages gained **verification metadata** — `verified_at`,

`verified_by`, a computed `confidence` (authoritative / provisional / stale), and

an `audience` flag — backed by a trusted-authors registry, a `mark_page_verified`

tool, and an operator triage view. The **`/for-agent` projection** was born here

too: a token-budgeted view of any page (summary, key facts, headings outline,

recent changes, tool hints, verification state) that an agent can consume without

pulling the full markdown body. To keep all of this honest, a **retrieval-quality

CI** runner began scoring a curated query set with nDCG, Recall, and MRR across

retrieval modes and persisting the aggregates.

**Knowledge-graph inclusion policy (late April).** As the LLM-extracted knowledge

graph grew, it needed a governance story. A cluster-primary inclusion/exclusion

policy arrived with an admin dashboard, a CLI, per-page frontmatter overrides, and

an audit trail — defaulting to exclude, so entities are opted in deliberately.

**Page Graph vs. Knowledge Graph (early May).** A naming and conceptual cleanup

that paid for itself many times over: formally separating the **Page Graph**

(edges are real page-to-page wikilinks) from the **Knowledge Graph** (nodes are

extracted entities, edges are co-mention or typed relations). Packages, routes,

admin navigation, and tool descriptions were all disambiguated; the typed

`relations:` frontmatter experiment was retired; and identity normalization

(`NodeSignature` / `EdgeSignature`, NFC + predicate-synonym folding) tightened

entity dedup. A staged machine/human validation tier and a `/knowledge-graph`

viewer (mirroring the existing `/page-graph` reader) rounded it out.

**Derived agent hints (mid-May).** The projection got smarter without adding

author burden: derived `prefer_tools` and `prefer_pages` hints ranked across a

page and its cluster hub, a hub-summary synthesizer that overlays a Top-3

highlight when an authored hub summary is generic, a `read_pages` batch tool to

amortize the per-page read tax, and an agent-grade audit report for weak signals.

**Knowledge-graph curation (mid-May).** Curators got real tools: edge-curation v0

in the admin UI with an append-only audit table and one-click human-curated

elevation, then a full bulk curation surface on the MCP server (inspect / review /

curate nodes and edges), admin-bypass read paths so curators see freshly created

entities, and a closed `node_type` / `relationship_type` vocabulary enforced at

write time.

---

Under the hood: decomposition and a hard look at scaling

**Subsystem decomposition (early May).** A multi-phase architectural campaign

broke `wikantik-main` into typed subsystem factories, deleted the old

service-locator manager registry on `WikiEngine` in favor of typed fields,

decomposed the 800-line `WikiContext` into scoped sub-objects, and finished with

two static-analysis sweeps (SpotBugs + PMD) that split the worst god classes and

cleared real bugs. The result is a codebase where subsystem wiring is explicit

and testable in isolation.

**The scaling characterization study (third week of May).** Rather than guess at

performance, we ran a deliberate JFR-instrumented load study and let it drive the

fixes. The headline finding: the ceiling was not CPU but a per-request

DB-connection tax and a chain of shared-lock hotspots. The response was a sweep of

targeted surgeries — short-TTL caches for API-key verification, user lookup, and

KG mentions (removing per-request connections); hoisting shared JDK objects

(`Collator`, `DateTimeFormatter`, `SecureRandom`) off the hot path; a read/write

lock on the versioning file provider to kill per-search contention; an

event-manager dispatch-outside-the-lock fix; and a fixed-permit **backpressure

semaphore** that sheds load with a clean 503 before the thread pool saturates.

**Dense retrieval, productionized.** The brute-force in-memory vector scan — about

60% of search CPU — was replaced with selectable ANN backends: an in-process

**Lucene HNSW** index (now the production default, reading metadata via DocValues

rather than stored fields) and a server-side **pgvector HNSW** option for split-DB

topologies, each gated behind a retrieval-quality parity test so the speed-up

couldn't silently degrade relevance. These landed across `2.0.1` and `2.0.2`.

---

From code to a running product

**Operations and packaging (early-to-mid May).** The project grew the trappings of

something meant to be run by others: a community documentation set (CONTRIBUTING,

SECURITY, code of conduct, a definitive operations handbook), a `bin/container.sh`

wrapper over Docker Compose, a release-on-tag workflow, and — the big one — a

`bin/remote.sh` single entry point for deploying and administering Wikantik on a

remote host over SSH (image transfer, pages rsync, health-polled deploys with

auto-rollback). `2.0.0` was cut and the stack went live in Docker on its first

real host.

**Observability, handed off.** An in-repo Prometheus + Grafana overlay was built

and then deliberately **retired** in favor of an external monitoring stack

(jakemon) that scrapes the container's `/metrics`; a k6 load-test harness with a

`--verify` gate that confirms dashboard panels actually moved was added alongside.

**Off-box backup and disaster recovery (third week of May).** A 3-2-1 backup

posture arrived as a pull model: a NAS reaches in and pulls read-only snapshots on

a daily timer, with restore drills, per-tier Prometheus metrics, and a

`dr-restore.sh` that stands a fresh host up from a verified snapshot with a single

command.

**SSO and account lifecycle (late May).** A dedicated SSO hardening test suite

closed SAML and edge-case gaps — multi-valued claim normalization, fail-closed

identity-collision handling, session rotation against fixation, configurable

identity claims — and **Google OIDC went live** in production, with SAML support

and self-service account deletion alongside published privacy and terms pages.

This was `2.0.3`.

**Go-to-market (late May).** A static marketing site for `www.wikantik.com` was

built and shipped — hero, problem framing, comparison table, hosting options, an

"ask your wiki" terminal moment, and a lead-capture form wired to a serverless

backend — together with SEO fixes (correct sitemap domain, per-page titles,

JSON-LD). Releases `2.0.4` and `2.0.5` carried these out.

---

Reading, writing, and owning pages

The last stretch of the window returned to the everyday reader and author

experience.

**Anchored comments (last week of May).** A Google-Docs-style commenting system —

threads anchored to selected text via robust text-quote selectors, a reader-side

drawer with highlights, re-anchoring that survives edits, reply/resolve/reopen,

and admin-only thread deletion — built on new `comment_threads` / `comments`

tables and a fresh REST surface, capped by a comment UX overhaul.

**Personalization and blogging (29 May).** A personalized left-navigation "me

zone" with a recently-viewed list that updates live across instances, plus a round

of blog read-path and delete-UI fixes.

**Page ownership (30 May).** A page-ownership admin surface — orphaned-page triage

and reassign-by-user — gained a consistent data model when AI-agent-authored pages

(whose frontmatter author is an agent name, not a login) were given a default

owner: a dedicated `agents` service account, configurable and backfilled across

the corpus, so ownership is never silently null. The ownership table also began

showing each page's current name beside its canonical ID.

---

*This page is a narrative summary; the authoritative record of every change is the

git history and the per-release `CHANGELOG`.*