KG Inclusion Policy
The knowledge graph (KG) is built from a subset of wiki pages. This page is
the operator's guide to that subset: the model, the dashboard, the CLI, and
the workflows you'll use day-to-day.
The decision model
For any page, the system evaluates four steps in order and stops at the first
one that applies:
1. **System page?** Sandbox, Main, navigation pages, etc. Always excluded.
2. **`kg_include: false` in frontmatter?** Excluded, regardless of cluster.
3. **`kg_include: true` in frontmatter?** Included, regardless of cluster.
4. **Cluster policy.** If the page's cluster has an `include` row in
`kg_cluster_policy`, the page is included. Otherwise excluded.
The default is **exclude** — a cluster you haven't touched contributes nothing
to the KG. This is deliberate: imports of new content can't sneak into agent
retrieval before you've reviewed them.
The page's cluster is read from frontmatter (`cluster: <name>`); see
[StructuralSpineDesign](StructuralSpineDesign).
The dashboard
Visit `/admin/kg-policy`. The home view is a sortable table:
| Column | Meaning |
|--------|---------|
| Cluster | Cluster name as it appears in frontmatter |
| Pages | Total page count in this cluster |
| Action | `include` (green), `exclude` (gray), or `unset` (yellow) |
| Reason | Free-text reason captured when you set the policy |
| Set by | Principal who last changed the row |
| Last reviewed | Relative timestamp; >90 days renders in red |
| Actions | Edit / Clear buttons |
Bootstrap (one-time)
The first time you visit the dashboard with no policy rows, it surfaces a
"Bootstrap" call-to-action. Visit `/admin/kg-policy/bootstrap` to run the
wizard: 27 clusters are pre-checked for `include`, 15 for `exclude`, based
on a tech / finance / lifestyle decomposition. Scan, uncheck what shouldn't
be there, fill a single shared reason ("bootstrap initial config"), confirm.
One transaction inserts all rows, and the eager reconciliation kicks off.
Page lookup ("Why is this page in/out?")
`/admin/kg-policy/explain` takes a title or canonical_id and prints the
four-step trace showing exactly why a page is included or excluded. Use
this when:
- A page is showing up in retrieval and shouldn't be
- You're confirming that a frontmatter override took effect
- You want to know which cluster a page is associated with
Pending-review queue
`/admin/kg-policy/pending` surfaces:
- **Unset clusters** — clusters in the corpus with no policy decision.
Default-exclude is in effect; you should make a deliberate choice.
- **Stale reviews** — clusters whose `reviewed_at` is older than 90 days
(or null). Possible drift since you last looked.
- **Recent page-count changes** — placeholder; the threshold logic will
populate this once we capture cluster-size history.
Empty most days; non-empty triggers a 5-minute review session.
The CLI
Everything in the dashboard is also available via `bin/kg-policy.sh`.
```bash
bin/kg-policy.sh list # current policy state
bin/kg-policy.sh set java include --reason "core tech, agent retrieval"
bin/kg-policy.sh clear java # back to unset
bin/kg-policy.sh explain java # cluster's current policy + audit
bin/kg-policy.sh review # pending-review items
bin/kg-policy.sh mark-reviewed databases # bump reviewed_at
bin/kg-policy.sh diff personal-finance # excluded-pages snapshot
bin/kg-policy.sh reconcile # show excluded counts by reason
bin/kg-policy.sh audit --cluster java --limit 50
```
`reconcile` from the CLI is informational. Full reconciliation runs
automatically when you change policy via REST or the dashboard, or you can
restart Tomcat — both routes invoke `ReconciliationJobRunner`.
`purge` is destructive — it hard-deletes `kg_nodes`, `kg_edges`, and
`chunk_entity_mentions` rows for excluded pages. Use only when you want
storage back and won't be re-including the cluster soon.
```bash
bin/kg-policy.sh purge personal-finance # dry-run, prints counts
bin/kg-policy.sh purge personal-finance --confirm # actually delete
bin/kg-policy.sh purge --reason system_page --confirm # all system-page exclusions
```
JDBC config is auto-discovered from `tomcat/tomcat-11/conf/Catalina/localhost/ROOT.xml`,
or override per-invocation with `--jdbc-url` / `--jdbc-user` / `--jdbc-password`.
Common workflows
"I just imported 50 new pages"
The structural index sees a new cluster (or a sudden 50% jump in an existing
one). The dashboard shows it in the pending-review queue. Decide include or
exclude with a one-line reason. Eager reconciliation runs immediately.
"Why is this content showing up in retrieval?"
Run `kg-policy explain <page-name>` (or use the Explain tab). One of two
things shows up: the page's cluster is included (the design intent), or the
page has `kg_include: true` in frontmatter. Adjust whichever is wrong.
"I want to test what happens if I exclude `warehouse-automation`"
Toggle the cluster in the dashboard or run `kg-policy set warehouse-automation
exclude`. Eager reconciliation soft-excludes the pages — entities and edges
remain in the KG tables, just hidden from queries. Re-include with `set
warehouse-automation include` and the rows reappear, no LLM cost. If the
experiment shows you don't want them, run `kg-policy purge
warehouse-automation --confirm` to reclaim storage.
"I'm shipping a new content type that shouldn't be in the KG yet"
Add `kg_include: false` to each page's frontmatter. The structural-spine
filter validates the value at save time. When the content is ready, remove
the override.
Operations
- **Database:** `kg_cluster_policy`, `kg_policy_audit`, `kg_excluded_pages`
- **Master switch:** `wikantik.kg_policy.enabled` in `wikantik.properties`
(default `true`). Setting `false` reverts to legacy behaviour (no policy
filtering).
- **Audit log:** every change is recorded in `kg_policy_audit` with `actor`
and timestamp. Append-only.
- **Permissions:** `/admin/kg-policy` and `bin/kg-policy.sh` both require
admin role.
- **Reason precedence in `kg_excluded_pages`:** `system_page` >
`page_override` > `cluster_policy`. The strongest applicable reason is
recorded.
Agent curation path
Curator agents should drive proposal triage through `/wikantik-admin-mcp` rather
than the REST surface:
- `list_proposals` — filtered listing with conflict flags
(`node_exists`, `edge_previously_rejected`)
- `inspect_proposals` — bulk deep-dive (1..50 ids) with prior reviews
- `review_proposals` — bulk `approve | reject | judge` (1..50 ids; `reject`
requires a top-level `reason`)
- `curate_edges` / `curate_nodes` — heterogeneous bulk ops (1..50 ops)
See `docs/superpowers/specs/2026-05-13-kg-curation-mcp-design.md` for the full
envelope and error contract.
Admin-bypass on read paths
Admin-context reads bypass the inclusion filter so curators see entities
they just created, even when the source page hasn't been admitted by the
cluster policy yet. The bypass applies to:
- REST `/admin/knowledge-graph/*` reads (already gated by `AdminAuthFilter`).
- The MCP tools registered on `/wikantik-admin-mcp` — `list_proposals`,
`inspect_proposals`, and the new admin-bypass copies of `query_nodes`
and `search_knowledge` (24 tools total).
The agent-facing `/knowledge-mcp` server keeps the filter on, so retrieval
quality is unchanged. See
`docs/superpowers/specs/2026-05-14-kg-curation-operability-design.md`
for the full contract.
Further Reading
- [WikantikKnowledgeGraphAdmin](WikantikKnowledgeGraphAdmin) — the broader
KG administration guide
- [StructuralSpineDesign](StructuralSpineDesign) — how clusters are tracked
- [Wikantik Development Hub](WikantikDevelopmentHub) — cluster index