Wiki Search Optimization

Search is the primary way users find content in larger wikis. Browsing works for small wikis; for thousands of pages, search dominates.

Bad search means users can't find what's there. The wiki has the answer; users don't.

This page covers the patterns for good wiki search.

What makes search work

Relevance ranking

Pages most relevant to the query come first. Not chronological; not alphabetical.

Standard scoring: BM25 or similar. Modifications:

- Exact title match boosts

- Tag match boosts

- Recency factor (recent pages slight boost)

- Popularity factor (high-traffic pages slight boost)

Stemming and stopwords

"running" matches "run." "the" doesn't match anything.

Most search libraries handle this by default per language.

Query understanding

User searches "how to deploy"; finds pages titled "Deployment Guide" even without the words.

Synonym expansion; concept matching; semantic search (with embeddings).

Multi-field matching

Title, body, tags, headings, comments. Different weights:

- Title match: high weight

- Heading match: medium

- Body match: lower

- Comment match: lowest (or skipped)

Faceted filtering

Search "deployment" → narrow by section, tags, date, author.

Typo tolerance

"deplyoment" still finds "deployment." Edit-distance matching.

Implementation options

Database full-text search

For small wikis, the database's built-in search. PostgreSQL's `tsvector`; MySQL's full-text.

Simple; same infrastructure. See [FullTextSearchInPostgresql](FullTextSearchInPostgresql).

For larger wikis, performance ceiling.

Dedicated search engine

Elasticsearch, Solr, OpenSearch. Indexed; fast; feature-rich.

For wikis at scale, this is the right answer. See [ElasticsearchFundamentals](ElasticsearchFundamentals).

Cloud-managed

AWS OpenSearch, Algolia, Azure Cognitive Search. Less ops; managed scaling.

Vector / semantic search

Embeddings-based. "Conceptually similar" rather than "word-match."

For wikis with diverse vocabulary, semantic search finds pages keyword search misses.

Combined with keyword search (hybrid retrieval), often the best results.

Wikantik approach

Per CLAUDE.md, Wikantik uses BM25 + dense embeddings + graph-aware rerank for hybrid retrieval. The graph-rerank uses the structural spine and knowledge graph for boosting.

This pattern (hybrid retrieval) has emerged as the modern best-of-breed for wikis with diverse content.

Indexing

Initial index

Crawl all pages; tokenize; index. Can be slow for large wikis.

Incremental indexing

Page edited → incrementally update index. Don't rebuild from scratch.

Re-indexing

Periodically, full reindex. Catches inconsistencies; applies any indexing changes.

Async vs. sync

Sync: edit blocks until indexed.

Async: edit returns; indexing happens in background.

Async is usually better. Brief lag between save and searchability is acceptable.

Page authoring for searchability

Titles matter

Titles are searched first. Use natural-language titles users might search for.

Bad: "Wiki Page 47"

Good: "How to deploy to production"

Headings as anchors

H2/H3 headings are searched. Helpful for finding specific sections within pages.

Frontmatter tags

Explicit tags help filtering and topic-based discovery.

First paragraph

Many search engines weight first paragraph higher. Lead with the page's purpose.

Synonyms in body

Cover terms users might search for, even alternatives. "User" and "customer" might both apply; use both.

Meta descriptions

For search-result snippets. Brief, accurate.

Operational practices

Monitor failed searches

Searches that return no results: gaps. Either content missing or content not findable.

Top searches

Most-searched queries: invest in those areas. Improve titles; expand content.

Click-through rate

For each search query, what percentage of users click a result? Low CTR = bad search results.

Search vs. navigate split

How often users search vs. click links. Reveals search adoption.

For most modern wikis, search dominates after a certain size.

Specific patterns

Did-you-mean

For typos: "Did you mean: deployment?"

Related searches

After viewing results: "Other people searched for X."

Autocomplete

As user types, suggest queries. Reduces query effort.

Recently viewed

Personalization: surface recently-viewed pages.

Popular pages

Highlight high-traffic pages.

Common failure patterns

Database full-text search at scale

Slow. Eventually unusable.

No relevance tuning

Results in wrong order. Right page on page 10 of results.

No synonym handling

User searches term A; page uses term B. No match.

Title-only search

Body content not searched. Missing matches.

No stemming

"deploy" doesn't match "deployment."

No analytics

Don't know what searches fail.

Misindexed (stale)

Recent pages not searchable; old content searched as if current.

A reasonable starter

For wikis up to a few thousand pages:

1. Database full-text search initially

2. Migrate to Elasticsearch when search is slow

3. Hybrid (keyword + semantic) for diverse content

4. Track failed searches; fill content gaps

5. Tune relevance based on click-through rates

6. Auto-complete and synonym handling

For larger or more complex wikis: dedicated search infrastructure from the start.

Further Reading

- [WikiAnalyticsAndEngagement](WikiAnalyticsAndEngagement) — Search analytics

- [WikiPageTemplates](WikiPageTemplates) — Templates affect findability

- [ElasticsearchFundamentals](ElasticsearchFundamentals) — Dedicated search

- [FullTextSearchInPostgresql](FullTextSearchInPostgresql) — DB search