Full-Text Search in PostgreSQL
PostgreSQL has full-text search built in. For many applications, it's enough — no separate Elasticsearch cluster, no syncing infrastructure. The same database that holds the data does the search.
This page covers when it's the right choice and how to use it.
The basics
tsvector and tsquery
PostgreSQL stores searchable text as `tsvector` (tokenized; sorted; deduplicated):
```sql
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
```
Stop words removed; words stemmed; positions tracked.
Searching
```sql
SELECT * FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'fox & dog');
```
The `@@` operator matches a tsvector against a tsquery.
Generated column
For performance, store the tsvector:
```sql
ALTER TABLE articles
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
```
Now searches don't recompute the vector each time.
GIN index
For fast lookup:
```sql
CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);
```
GIN indexes for tsvector are the standard.
Ranking
Rank results by relevance:
```sql
SELECT title, body,
ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'fox & dog') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
```
`ts_rank` produces a relevance score. Higher rank = more relevant.
Phrase and proximity
```sql
-- Phrase
to_tsquery('english', 'quick <-> brown') -- "quick" immediately followed by "brown"
-- Proximity
to_tsquery('english', 'quick <2> dog') -- within 2 words
```
For exact phrase matching.
Multiple languages
```sql
to_tsvector('spanish', body)
to_tsvector('french', body)
```
PostgreSQL ships with stemming dictionaries for major languages. For others, additional dictionaries available.
Highlighting
Show matched terms in results:
```sql
SELECT ts_headline('english', body, query, 'StartSel=<b>, StopSel=</b>')
FROM articles, to_tsquery('english', 'fox') query
WHERE search_vector @@ query;
```
Returns body with matched terms wrapped in `<b>` tags.
When PostgreSQL FTS is enough
- **Small to moderate corpus**: thousands to millions of documents
- **Single-language or few languages**: dictionaries available
- **Simple ranking**: tf-idf-like scoring works
- **You already have PostgreSQL**: no separate infrastructure
- **No real-time aggregations needed**: just search
For typical CRUD apps with search, this is enough.
When you need Elasticsearch
- **Very large scale**: tens of millions+ documents with high query rate
- **Sophisticated relevance tuning**: machine-learning ranking, synonyms, custom scoring
- **Faceted search**: aggregations, drill-downs
- **Real-time analytics**: not just search
- **Heavy aggregations**: percentiles, histograms over search results
See [ElasticsearchFundamentals](ElasticsearchFundamentals).
Specific patterns
Multiple fields with weights
```sql
SELECT title, body,
ts_rank(
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'B'),
query
) AS rank
FROM articles, to_tsquery('english', 'fox') query;
```
Title matches weighted higher than body.
Trigram for fuzzy matching
PostgreSQL's pg_trgm extension provides trigram-based similarity:
```sql
CREATE EXTENSION pg_trgm;
SELECT * FROM articles
WHERE title % 'foks'; -- "%" is similarity operator
```
Useful for handling typos, partial matches.
Combining FTS and trigram
For autocomplete: trigram for prefix matching; FTS for full-content.
JSON full-text
PostgreSQL's JSONB columns can be searched with FTS too:
```sql
SELECT * FROM events
WHERE to_tsvector('english', data->>'description') @@ to_tsquery('english', 'fox');
```
For semi-structured data.
Performance considerations
Index maintenance
GIN indexes are slower to update than B-tree. For high-write workloads, consider GIN with `fastupdate=on` (writes go to a pending list; consolidated periodically).
Index size
GIN indexes are large. Plan storage accordingly.
Generated column vs. trigger
Generated column (PostgreSQL 12+) recomputes on update; transparent.
Trigger-based updates are flexible but require maintenance.
For modern PostgreSQL, generated columns are simpler.
Reindexing
Bulk loads: drop the index; load; recreate. Faster than incremental.
Common failure patterns
- **Searching without indexes.** Sequential scans of huge tables.
- **Wrong language config.** English stemmer on Spanish content.
- **Computing tsvector per query.** Use stored generated column.
- **Ignoring relevance.** Listing matches without ranking.
- **Choosing Elasticsearch when Postgres would suffice.** Extra infrastructure for small needs.
A reasonable starter
For a typical app needing search:
1. Add `tsvector` column with appropriate language
2. GIN index on the column
3. Use `ts_rank` for ordering
4. Iterate based on user feedback
5. Migrate to Elasticsearch if scale or features require it
Most apps never need to migrate.
Further Reading
- [ElasticsearchFundamentals](ElasticsearchFundamentals) — When Postgres isn't enough
- [CloudDatabases](CloudDatabases) — PostgreSQL options
- [ReadReplicasAndReplication](ReadReplicasAndReplication) — Adjacent scaling