Knowledge Graph vs Relational Database
The "we need a knowledge graph" decision often defaults to "let's deploy Neo4j" without examining whether the data and queries actually warrant a graph database. Most knowledge-graph use cases work fine in Postgres with the right schema. Some genuinely don't.
This page covers the decision criteria, the trade-offs, and the hybrid pattern that most mature systems converge on.
What a knowledge graph actually is
A KG stores entities (nodes) and typed relationships (edges) between them, with relationships treated as first-class, queryable objects.
```
(Anthropic) -[:founded_in]-> (San Francisco)
(Dario Amodei) -[:ceo_of]-> (Anthropic)
(Anthropic) -[:produces]-> (Claude)
(Anthropic) -[:competitor_of]-> (OpenAI)
```
Queries traverse edges: "What companies does Dario lead?", "Who does Anthropic compete with?", "What are the products of companies founded in SF that compete with OpenAI?"
The shape that benefits: queries that follow many edges, where the join structure isn't fixed in advance and the questions themselves are graph-shaped.
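To make the traversal idea concrete, here is a minimal in-memory sketch that answers two of the questions above over the same four example edges. The index names (`outgoing`, `incoming`) and the helper function are illustrative assumptions, not part of any particular graph product.

```python
from collections import defaultdict

# Triples mirror the example edges above: (source, relation, target).
TRIPLES = [
    ("Anthropic", "founded_in", "San Francisco"),
    ("Dario Amodei", "ceo_of", "Anthropic"),
    ("Anthropic", "produces", "Claude"),
    ("Anthropic", "competitor_of", "OpenAI"),
]

# Index edges in both directions so a traversal can start from either end.
outgoing = defaultdict(list)  # (source, relation) -> [target, ...]
incoming = defaultdict(list)  # (target, relation) -> [source, ...]
for source, relation, target in TRIPLES:
    outgoing[(source, relation)].append(target)
    incoming[(target, relation)].append(source)


def products_of_sf_competitors(rival: str) -> list[str]:
    """Products of companies founded in San Francisco that compete with `rival`."""
    founded_in_sf = set(incoming[("San Francisco", "founded_in")])
    competitors = set(incoming[(rival, "competitor_of")])
    products = []
    for company in sorted(founded_in_sf & competitors):
        products.extend(outgoing[(company, "produces")])
    return products


# "What companies does Dario lead?" is a one-hop lookup.
companies_led = outgoing[("Dario Amodei", "ceo_of")]
# The three-hop question combines two incoming lookups and one outgoing one.
products = products_of_sf_competitors("OpenAI")
```

The point of the sketch is that each question is a chain of edge lookups whose shape depends on the question, not on a fixed set of joins.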
When a relational database is enough
You probably don't need a KG when:
- **Your relationships are stable.** Each `User` has exactly one `Account`; each `Order` has many `OrderLines`. Foreign keys handle this.
- **Queries follow at most 2-3 hops.** A relational `JOIN` does this fine.
- **You don't need to traverse "any edge."** Specific known traversals fit a relational schema.
- **You want SQL.** ACID, SQL ecosystem, mature tooling, your team already knows it.
Most CRUD systems, e-commerce backends, and SaaS dashboards work in Postgres without a KG. For them, a graph database adds operational complexity without a matching benefit.
When a KG fits
You probably benefit from a KG when:
- **The relationships are themselves the data.** Citation networks, social graphs, biological pathways, fraud rings — the structure is the point.
- **Queries are deeply traversal-heavy.** "Find all paths between A and B with at most 5 hops." Doable in SQL but ugly.
- **The entity types and relationship types proliferate.** Schema-flexible queries help.
- **You want pattern matching across relationships.** "Find any person who reports to someone who reports to someone" — Cypher / Gremlin handle this naturally.
- **Provenance / context per edge matters.** Edges carry their own metadata, sources, confidence.
These are the cases where a graph model earns its cost. Identifying them requires understanding both your data and your queries.
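As a rough illustration of the "all paths between A and B with at most 5 hops" query from the list above, the sketch below enumerates simple paths with a hop budget. The edge list and the function name `paths_up_to` are hypothetical; a graph database would express the same idea as a bounded variable-length pattern.

```python
from collections import defaultdict

# Hypothetical directed edges, for illustration only.
EDGES = [("A", "C"), ("C", "B"), ("A", "D"), ("D", "E"), ("E", "B"), ("B", "A")]

neighbours = defaultdict(list)
for src, dst in EDGES:
    neighbours[src].append(dst)


def paths_up_to(start, goal, max_hops):
    """Return every simple path from start to goal using at most max_hops edges."""
    found = []
    stack = [[start]]  # each entry is a partial path
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            found.append(path)
            continue
        if len(path) - 1 == max_hops:
            continue  # hop budget spent on this branch
        for nxt in neighbours[node]:
            if nxt not in path:  # never revisit a node, so cycles cannot loop
                stack.append(path + [nxt])
    return sorted(found, key=len)


paths = paths_up_to("A", "B", max_hops=5)
```

The number of candidate paths can grow quickly with the hop limit, which is part of why this class of query is where graph-native engines tend to be evaluated.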
The graph-on-relational pattern
Most KGs in 2026 don't use a graph database. They use Postgres with a graph schema:
```sql
CREATE TABLE nodes (
    id         BIGSERIAL PRIMARY KEY,
    type       TEXT NOT NULL,
    name       TEXT NOT NULL,
    properties JSONB,
    UNIQUE (type, name)
);

CREATE TABLE edges (
    id         BIGSERIAL PRIMARY KEY,
    source_id  BIGINT NOT NULL REFERENCES nodes(id),
    target_id  BIGINT NOT NULL REFERENCES nodes(id),
    relation   TEXT NOT NULL,
    properties JSONB,
    confidence REAL,
    source     TEXT
);

CREATE INDEX ON edges (source_id, relation);
CREATE INDEX ON edges (target_id, relation);
```
For 1-3 hop queries, this is fast. Recursive CTEs handle deeper traversals. The whole stack is Postgres; you have transactions, joins with non-graph tables, and the operational simplicity of one database.
This is what most production "knowledge graphs" actually are. Calling it a KG and serving it from Postgres is fine.
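The recursive-CTE traversal mentioned above might look like the following sketch. It uses Python's built-in `sqlite3` as a stand-in so it runs without a server, which means the schema is trimmed (no `JSONB`, `INTEGER PRIMARY KEY` instead of `BIGSERIAL`). The `reports_to` data and the `:start`, `:relation`, and `:max_depth` parameters are assumptions made for the example. The `WITH RECURSIVE` query itself should carry over to Postgres with only the parameter style changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL,
        UNIQUE (type, name)
    );
    CREATE TABLE edges (
        id INTEGER PRIMARY KEY,
        source_id INTEGER NOT NULL REFERENCES nodes(id),
        target_id INTEGER NOT NULL REFERENCES nodes(id),
        relation TEXT NOT NULL
    );
    CREATE INDEX edges_out ON edges (source_id, relation);
""")

# Hypothetical reporting chain: Ana -> Ben -> Caz -> Dee.
names = ["Ana", "Ben", "Caz", "Dee"]
conn.executemany(
    "INSERT INTO nodes (id, type, name) VALUES (?, 'person', ?)",
    list(enumerate(names, start=1)),
)
conn.executemany(
    "INSERT INTO edges (source_id, target_id, relation) VALUES (?, ?, 'reports_to')",
    [(1, 2), (2, 3), (3, 4)],
)

# Walk outgoing edges of one relation, stopping at a depth limit so that
# cycles in the data cannot recurse forever.
REACHABLE = """
WITH RECURSIVE walk(node_id, depth) AS (
    SELECT id, 0 FROM nodes WHERE name = :start
    UNION
    SELECT e.target_id, w.depth + 1
    FROM walk w
    JOIN edges e ON e.source_id = w.node_id AND e.relation = :relation
    WHERE w.depth < :max_depth
)
SELECT n.name, MIN(w.depth) AS hops
FROM walk w
JOIN nodes n ON n.id = w.node_id
WHERE w.depth > 0
GROUP BY n.name
ORDER BY hops, n.name;
"""

managers = conn.execute(
    REACHABLE, {"start": "Ana", "relation": "reports_to", "max_depth": 3}
).fetchall()
```

The explicit depth limit is a deliberate choice: it keeps the query bounded on cyclic data, at the cost of having to pick a limit up front.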
When you need an actual graph database
For specific cases:
- **Heavy graph algorithms.** Centrality, shortest path on million-node graphs, community detection. Neo4j's Graph Data Science library is built for these.
- **Complex pattern matching.** Cypher's `MATCH (a)-[*..5]-(b)` is genuinely simpler than the SQL recursive CTE equivalent.
- **Very high traversal depth.** 10+ hop traversals. Graph databases are tuned for this.
- **Schema fluidity at extreme scale.** Adding new entity types and relationships dynamically without migrations.
For these, Neo4j, JanusGraph, TigerGraph, or AgensGraph make sense.
For most other cases, Postgres + graph schema wins on operational simplicity.
Triple stores (RDF)
A different KG flavour: triples (subject-predicate-object) with formal semantics (RDF, OWL, SPARQL). Stronger reasoning capabilities; useful for ontology-heavy domains (life sciences, library science, semantic web).
Less common in industry; most "knowledge graph" projects in 2026 use property graphs (Neo4j-style) or relational implementations.
Hybrid: Postgres with graph + relational + vector
The pattern most mature production knowledge bases land on:
```
Postgres with extensions:
- Relational tables for structured data (users, accounts, orders).
- nodes / edges tables for the graph layer.
- pgvector for embeddings (semantic search).
- JSONB for flexible properties.
```
Single substrate; transactional consistency; one ops story.
Queries cross layers:
```sql
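-- Users interested in "AI", with their order counts over the last 30 days.
-- The date filter sits in the ON clause so that users with no recent orders
-- still appear with a count of 0; in the WHERE clause it would silently
-- turn the LEFT JOIN into an inner join.
SELECT u.name, COUNT(o.id) AS orders
FROM users u
JOIN edges e ON e.source_id = u.kg_node_id
JOIN nodes n ON n.id = e.target_id
LEFT JOIN orders o
       ON o.user_id = u.id
      AND o.created_at > NOW() - INTERVAL '30 days'
WHERE n.name = 'AI' AND e.relation = 'interested_in'
GROUP BY u.id, u.name;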
```
This is the wiki you're looking at. The Wikantik knowledge graph is built on Postgres + pgvector + a graph schema. It works.
What KGs add to RAG
For retrieval-augmented generation:
- **Entity-aware retrieval.** "Tell me about Anthropic" returns paragraphs *about* Anthropic, not paragraphs that mention Anthropic.
- **Multi-hop retrieval.** "Companies competing with companies in SF" — KG traversal can construct the candidate set.
- **Constraint enforcement.** "Documents about Anthropic *and* RAG, after 2023" — structured metadata + relation lookup.
Pure vector RAG doesn't do these well. KG-augmented RAG ("GraphRAG") fills the gap. Microsoft's GraphRAG project popularised the approach; many production systems now combine KG and vector retrieval.
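A minimal sketch of the multi-hop case, under heavy assumptions: the edges, the entity tags on each document, and the three-dimensional "embeddings" are all hand-written toy data, and `graph_then_vector` is a hypothetical helper rather than the API of GraphRAG or any other library. The graph narrows the candidate set first; vector similarity only ranks what survives.

```python
import math

# Toy data, for illustration only.
EDGES = [
    ("Anthropic", "competitor_of", "OpenAI"),
    ("Anthropic", "founded_in", "San Francisco"),
]
DOCS = [
    {"id": "d1", "entities": {"Anthropic"}, "vec": [0.9, 0.1, 0.0]},
    {"id": "d2", "entities": {"OpenAI"}, "vec": [0.8, 0.2, 0.1]},
    {"id": "d3", "entities": {"Postgres"}, "vec": [0.9, 0.1, 0.1]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sources(relation, target):
    """Entities with an edge of `relation` pointing at `target`."""
    return {s for s, r, t in EDGES if r == relation and t == target}


def targets(source, relation):
    """Entities that `source` points at via `relation`."""
    return {t for s, r, t in EDGES if s == source and r == relation}


def graph_then_vector(query_vec, candidate_entities, k=5):
    """Keep documents tagged with a candidate entity, then rank by similarity."""
    candidates = [d for d in DOCS if d["entities"] & candidate_entities]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k]]


# "Companies competing with companies in SF": two hops build the candidate set.
sf_companies = sources("founded_in", "San Francisco")
rivals = set().union(*(targets(c, "competitor_of") for c in sf_companies))
hits = graph_then_vector([0.9, 0.1, 0.1], rivals)
```

In this toy data the document closest to the query vector is `d3`, which is about an unrelated entity. The graph step excludes it before ranking, which is the gap pure vector retrieval leaves open.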
See [KnowledgeGraphCompletion]() for the construction side; [RagImplementationPatterns]() for retrieval.
Practical decision criteria
For a new project considering a KG:
1. **Sketch the queries.** What questions will you ask?
2. **For each query, write the SQL** (assuming relational + JSONB). Is it ugly?
3. **For each query, write the Cypher** (assuming graph DB). Is it materially better?
4. **If queries are simple in SQL**, use Postgres. Add graph schema if you need some graph-shaped questions.
5. **If queries are genuinely graph-shaped and deep**, evaluate Neo4j against the graph-on-Postgres approach at your data scale.
Most projects stop at step 4. The minority needing step 5's graph DB are the genuinely graph-heavy use cases.
Failure modes
- **Premature graph adoption.** Adopt Neo4j; spend months ramping up on Cypher; realize relational was fine.
- **Schemaless graph chaos.** "We don't need a schema!" → 50 different "Person" node types with overlapping properties. Validate at write time even in graph stores.
- **Forgetting transactions.** Some graph databases have weaker transactional guarantees than Postgres. Verify; design accordingly.
- **Vector store bolted on awkwardly.** Storing graph in Neo4j and embeddings in Pinecone produces sync nightmares. Co-locate when you can.
- **Query language proliferation.** App speaks SPARQL, Cypher, Gremlin, SQL, Elasticsearch DSL. Pick a few; resist the urge to add another.
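For the schemaless-chaos failure mode, write-time validation can be as small as the sketch below. The registry contents (`NODE_TYPES`, `RELATIONS`) and the `ValidatedGraph` class are hypothetical; the same checks could instead live in database constraints or in an ingestion layer.

```python
# Hypothetical schema registry: allowed node types and, per relation,
# the (source type, target type) pair it is allowed to connect.
NODE_TYPES = {"person", "company", "product"}
RELATIONS = {
    "ceo_of": ("person", "company"),
    "produces": ("company", "product"),
    "competitor_of": ("company", "company"),
}


class SchemaError(ValueError):
    """Raised when a write would violate the declared graph schema."""


class ValidatedGraph:
    def __init__(self):
        self.nodes = {}  # name -> node type
        self.edges = []  # (source, relation, target)

    def add_node(self, name, node_type):
        if node_type not in NODE_TYPES:
            raise SchemaError(f"unknown node type {node_type!r}")
        if self.nodes.get(name, node_type) != node_type:
            raise SchemaError(f"{name!r} already exists as {self.nodes[name]!r}")
        self.nodes[name] = node_type

    def add_edge(self, source, relation, target):
        if relation not in RELATIONS:
            raise SchemaError(f"unknown relation {relation!r}")
        for name in (source, target):
            if name not in self.nodes:
                raise SchemaError(f"unknown node {name!r}")
        actual = (self.nodes[source], self.nodes[target])
        if actual != RELATIONS[relation]:
            raise SchemaError(f"{relation!r} expects {RELATIONS[relation]}, got {actual}")
        self.edges.append((source, relation, target))


graph = ValidatedGraph()
graph.add_node("Dario Amodei", "person")
graph.add_node("Anthropic", "company")
graph.add_edge("Dario Amodei", "ceo_of", "Anthropic")
```

Rejecting `"Person"` when the registry says `"person"` looks pedantic, but it is exactly the kind of drift that otherwise produces dozens of near-duplicate node types.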
A pragmatic recommendation
For most teams in 2026:
- **Start with Postgres + pgvector.** Add a graph schema (nodes + edges tables) only when you have queries that benefit.
- **Use a graph DB** only when you've hit specific limits with the relational approach (deep traversals, graph algorithms at scale, complex pattern matching).
- **Don't conflate "we have related data" with "we need a graph database."** Foreign keys are also graphs.
This is conservative advice; deviate when you have a specific reason.
Further reading
- [KnowledgeGraphCompletion]() — building the graph
- [GraphDatabaseFundamentals]() — graph DB specifics
- [DatabaseDesign]() — relational schema design
- [RagImplementationPatterns]() — KGs in retrieval pipelines