Knowledge Graph vs Relational Database

The "we need a knowledge graph" decision often defaults to "let's deploy Neo4j" without examining whether the data and queries actually warrant a graph database. Most knowledge-graph use cases work fine in Postgres with the right schema. Some genuinely don't.

This page is the decision criteria, the trade-offs, and the hybrid pattern that most mature systems converge on.

What a knowledge graph actually is

A KG stores entities (nodes) and typed relationships (edges) between them, with the relationships being first-class queryable.

```

(Anthropic) -[:founded_in]-> (San Francisco)

(Dario Amodei) -[:ceo_of]-> (Anthropic)

(Anthropic) -[:produces]-> (Claude)

(Anthropic) -[:competitor_of]-> (OpenAI)

```

Queries traverse: "what companies does Dario lead?" "what does Anthropic compete with?" "what are the products of companies founded in SF that compete with OpenAI?"

The shape that benefits: queries that follow many edges, where the joining structure isn't fixed, where you want to ask graph-shaped questions.

When a relational database is enough

You probably don't need a KG when:

- **Your relationships are stable.** Each `User` has exactly one `Account`; each `Order` has many `OrderLines`. Foreign keys handle this.

- **Queries follow at most 2-3 hops.** A relational `JOIN` does this fine.

- **You don't need to traverse "any edge."** Specific known traversals fit a relational schema.

- **You want SQL.** ACID, SQL ecosystem, mature tooling, your team already knows it.

Most CRUD systems, e-commerce, SaaS dashboards — these work in Postgres without a KG. A graph database adds operational complexity without providing usable benefit.

When a KG fits

You probably benefit from a KG when:

- **The relationships are themselves the data.** Citation networks, social graphs, biological pathways, fraud rings — the structure is the point.

- **Queries are deeply traversal-heavy.** "Find all paths between A and B with at most 5 hops." Doable in SQL but ugly.

- **The entity types and relationship types proliferate.** Schema-flexible queries help.

- **You want pattern matching across relationships.** "Find any person who reports to someone who reports to someone" — Cypher / Gremlin handle this naturally.

- **Provenance / context per edge matters.** Edges carry their own metadata, sources, confidence.

These are the hard sells. Identifying them requires understanding both your data and your queries.

The graph-on-relational pattern

Most KGs in 2026 don't use a graph database. They use Postgres with a graph schema:

```sql

CREATE TABLE nodes (

id BIGSERIAL PRIMARY KEY,

type TEXT NOT NULL,

name TEXT NOT NULL,

properties JSONB,

UNIQUE (type, name)

);

CREATE TABLE edges (

id BIGSERIAL PRIMARY KEY,

source_id BIGINT NOT NULL REFERENCES nodes(id),

target_id BIGINT NOT NULL REFERENCES nodes(id),

relation TEXT NOT NULL,

properties JSONB,

confidence REAL,

source TEXT

);

CREATE INDEX ON edges (source_id, relation);

CREATE INDEX ON edges (target_id, relation);

```

For 1-3 hop queries, this is fast. Recursive CTEs handle deeper traversals. The whole stack is Postgres; you have transactions, joins with non-graph tables, and the operational simplicity of one database.

This is what most production "knowledge graphs" actually are. Calling it a KG and serving it from Postgres is fine.

When you need an actual graph database

For specific cases:

- **Heavy graph algorithms.** Centrality, shortest path on million-node graphs, community detection. Neo4j Graph Data Science library wins on these.

- **Complex pattern matching.** Cypher's `MATCH (a)-[*..5]-(b)` is genuinely simpler than the SQL recursive CTE equivalent.

- **Very high traversal depth.** 10+ hop traversals. Graph databases are tuned for this.

- **Schema fluidity at extreme scale.** Adding new entity types and relationships dynamically without migrations.

For these, Neo4j, JanusGraph, TigerGraph, or AgensGraph make sense.

For most other cases, Postgres + graph schema wins on operational simplicity.

Triple stores (RDF)

A different KG flavour: triples (subject-predicate-object) with formal semantics (RDF, OWL, SPARQL). Stronger reasoning capabilities; useful for ontology-heavy domains (life sciences, library science, semantic web).

Less common in industry; most "knowledge graph" projects in 2026 use property graphs (Neo4j-style) or relational implementations.

Hybrid: Postgres with graph + relational + vector

The pattern most mature production knowledge bases land on:

```

Postgres with extensions:

- Relational tables for structured data (users, accounts, orders).

- nodes / edges tables for the graph layer.

- pgvector for embeddings (semantic search).

- JSONB for flexible properties.

```

Single substrate; transactional consistency; one ops story.

Queries cross layers:

```sql

-- Find users related to "AI" by topic, with their recent orders

SELECT u.name, COUNT(o.id) AS orders

FROM users u

JOIN edges e ON e.source_id = u.kg_node_id

JOIN nodes n ON n.id = e.target_id

LEFT JOIN orders o ON o.user_id = u.id

WHERE n.name = 'AI' AND e.relation = 'interested_in'

AND o.created_at > NOW() - INTERVAL '30 days'

GROUP BY u.id, u.name;

```

This is the wiki you're looking at. The Wikantik knowledge graph is built on Postgres + pgvector + a graph schema. It works.

What KGs add to RAG

For retrieval-augmented generation:

- **Entity-aware retrieval.** "Tell me about Anthropic" returns paragraphs *about* Anthropic, not paragraphs that mention Anthropic.

- **Multi-hop retrieval.** "Companies competing with companies in SF" — KG traversal can construct the candidate set.

- **Constraint enforcement.** "Documents about Anthropic *and* RAG, after 2023" — structured metadata + relation lookup.

Pure vector RAG doesn't do these well. KG-augmented RAG ("GraphRAG") fills the gap. Microsoft's GraphRAG project popularised the approach; many production systems now combine KG and vector retrieval.

See [KnowledgeGraphCompletion]() for the construction side; [RagImplementationPatterns]() for retrieval.

Practical decision criteria

For a new project considering a KG:

1. **Sketch the queries.** What questions will you ask?

2. **For each query, write the SQL** (assuming relational + JSONB). Is it ugly?

3. **For each query, write the Cypher** (assuming graph DB). Is it materially better?

4. **If queries are simple in SQL**, use Postgres. Add graph schema if you need some graph-shaped questions.

5. **If queries are genuinely graph-shaped and deep**, evaluate Neo4j vs Postgres-graph-on-Postgres on your data scale.

Most projects stop at step 4. The minority needing step 5's graph DB are the genuinely graph-heavy use cases.

Failure modes

- **Premature graph adoption.** Adopt Neo4j; spend months ramping up Cypher; realize relational was fine.

- **Schemaless graph chaos.** "We don't need a schema!" → 50 different "Person" node types with overlapping properties. Validate at write time even in graph stores.

- **Forgetting transactions.** Some graph databases have weaker transactional guarantees than Postgres. Verify; design accordingly.

- **Vector store bolted on awkwardly.** Storing graph in Neo4j and embeddings in Pinecone produces sync nightmares. Co-locate when you can.

- **Query language proliferation.** App speaks SPARQL, Cypher, Gremlin, SQL, ElasticSearch DSL. Pick few; resist the urge to add another.

A pragmatic recommendation

For most teams in 2026:

- **Start with Postgres + pgvector**. Add a graph-schema (nodes + edges tables) only when you have queries that benefit.

- **Use a graph DB** only when you've hit specific limits with the relational approach (deep traversals, graph algorithms at scale, complex pattern matching).

- **Don't conflate "we have related data" with "we need a graph database."** Foreign keys are also graphs.

This is conservative advice; deviate when you have a specific reason.

Further reading

- [KnowledgeGraphCompletion]() — building the graph

- [GraphDatabaseFundamentals]() — graph DB specifics

- [DatabaseDesign]() — relational schema design

- [RagImplementationPatterns]() — KGs in retrieval pipelines