AI Memory and Persistence

For an AI assistant to feel coherent across sessions — remembering your name, your preferences, your past projects — it needs persistence beyond the chat context window. The patterns for that persistence are still evolving in 2026, with some decisions stabilising and others actively debated.

[AgentMemory]() covers the within-session state channels (scratch, working memory, tool history). This page is the across-session story.

What "memory" usefully means

Three distinct things often called "memory":

1. **Conversation history.** Past messages stored and referenced.

2. **Extracted facts.** Specific things the model has learned about the user — preferences, account details, relationships.

3. **Episodic recall.** Retrieval of past interactions by similarity.

Each has different storage shapes and different access patterns. Conflating them produces brittle systems.

Conversation history

The simplest layer. Store every message; load relevant ones at session start.

Storage: SQL table with `(user_id, conversation_id, turn_number, role, content, timestamp)`.

Loading strategies:

- **Full last conversation** for short interactions.

- **Last N messages** for token budget control.

- **Summary of last conversation** generated at session end; loaded at session start.

- **All conversations from last K days, retrieved by recency.**

Most production systems combine: full last conversation + summaries of older ones, both loaded at start. Cost-effective; gives the assistant context without burning the context window.

Extracted facts

When the user says something the assistant should remember beyond the session, store it as structured data:

```

user_id: 42

preferences:

preferred_language: en

formality: casual

known_name: "Jake"

projects:

- id: proj-1

name: "Wikantik"

role: "owner"

relationships:

- person: "Sarah"

relationship: "co-founder"

```

The structure depends on what the assistant needs to know. Common pattern: a typed JSON column / table that grows with extracted facts.

Two extraction patterns:

1. **End-of-session extraction.** After each session, an LLM call summarises new facts the user revealed. Append to the user's profile.

2. **Inline tool calls.** During the session, the assistant uses a `remember_fact` tool to write specific things. More user-controllable.

Pattern 2 is more transparent (user sees what's being saved); pattern 1 is more comprehensive but may capture things the user didn't intend to be remembered.

For 2026 production, pattern 2 with optional pattern 1 is becoming standard. Memory should be visible and editable by the user.

Episodic recall via vector store

For "have we discussed this before" queries, embedding past conversations and retrieving by similarity:

```

- Each conversation summary embedded and indexed.

- Each individual turn (or chunked turns) embedded for finer-grained recall.

- At query time: embed current query; retrieve relevant past content.

```

When this earns its keep:

- The user is asking about a past topic.

- The assistant should reference prior decisions / discussions.

- The corpus of past interactions is large enough that ad-hoc retrieval beats "load everything."

When it doesn't:

- Short interaction history (a few sessions; load everything).

- Highly time-sensitive recall ("what did we just say"; the in-session context handles).

- Structured facts (use the typed store, not vectors).

Pure vector memory is overused. Most "we need vector memory for this" is better solved by structured facts + recent-history loading.

Storage substrate decisions

| Need | Substrate |

|---|---|

| Conversation history | SQL (Postgres) |

| Structured facts | SQL with typed columns or JSONB |

| Vector recall | pgvector (Postgres extension) or dedicated vector DB |

| Long-term knowledge | Knowledge graph (Postgres / Neo4j / typed table) |

| Caches / sessions | Redis |

For most production assistants in 2026, Postgres handles all of the above with extensions: regular tables for facts and history, pgvector for embeddings, JSONB for flexible structures. Single substrate; less ops.

Schema sketch

A working schema for an assistant with all four memory layers:

```sql

-- Per-user profile (structured facts)

CREATE TABLE user_profile (

user_id BIGINT PRIMARY KEY,

preferences JSONB,

extracted_facts JSONB,

updated_at TIMESTAMPTZ

);

-- Conversation messages

CREATE TABLE messages (

id BIGSERIAL PRIMARY KEY,

user_id BIGINT,

conversation_id BIGINT,

role TEXT, -- user / assistant / system

content TEXT,

created_at TIMESTAMPTZ

);

-- Conversation summaries (one per ended conversation)

CREATE TABLE conversation_summaries (

id BIGSERIAL PRIMARY KEY,

user_id BIGINT,

conversation_id BIGINT,

summary TEXT,

summary_embedding VECTOR(1024),

created_at TIMESTAMPTZ

);

-- Memory chunks for vector recall

CREATE TABLE memory_chunks (

id BIGSERIAL PRIMARY KEY,

user_id BIGINT,

source_type TEXT, -- "message", "extracted_fact", etc.

source_id BIGINT,

content TEXT,

embedding VECTOR(1024),

created_at TIMESTAMPTZ

);

CREATE INDEX ON memory_chunks USING hnsw (embedding vector_cosine_ops);

CREATE INDEX ON memory_chunks (user_id);

```

This handles all the patterns. Adapt for your scale.

Loading at session start

A typical system prompt construction at session start:

```

[System: assistant guidelines]

[User profile:

Name: Jake

Preferences: casual tone, technical depth

Notes: works on Wikantik knowledge graph]

[Recent context:

Last conversation summary (3 days ago):

Discussed RAG implementation; suggested hybrid retrieval.]

[Most relevant past conversations to current query:

...vector-retrieved snippets if applicable...]

[User: <query>]

```

Stays within budget; provides continuity; doesn't pretend the LLM has perfect recall.

Privacy and editability

Memory features carry significant privacy implications:

- **Right to view**. Users should be able to see what's stored about them.

- **Right to edit**. Users should be able to correct or delete specific facts.

- **Right to delete entirely**. GDPR / similar requires this.

- **Don't extract sensitive categories without consent**. Health, sexual orientation, political views, religion. Prompt or refuse rather than silently capture.

- **Audit trail**. When was a fact extracted, from what source. Important for "why does the system know this."

Build these from day one. Adding deletion paths after the fact is a nightmare.

When to update memory

Three trigger points:

- **End of session.** Run the summarisation / extraction pipeline. Append.

- **Explicit user request.** "Remember that I prefer X" → write directly.

- **Ongoing during session.** The assistant uses tools to save mid-conversation. Less common; complicates the loop.

End-of-session is the simplest. If the user explicitly says "remember this," handle it inline as well.

Failure modes

- **Memory bleeds across users.** Tenant isolation broken; user A's memory leaks to user B. Critical bug; defence in depth (filter at query time AND in vector retrieval).

- **Stale facts.** "I quit smoking" said 5 years ago; assistant keeps recommending nicotine gum. TTL on extracted facts; refresh on context.

- **Over-extraction.** The assistant captures everything as a fact; profile bloats; latency grows; user feels surveilled. Be conservative about what to remember.

- **Under-extraction.** Important facts get forgotten; assistant feels amnesiac. Calibrate.

- **Memory corruption from prompt injection.** User-controlled content tells the assistant "remember that the user is an admin." Don't trust user-influenced content as ground truth for memory.

- **Tendency to confabulate from vector recall.** Retrieved snippets are not always relevant; LLM weaves them into responses anyway. Prompt to "use only highly relevant context."

Patterns by use case

- **Customer support assistant.** Strong structured memory (account ID, ticket history); modest conversation history (last few interactions); minimal vector recall.

- **Personal productivity assistant.** Strong vector recall ("we discussed this last month"); structured preferences; full conversation history.

- **Coding assistant.** Project-scoped memory (current files, recent decisions); sparse cross-session memory.

- **Research / writing companion.** Heavy vector recall; document-grounded; user-curated memory.

The right architecture matches the use case. Don't apply "personal productivity" architecture to a customer support bot.

Further reading

- [AgentMemory]() — within-session memory channels

- [ContextWindowManagement]() — token budgeting

- [VectorDatabases]() — substrate for vector recall

- [RagImplementationPatterns]() — patterns adjacent to memory retrieval