Agent Memory: State Management and Storage Substrates

Memory in an LLM agent is effectively a state management problem across four distinct channels: reasoning, tool history, working facts, and long-term knowledge. Each channel requires a specific storage substrate and eviction policy.

The Four Memory Channels

| Channel | Lifetime | Storage Substrate | Content |
|---|---|---|---|
| **Scratch Reasoning** | Single turn | Model Context | Internal "Chain of Thought" or reasoning steps. |
| **Tool History** | Current loop | Model Context (Summarized) | Sequence of tool calls, arguments, and results. |
| **Working Memory** | Current task | System Prompt / Structured Data | Extracted facts (e.g., `user_id`, `plan_status`) that must survive summarization. |
| **Long-Term Memory** | Cross-session | Vector DB / SQL / Graph | User preferences, past conversation summaries, persistent knowledge. |

---

1. Scratch Reasoning: Context Management

Reasoning tokens (such as Chain-of-Thought) help the model within the turn that produces them, but they add token cost for every later turn they stay in context and rarely inform future decisions.

* **Eviction Policy**: Keep only the **current** turn's reasoning.

* **Summarization**: Drop prior turns' reasoning during context compression. It is rarely needed for future turns.

* **Telemetry**: Log full reasoning to an external observer/telemetry store for debugging, rather than keeping it in the model's active context window.
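
The eviction policy above can be sketched as a filter over the message list. This is a minimal illustration, not a reference implementation: the `Message` shape, the `kind` field, and the list used as a telemetry sink are all assumptions made for the example, and a real system would more likely log reasoning when it is generated rather than when it is evicted.

```python
from dataclasses import dataclass

@dataclass
class Message:
    turn: int
    role: str   # "user", "assistant", or "tool"
    kind: str   # hypothetical tag: "reasoning" or "content"
    text: str

def evict_old_reasoning(messages, current_turn, telemetry):
    """Keep reasoning only for current_turn; move older reasoning to telemetry."""
    kept = []
    for m in messages:
        if m.kind == "reasoning" and m.turn < current_turn:
            # Older reasoning leaves the context but stays available for debugging.
            telemetry.append({"turn": m.turn, "reasoning": m.text})
        else:
            kept.append(m)
    return kept

history = [
    Message(1, "assistant", "reasoning", "User wants a refund; check order first."),
    Message(1, "assistant", "content", "Looking up your order."),
    Message(2, "assistant", "reasoning", "Order found; verify eligibility."),
]
telemetry_log = []  # stand-in for an external telemetry store
context = evict_old_reasoning(history, current_turn=2, telemetry=telemetry_log)
```

After the call, `context` holds the turn-1 reply and the turn-2 reasoning, while the turn-1 reasoning exists only in `telemetry_log`.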

2. Tool History: Rolling Summarization

The common default of dropping the oldest messages when the context fills up is a frequent failure mode, because the oldest message is usually the user's original goal.

**Policy: Summarization by Age with Pinned Goal**

1. **Pin the Goal**: Always keep the initial user instruction in the prompt.

2. **Rolling Summary**: When context usage reaches a threshold (e.g., 50% of the window), summarize turns 1 through $N-5$ into a concise paragraph, keeping only the five most recent turns ($N-4$ to $N$) in full fidelity.

3. **Preserve Entities**: Ensure the summarization prompt instructs the model to retain IDs, names, and specific values exactly.
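
The three steps above might look like the following sketch. It assumes the goal is stored separately from the turn list, that token usage is measured by the caller, and that `summarize` is a caller-supplied function wrapping an LLM call; `compact_history`, `SUMMARY_PROMPT`, and all parameter names are hypothetical.

```python
from typing import Callable, List

# Hypothetical instruction text for the summarizer (step 3: preserve entities).
SUMMARY_PROMPT = (
    "Summarize the following agent turns in one concise paragraph. "
    "Copy every ID, name, and numeric value exactly as written."
)

def compact_history(
    goal: str,
    turns: List[str],
    used_tokens: int,
    max_tokens: int,
    summarize: Callable[[List[str]], str],
    threshold: float = 0.5,
    keep_recent: int = 5,
) -> List[str]:
    """Return the context: pinned goal, optional summary, recent turns."""
    if used_tokens < threshold * max_tokens or len(turns) <= keep_recent:
        return [goal] + turns  # below threshold: nothing to compress
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarize(old)  # in practice, an LLM call using SUMMARY_PROMPT
    return [goal, f"[Summary of turns 1-{len(old)}] {summary}"] + recent

# Stub summarizer so the sketch runs without a model.
stub = lambda old: f"{len(old)} earlier turns condensed"
turns = [f"turn {i}" for i in range(1, 9)]
context = compact_history("Refund order A-17", turns, 600, 1000, stub)
```

With eight turns and 60% of the window used, the result is the goal, one summary line covering turns 1 to 3, and turns 4 to 8 verbatim.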

3. Working Memory: Fact Extraction

Working memory consists of facts the agent discovers that must drive future actions. These should be stored in structured "slots" rather than prose.

* **Pattern**: Use a JSON block in the system prompt for high-signal facts (e.g., `account_id`, `current_step`).

* **Update Mechanism**: The agent uses a dedicated `update_working_memory` tool when it discovers new facts.

* **Benefit**: Because this block lives outside the summarized history, it survives every compression pass, and the orchestrator can inspect it to detect progress or stalls programmatically.
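
A minimal sketch of this pattern follows. The `WorkingMemory` class, the `<working_memory>` delimiters, and the stall heuristic (no slot update for a few turns) are assumptions for illustration; only the `update_working_memory` tool name comes from the text above.

```python
import json

class WorkingMemory:
    def __init__(self):
        self.slots = {}
        self.turns_since_update = 0

    def update_working_memory(self, key, value):
        """Tool handler the agent calls when it discovers a new fact."""
        self.slots[key] = value
        self.turns_since_update = 0
        return f"stored {key}"

    def tick(self):
        """Called by the orchestrator once per agent turn."""
        self.turns_since_update += 1

    def is_stalled(self, limit=3):
        """Heuristic: no new facts for `limit` turns suggests a stall."""
        return self.turns_since_update >= limit

    def render(self):
        """JSON block to place in the system prompt on every turn."""
        body = json.dumps(self.slots, indent=2, sort_keys=True)
        return f"<working_memory>\n{body}\n</working_memory>"

wm = WorkingMemory()
wm.update_working_memory("account_id", "ACC-42")
wm.update_working_memory("current_step", "verify_eligibility")
system_block = wm.render()
```

Since `render()` is rebuilt from `slots` each turn, the block is never passed through the summarizer and cannot be paraphrased away.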

4. Long-Term Memory: Storage Selection

Selecting the right substrate for long-term memory is critical for retrieval quality.

| Use Case | Substrate |
|---|---|
| **Semantic Recall** | Vector Database (Fuzzy match on past interactions) |
| **User Preferences** | SQL Database (Typed columns for `timezone`, `persona`) |
| **Exact Recall** | Transaction Log / Audit DB (e.g., "Was refund #123 issued?") |
| **Relational Knowledge** | Knowledge Graph (Mapping entities and their relations) |
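
To make the distinction between typed preferences and exact recall concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and helper functions are hypothetical; the point is only that both questions are answered by exact lookups rather than similarity search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Typed columns for user preferences.
conn.execute(
    "CREATE TABLE user_prefs ("
    "user_id TEXT PRIMARY KEY, timezone TEXT NOT NULL, persona TEXT)"
)
# Append-only audit log for exact recall.
conn.execute(
    "CREATE TABLE audit_log ("
    "event_id INTEGER PRIMARY KEY, user_id TEXT, action TEXT, ref TEXT)"
)
conn.execute(
    "INSERT INTO user_prefs VALUES (?, ?, ?)", ("u1", "Europe/Berlin", "concise")
)
conn.execute(
    "INSERT INTO audit_log (user_id, action, ref) VALUES (?, ?, ?)",
    ("u1", "refund_issued", "123"),
)

def refund_issued(conn, ref):
    """Exact recall: answers 'Was refund #<ref> issued?' with a yes/no."""
    row = conn.execute(
        "SELECT 1 FROM audit_log WHERE action = 'refund_issued' AND ref = ?",
        (ref,),
    ).fetchone()
    return row is not None

def get_timezone(conn, user_id):
    """Preference lookup on a typed column; None if the user is unknown."""
    row = conn.execute(
        "SELECT timezone FROM user_prefs WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None
```

A vector search could return a passage that merely resembles "refund #123"; the audit query either finds the row or does not, which is the behavior exact recall needs.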

Forgetting and Retention

Unbounded memory is an anti-pattern. Every channel needs a retention policy:

* **Privacy**: Implement a deletion path for GDPR and similar compliance requests, so that a specific user's rows, graph nodes, and vectors can be removed on demand.

* **TTL**: Apply Time-To-Live to facts that may become stale (e.g., `current_location`).
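
Both policies can be sketched in a single in-memory store. `FactStore` and its methods are hypothetical names, and the injectable `clock` parameter exists only so that expiry can be exercised without waiting; a production system would apply the same ideas to its actual database or vector index.

```python
import time

class FactStore:
    def __init__(self, clock=time.monotonic):
        self._facts = {}  # (user_id, key) -> (value, expires_at or None)
        self._clock = clock

    def put(self, user_id, key, value, ttl_seconds=None):
        """Store a fact; ttl_seconds=None means it never expires."""
        expires = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._facts[(user_id, key)] = (value, expires)

    def get(self, user_id, key):
        """Return the fact, or None if missing or expired (expired facts are purged)."""
        entry = self._facts.get((user_id, key))
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._facts[(user_id, key)]
            return None
        return value

    def forget_user(self, user_id):
        """Deletion path: remove every fact for one user; returns the count removed."""
        doomed = [k for k in self._facts if k[0] == user_id]
        for k in doomed:
            del self._facts[k]
        return len(doomed)

now = [0.0]  # fake clock for the example
store = FactStore(clock=lambda: now[0])
store.put("u1", "current_location", "Berlin", ttl_seconds=60)
store.put("u1", "timezone", "Europe/Berlin")  # no TTL
now[0] = 61.0
stale = store.get("u1", "current_location")   # expired by now
```

Here the location expires after 60 seconds while the timezone persists until `forget_user("u1")` is called.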

Cross-Session Continuity

A minimal continuity stack requires:

1. **User Context**: Injected into every turn.

2. **Session Summary**: A summary of the last $N$ turns of the previous session, loaded at startup.

3. **Preference Injection**: Structured facts ("prefers JSON output") placed in the system prompt.
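
One way to assemble these three pieces at session start is shown below. The function name, section labels, and argument shapes are assumptions for illustration; the source does not prescribe a prompt layout.

```python
def build_system_prompt(base, user_context, session_summary=None, preferences=()):
    """Combine base instructions with the three continuity components."""
    parts = [base, "User context:", user_context]
    if session_summary:
        # Omitted for a user's first session.
        parts += ["Previous session summary:", session_summary]
    if preferences:
        parts.append("Preferences:")
        parts += [f"- {p}" for p in preferences]
    return "\n".join(parts)

prompt = build_system_prompt(
    base="You are a support agent.",
    user_context="user_id=u1, plan=pro",
    session_summary="User asked about refund #123; eligibility was confirmed.",
    preferences=["prefers JSON output"],
)
```

Keeping preferences as a list of short structured facts, rather than free prose, makes them easy to load from a typed store and to audit.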

Verification

Memory quality should be measured via automated evals:

* **Goal Retention**: Does the final action match the initial instruction after 15+ turns?

* **Fact Persistence**: Can the agent recall a ticket ID provided 10 turns ago?

* **Recall Latency**: Measure the latency that vector retrieval adds against the quality gain it delivers.
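
As an illustration of the fact-persistence check, the sketch below plants a ticket ID, sends ten unrelated turns, and then asks for the ID back. The agent here is a stub that searches a sliding window of past messages, standing in for a real agent; `fact_persistence_eval`, `make_windowed_agent`, and the `TCK-` ID format are all hypothetical.

```python
import re
from collections import deque

def fact_persistence_eval(agent, fact, filler_turns=10):
    """Return True if the agent can repeat `fact` after filler_turns distractions."""
    agent(f"My ticket ID is {fact}.")
    for i in range(filler_turns):
        agent(f"Unrelated question {i}")
    answer = agent("What is my ticket ID?")
    return fact in answer

def make_windowed_agent(window):
    """Stub agent that only remembers the last `window` messages."""
    history = deque(maxlen=window)

    def agent(message):
        history.append(message)
        if "What is my ticket ID" in message:
            for past in history:
                match = re.search(r"TCK-\d+", past)
                if match:
                    return f"Your ticket ID is {match.group()}."
            return "I don't have a ticket ID on record."
        return "ok"

    return agent

passed = fact_persistence_eval(make_windowed_agent(50), "TCK-1001")
failed = fact_persistence_eval(make_windowed_agent(5), "TCK-1001")
```

The 50-message agent passes, while the 5-message agent fails because the ticket ID has been evicted by the time the question arrives, which is the "drop the oldest messages" failure this eval is meant to catch.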