The Memory Tool Landscape
A neutral, use-case-keyed map of the real AI-memory tools, split into corpus and agent memory families, with the DIY markdown floor and an honest caveat about the benchmark wars.
You have read how the pieces work. The question that follows is the one every builder actually asks: which of these tools should I use? The honest answer is that it depends on the question you need answered, and nothing here wins everywhere. A system tuned to summarise a 10,000-page corpus is not the system you want tracking that a user switched from Adidas to Puma last week. This page is a map, not a ranking. It sorts the field into two families, places the real tools on a structure axis, and is blunt about where the leaderboards mislead.
#Two families that the field keeps conflating
The word "memory" covers two overlapping jobs. Corpus memory turns a static document set into a structured index you can reason over: ingest a corpus once, build a graph or a tree, then answer questions no single chunk contains. GraphRAG, HippoRAG, RAPTOR, and LightRAG live here. Agent memory accumulates, updates, and recalls facts about a user or agent across many sessions: it has to handle a fact changing, being superseded, or needing to be forgotten. mem0, Zep/Graphiti, Letta, LangMem, and A-MEM live here. The two families share machinery (embeddings, chunking, hybrid retrieval) but optimise for different failure modes, which is why a corpus tool feels wrong as a personalisation layer and vice versa. Cognee straddles the line: it builds a knowledge graph out of documents, but with the incremental updating and ontology grounding that agent-style memory needs.
The second axis is how much structure a tool imposes at write time, from flat vectors at the bottom to a full graph at the top. More structure buys multi-hop and temporal reasoning; it costs more to build and keep in sync.
#A use-case-keyed comparison
Read this by your use case, not top to bottom. The "approach" column is what the tool actually does; the "best fit" column is the question it answers well. None of this implies a quality ordering.
| Tool | Family | Approach | Best fit |
|---|---|---|---|
| mem0 | agent | Lean memory CRUD; v3 is single-pass ADD-only with conflict resolution moved to read-time ranking | User-level personalisation and session memory, not deep document corpora |
| Zep / Graphiti | agent | Bi-temporal knowledge graph; contradictions invalidate edges rather than delete them | Evolving facts where "what was true when" matters (conversational KGs) |
| Letta / MemGPT | agent | OS-style tiered memory the agent self-edits via tool calls | Autonomous agents that curate their own working memory |
| LangMem | agent | Primitives, not a store: semantic / episodic / procedural memory on LangGraph | Building your own memory layer with composable parts |
| A-MEM | agent | Self-organising Zettelkasten notes; a new note can rewrite its neighbours | Emergent, self-evolving note memory (research-stage) |
| supermemory | agent (+ RAG) | Memory API plus a transparent proxy; auto-maintained static/dynamic user profiles | Drop-in memory for an app, via SDK or a base-URL swap |
| GraphRAG | corpus | Entity graph plus Leiden communities, each summarised by an LLM | Global "sensemaking" over a fixed corpus ("what are the themes?") |
| HippoRAG | corpus | OpenIE triples plus Personalized PageRank for one-step multi-hop | Cheap multi-hop QA over a corpus, far lighter to index than GraphRAG |
| RAPTOR | corpus | Recursive embed/cluster/summarise into a tree; query every level | Hierarchical document QA needing both detail and theme |
| LightRAG | corpus | Dual-level (entity + theme) graph RAG with incremental updates | Graph RAG over documents that change without full re-indexing |
| Cognee | both | Extract/Cognify/Load pipeline; triples validated against an ontology | Automated KG construction from heterogeneous documents |
| MemoryPlugin | agent | Cross-tool consumer memory plus background chat-history sync, surfaced over MCP | Portable memory across ChatGPT, Claude, and Gemini, with chat history searchable on demand |
A few honest contrasts behind the table. mem0 and Graphiti sit at opposite ends of the conflict-resolution fork: mem0 concluded that diffing every new fact against old memory at write time was slower and lower-quality than appending and ranking at read time, while Graphiti keeps a full bi-temporal history so it can answer time-travel queries (see the pages on updates and conflicts and on temporal memory). Letta hands curation to the model itself, so memory quality tracks the model's judgment; the background-pipeline tools are more predictable but more rigid. On the corpus side, GraphRAG's per-community summaries unlock global questions but inject text that can hurt simple factual recall, which is exactly what HippoRAG was built to avoid (the knowledge graphs page develops this).
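The conflict-resolution fork can be made concrete with a toy sketch. Nothing below is mem0's or Graphiti's actual API: `Fact`, `AppendOnlyMemory`, and `TemporalMemory` are hypothetical names, the read-time "ranking" is reduced to most-recent-wins, and the temporal store tracks only a single validity window (a simplification of a truly bi-temporal model, which also records when each fact was ingested). It also assumes facts arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    valid_from: datetime                    # when the fact became true
    invalid_at: Optional[datetime] = None   # set on supersession, never deleted


class AppendOnlyMemory:
    """Hypothetical lean store: writes only append, conflicts are settled on read."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, fact: Fact) -> None:
        self.facts.append(fact)  # no diffing against existing memory

    def recall(self, subject: str, attribute: str) -> Optional[str]:
        matches = [f for f in self.facts
                   if f.subject == subject and f.attribute == attribute]
        # Read-time ranking, reduced here to "most recent fact wins".
        best = max(matches, key=lambda f: f.valid_from, default=None)
        return best.value if best else None


class TemporalMemory:
    """Hypothetical temporal store: a contradiction closes the old fact's window."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, fact: Fact) -> None:
        for old in self.facts:
            same_slot = (old.subject, old.attribute) == (fact.subject, fact.attribute)
            if same_slot and old.invalid_at is None and old.value != fact.value:
                old.invalid_at = fact.valid_from  # invalidate, do not delete
        self.facts.append(fact)

    def recall(self, subject: str, attribute: str, as_of: datetime) -> Optional[str]:
        for f in self.facts:
            same_slot = (f.subject, f.attribute) == (subject, attribute)
            still_valid = f.invalid_at is None or as_of < f.invalid_at
            if same_slot and f.valid_from <= as_of and still_valid:
                return f.value
        return None
```

With the Adidas-to-Puma switch from the introduction, both stores return Puma for "now", but only the temporal store can still answer what was true in the month before the switch. That extra capability is what the additional write-time work buys.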
#The markdown floor
Before any of these, there is a floor worth taking seriously: plain files. A popular pattern (Andrej Karpathy's "LLM wiki" is the usual anchor) is a MEMORY.md for durable facts, a TASKS.md for current priorities, and dated episodic logs, all read at session start and all git-tracked. It is genuinely good for a reason builders repeat: with fifty to a hundred key facts, you can read the memory, debug it, and version-control it, and a vector database is overkill. A common refinement is two tiers, raw daily logs distilled periodically into the main file, with a light embedding search added only once you pass twenty-odd files.
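As a rough illustration of the pattern above, the session-start read can be a few lines of Python. The directory layout is an assumption made for this sketch (a `logs/` folder of ISO-dated files such as `2025-01-03.md` next to `MEMORY.md` and `TASKS.md`), and `build_session_context` is a hypothetical helper, not part of any tool named on this page.

```python
from pathlib import Path


def build_session_context(root: Path, recent_logs: int = 3) -> str:
    """Concatenate the memory files into one block to prepend at session start."""
    sections = []

    # Durable facts and current priorities are always loaded if present.
    for name in ("MEMORY.md", "TASKS.md"):
        path = root / name
        if path.exists():
            text = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{text}")

    # Assumed layout: logs/YYYY-MM-DD.md. ISO dates sort chronologically,
    # so the tail of the sorted list holds the newest episodic logs.
    log_dir = root / "logs"
    logs = sorted(log_dir.glob("*.md")) if log_dir.is_dir() else []
    newest = logs[-recent_logs:] if recent_logs > 0 else []
    for path in newest:
        text = path.read_text(encoding="utf-8").strip()
        sections.append(f"## log {path.stem}\n{text}")

    return "\n\n".join(sections)
```

Because everything is read linearly, the cost grows with the size of the files, which is why this stays comfortable at tens of facts and a handful of logs and becomes a reason to add search once the file count grows.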
The floor breaks in four predictable places, and they map onto what the products sell: portability across tools (files live in one client), cross-device sync, scale beyond a few hundred facts (linear reads stop being free), and auto-capture (you have to remember to write everything down, which defeats the point). That last one is the wedge for chat-history memory: you cannot reliably predict in advance which conversation will matter later, so capturing it automatically beats hand-curated notes.
#On the benchmark wars
The leaderboards are where credibility goes to die, so treat them carefully.
There are two further reasons to discount single-number claims. The benchmarks themselves are contested: a Letta engineer's much-quoted line is that "memory is not retrieval, memory is active management of context, and LoCoMo is simply not designed for that," and vendors have publicly accused each other of unfair benchmark setups. And the cost axis is usually omitted: one builder reported that running an extraction LLM on every message made memory systems roughly 14 to 77 times more expensive and about 30% less accurate than just passing full history, for working-state recall (a community claim, not a peer-reviewed result, and it depends entirely on whether you run an LLM on every write). The right move is to report the triple of accuracy, latency, and context tokens together, which the evaluating page covers.
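One way to keep the three numbers together is to record them per question and summarise them in a single step. This is a minimal sketch, not the harness of any benchmark mentioned here: `QueryResult` and `summarise` are hypothetical names, and the choice of median latency and mean context tokens as the summary statistics is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class QueryResult:
    correct: bool         # graded answer for one benchmark question
    latency_s: float      # wall-clock seconds for retrieval plus generation
    context_tokens: int   # tokens placed in the prompt for this question


def summarise(results: list[QueryResult]) -> dict[str, float]:
    """Report accuracy, latency, and context cost side by side."""
    if not results:
        raise ValueError("no results to summarise")
    return {
        "accuracy": mean(1.0 if r.correct else 0.0 for r in results),
        "median_latency_s": median(r.latency_s for r in results),
        "mean_context_tokens": mean(r.context_tokens for r in results),
    }
```

Running the same question set through a full-history baseline and through the memory system, then comparing the two summaries, makes it visible when an accuracy gain was paid for in latency or prompt size.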
#How to choose
Start from the symptom. If facts about a user change over time and stale ones keep resurfacing, you want agent memory with real supersession, which at the structured end is a temporal graph (Zep/Graphiti) and at the lean end is mem0 or supermemory for straightforward personalisation. If you need to reason over a fixed body of documents, you want corpus memory: GraphRAG for whole-corpus themes, HippoRAG for cheap multi-hop, RAPTOR for mixed detail-and-theme QA, LightRAG when the corpus keeps changing, Cognee when you want an ontology-grounded graph built for you. If you want the agent to manage its own memory as it works, that is Letta. If you are assembling your own stack, LangMem gives you primitives. And if the goal is to own one memory that follows you across ChatGPT, Claude, and Gemini, with past chats captured automatically, that is the cross-tool consumer case that MemoryPlugin targets (see the across tools page). Most teams need less structure than the demos suggest; match the tool to the question, and let the failure modes you actually hit pull you up the structure axis.
#References
- mem0, operation-based memory layer. Repo https://github.com/mem0ai/mem0 ; paper https://arxiv.org/abs/2504.19413 ; the v2-to-v3 single-pass redesign is documented in the repo's oss-v2-to-v3 migration notes.
- Zep / Graphiti, bi-temporal knowledge graph. Repo https://github.com/getzep/graphiti ; Zep paper https://arxiv.org/abs/2501.13956
- Letta / MemGPT, OS-style self-editing memory. Repo https://github.com/letta-ai/letta ; MemGPT paper https://arxiv.org/abs/2310.08560
- Microsoft GraphRAG. Repo https://github.com/microsoft/graphrag ; paper https://arxiv.org/abs/2404.16130
- HippoRAG / HippoRAG 2. Repo https://github.com/OSU-NLP-Group/HippoRAG ; papers https://arxiv.org/abs/2405.14831 and https://arxiv.org/abs/2502.14802
- RAPTOR, recursive summary tree. Repo https://github.com/parthsarthi03/raptor ; paper https://arxiv.org/abs/2401.18059
- LightRAG, incremental dual-level graph RAG. Repo https://github.com/HKUDS/LightRAG ; paper https://arxiv.org/abs/2410.05779
- Cognee, ontology-grounded ECL pipeline. Repo https://github.com/topoteretes/cognee
- LangMem (LangChain), memory primitives SDK. Docs https://langchain-ai.github.io/langmem/
- A-MEM, self-organising agentic memory. Repo https://github.com/agiresearch/A-mem ; paper https://arxiv.org/abs/2502.12110
- supermemory, memory API and proxy. Repo https://github.com/supermemoryai/supermemory
- r/Rag, "Cognee vs Graphiti vs Mem0" and r/LocalLLaMA, "Letta vs Mem0" (use-case consensus, the LoCoMo critique, and benchmark-war context, reported as practitioner discussion).
- r/ChatGPT, "I gave an AI agent persistent memory using just markdown files" and r/LocalLLaMA, "persistent second brain" (the DIY markdown floor and the Karpathy "LLM wiki" anchor).