
The Memory Tool Landscape

A neutral, use-case-keyed map of the real AI-memory tools, split into corpus and agent memory families, with the DIY markdown floor and an honest caveat about the benchmark wars.

You have read how the pieces work. The question that follows is the one every builder actually asks: which of these tools should I use? The honest answer is that it depends on the question you need answered, and nothing here wins everywhere. A system tuned to summarise a 10,000-page corpus is not the system you want tracking that a user switched from Adidas to Puma last week. This page is a map, not a ranking. It sorts the field into two families, places the real tools on a structure axis, and is blunt about where the leaderboards mislead.

#Two families that the field keeps conflating

The word "memory" covers two overlapping jobs. Corpus memory turns a static document set into a structured index you can reason over: ingest a corpus once, build a graph or a tree, then answer questions no single chunk contains. GraphRAG, HippoRAG, RAPTOR, and LightRAG live here. Agent memory accumulates, updates, and recalls facts about a user or agent across many sessions: it has to handle a fact changing, being superseded, or needing to be forgotten. mem0, Zep/Graphiti, Letta, LangMem, and A-MEM live here. The two families share machinery (embeddings, chunking, hybrid retrieval) but optimise for different failure modes, which is why a corpus tool feels wrong as a personalisation layer and vice versa. Cognee straddles the line: it builds a knowledge graph out of documents, but with the incremental updating and ontology grounding that agent-style memory needs.

The second axis is how much structure a tool imposes at write time, from flat vectors at the bottom to a full graph at the top. More structure buys multi-hop and temporal reasoning; it costs more to build and keep in sync.

Figure 1. The landscape on two axes: corpus versus agent memory (horizontal) and flat versus graph structure (vertical). Plain vector RAG and DIY markdown anchor the flat corners; temporal graphs sit at the top.

#A use-case-keyed comparison

Read this by your use case, not top to bottom. The "approach" column is what the tool actually does; the "best fit" column is the question it answers well. None of this implies a quality ordering.

Tool	Family	Approach	Best fit
mem0	agent	Lean memory CRUD; v3 is single-pass ADD-only with conflict resolution moved to read-time ranking	User-level personalisation and session memory, not deep document corpora
Zep / Graphiti	agent	Bi-temporal knowledge graph; contradictions invalidate edges rather than delete them	Evolving facts where "what was true when" matters (conversational KGs)
Letta / MemGPT	agent	OS-style tiered memory the agent self-edits via tool calls	Autonomous agents that curate their own working memory
LangMem	agent	Primitives, not a store: semantic / episodic / procedural memory on LangGraph	Building your own memory layer with composable parts
A-MEM	agent	Self-organising Zettelkasten notes; a new note can rewrite its neighbours	Emergent, self-evolving note memory (research-stage)
supermemory	agent (+ RAG)	Memory API plus a transparent proxy; auto-maintained static/dynamic user profiles	Drop-in memory for an app, via SDK or a base-URL swap
GraphRAG	corpus	Entity graph plus Leiden communities, each summarised by an LLM	Global "sensemaking" over a fixed corpus ("what are the themes?")
HippoRAG	corpus	OpenIE triples plus Personalized PageRank for one-step multi-hop	Cheap multi-hop QA over a corpus, far lighter to index than GraphRAG
RAPTOR	corpus	Recursive embed/cluster/summarise into a tree; query every level	Hierarchical document QA needing both detail and theme
LightRAG	corpus	Dual-level (entity + theme) graph RAG with incremental updates	Graph RAG over documents that change without full re-indexing
Cognee	both	Extract/Cognify/Load pipeline; triples validated against an ontology	Automated KG construction from heterogeneous documents
MemoryPlugin	agent	Cross-tool consumer memory plus background chat-history sync, surfaced over MCP	Portable memory across ChatGPT, Claude, and Gemini, with chat history searchable on demand

A few honest contrasts sit behind the table. mem0 and Graphiti sit at opposite ends of the conflict-resolution fork: mem0 concluded that diffing every new fact against old memory at write time was slower and lower-quality than appending and ranking at read time, while Graphiti keeps a full bi-temporal history so it can answer time-travel queries (see the pages on updates and conflicts and on temporal memory). Letta hands curation to the model itself, so memory quality tracks the model's judgment; the background-pipeline tools are more predictable but more rigid. On the corpus side, GraphRAG's per-community summaries unlock global questions but inject text that can hurt simple factual recall, which is exactly what HippoRAG was built to avoid (the knowledge graphs page develops this).
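The conflict-resolution fork is easier to see in code. The sketch below is a toy illustration, not mem0's or Graphiti's actual API: the class names, the `prefers_brand` predicate, and the rule that a contradiction means "same subject and predicate, different object" are all assumptions made for the example. Plain recency stands in for whatever read-time ranking a real system uses.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Edge:
    """A fact with two time axes: when it was true, and when it was recorded."""
    subject: str
    predicate: str
    obj: str
    valid_from: datetime                     # when the fact became true
    recorded_at: datetime                    # when the system ingested it
    invalid_from: Optional[datetime] = None  # set once a later fact contradicts it


class TemporalGraph:
    """Write-time resolution: a contradiction closes the old edge, never deletes it."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []

    def add(self, subject: str, predicate: str, obj: str,
            valid_from: datetime, recorded_at: datetime) -> None:
        # Assumption: predicates are single-valued, so a different object
        # for the same (subject, predicate) counts as a contradiction.
        for edge in self.edges:
            same_slot = (edge.subject, edge.predicate) == (subject, predicate)
            if same_slot and edge.invalid_from is None and edge.obj != obj:
                edge.invalid_from = valid_from
        self.edges.append(Edge(subject, predicate, obj, valid_from, recorded_at))

    def as_of(self, subject: str, predicate: str, when: datetime) -> Optional[str]:
        """Time-travel query over valid time only (recorded_at is stored, not queried)."""
        for edge in self.edges:
            same_slot = (edge.subject, edge.predicate) == (subject, predicate)
            still_open = edge.invalid_from is None or when < edge.invalid_from
            if same_slot and edge.valid_from <= when and still_open:
                return edge.obj
        return None


class AppendOnlyLog:
    """Read-time resolution: writes only append; ranking happens at query time."""

    def __init__(self) -> None:
        self.facts: List[Tuple[datetime, str, str, str]] = []

    def add(self, subject: str, predicate: str, obj: str, recorded_at: datetime) -> None:
        self.facts.append((recorded_at, subject, predicate, obj))

    def current(self, subject: str, predicate: str) -> Optional[str]:
        matches = [f for f in self.facts if f[1] == subject and f[2] == predicate]
        if not matches:
            return None
        # Newest record wins; a real ranker would weigh more than recency.
        return max(matches, key=lambda f: f[0])[3]
```

Both stores keep the Adidas fact after the user switches to Puma. The difference is where the work happens: the graph pays at write time and can then answer "what was true in March?", while the log pays nothing at write time and decides which fact wins only when asked.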

#The markdown floor

Before any of these, there is a floor worth taking seriously: plain files. A popular pattern (Andrej Karpathy's "LLM wiki" is the usual anchor) is a MEMORY.md for durable facts, a TASKS.md for current priorities, and dated episodic logs, all read at session start and all git-tracked. It is genuinely good for a reason builders repeat: with fifty to a hundred key facts, you can read the memory, debug it, and version-control it, and a vector database is overkill. A common refinement is two tiers, raw daily logs distilled periodically into the main file, with a light embedding search added only once you pass twenty-odd files.
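As a rough illustration of the pattern above, here is a minimal sketch of the session-start read and the daily log write. The `logs/` directory, the `YYYY-MM-DD.md` naming, and both function names are assumptions for the example; the pattern itself only specifies a MEMORY.md, a TASKS.md, and dated episodic logs.

```python
from pathlib import Path


def load_session_context(root: Path, max_logs: int = 3) -> str:
    """Concatenate MEMORY.md, TASKS.md, and the newest dated logs into one text block."""
    parts = []
    for name in ("MEMORY.md", "TASKS.md"):
        path = root / name
        if path.exists():
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"## {name}\n{body}")

    log_dir = root / "logs"  # hypothetical location for the dated episodic logs
    if log_dir.is_dir():
        # ISO-dated file names sort chronologically as plain strings,
        # so the last max_logs entries are the most recent days.
        for path in sorted(log_dir.glob("*.md"))[-max_logs:]:
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"## log {path.stem}\n{body}")

    return "\n\n".join(parts)


def append_log(root: Path, day: str, note: str) -> None:
    """Append one bullet to the log for the given day (YYYY-MM-DD)."""
    log_dir = root / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    with (log_dir / f"{day}.md").open("a", encoding="utf-8") as handle:
        handle.write(f"- {note}\n")
```

Everything here is a linear read of a handful of files, which is why it stays debuggable and git-friendly at fifty to a hundred facts, and also why it stops being free as the file count grows.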

The floor breaks in four predictable places, and they map onto what the products sell: portability across tools (files live in one client), cross-device sync, scale beyond a few hundred facts (linear reads stop being free), and auto-capture (you have to remember to write everything down, which defeats the point). That last one is the wedge for chat-history memory: you cannot reliably predict in advance which conversation will matter later, so capturing it automatically beats hand-curated notes.

#On the benchmark wars

The leaderboards are where credibility goes to die, so treat them carefully.

There are at least two reasons to discount single-number claims. The benchmarks themselves are contested: a Letta engineer's much-quoted line is that "memory is not retrieval, memory is active management of context, and LoCoMo is simply not designed for that," and vendors have publicly accused each other of unfair benchmark setups. And the cost axis is usually omitted: one builder reported that running an extraction LLM on every message made memory systems roughly 14 to 77 times more expensive and about 30% less accurate than just passing full history, for working-state recall (a community claim, not a peer-reviewed result, and it depends entirely on whether you run an LLM on every write). The right move is to report the triple of accuracy, latency, and context tokens together, which the evaluating page covers.
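One way to make the triple concrete is to refuse to compute accuracy without the other two numbers. The sketch below assumes you already have per-question results from your own evaluation harness; the `RunResult` and `MemoryReport` names and their fields are hypothetical, and simple means are used only to keep the example short.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class RunResult:
    """Outcome of one evaluation question against one memory system."""
    correct: bool
    latency_ms: float
    context_tokens: int  # tokens the system placed in the prompt for this question


@dataclass
class MemoryReport:
    """The three numbers reported together, never accuracy alone."""
    accuracy: float
    mean_latency_ms: float
    mean_context_tokens: float


def summarise(results: List[RunResult]) -> MemoryReport:
    if not results:
        raise ValueError("no results to summarise")
    return MemoryReport(
        accuracy=mean(1.0 if r.correct else 0.0 for r in results),
        mean_latency_ms=mean(r.latency_ms for r in results),
        mean_context_tokens=mean(r.context_tokens for r in results),
    )
```

Running the same question set through a full-history baseline and producing the same report makes the cost comparison explicit instead of leaving it out.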

#How to choose

Start from the symptom. If facts about a user change over time and stale ones keep resurfacing, you want agent memory with real supersession, which at the structured end is a temporal graph (Zep/Graphiti) and at the lean end is mem0 or supermemory for straightforward personalisation. If you need to reason over a fixed body of documents, you want corpus memory: GraphRAG for whole-corpus themes, HippoRAG for cheap multi-hop, RAPTOR for mixed detail-and-theme QA, LightRAG when the corpus keeps changing, Cognee when you want an ontology-grounded graph built for you. If you want the agent to manage its own memory as it works, that is Letta. If you are assembling your own stack, LangMem gives you primitives. And if the goal is to own one memory that follows you across ChatGPT, Claude, and Gemini, with past chats captured automatically, that is the cross-tool consumer case that MemoryPlugin targets (see the across tools page). Most teams need less structure than the demos suggest; match the tool to the question, and let the failure modes you actually hit pull you up the structure axis.
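The paragraph above is effectively a lookup from symptom to starting point, which can be written down as one. The symptom keys below are invented labels for this example, and the fallback to plain markdown files is an assumption drawn from the "less structure than the demos suggest" advice, not a rule stated by any of the tools.

```python
from typing import Dict, List

# Symptom labels are hypothetical; tool names follow the paragraph above.
SUGGESTIONS: Dict[str, List[str]] = {
    "stale_user_facts_resurface": ["Zep/Graphiti"],
    "straightforward_personalisation": ["mem0", "supermemory"],
    "whole_corpus_themes": ["GraphRAG"],
    "cheap_multi_hop_over_corpus": ["HippoRAG"],
    "mixed_detail_and_theme_qa": ["RAPTOR"],
    "corpus_keeps_changing": ["LightRAG"],
    "ontology_grounded_graph": ["Cognee"],
    "agent_manages_own_memory": ["Letta"],
    "assembling_own_stack": ["LangMem"],
    "one_memory_across_chat_tools": ["MemoryPlugin"],
}


def suggest(symptom: str) -> List[str]:
    """Return starting points for a symptom, defaulting to the markdown floor."""
    return SUGGESTIONS.get(symptom, ["plain markdown files"])
```

For example, `suggest("corpus_keeps_changing")` returns `["LightRAG"]`, and an unrecognised symptom falls back to plain files. Treat the result as a place to start evaluating, not a verdict.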

#References

mem0, operation-based memory layer. Repo https://github.com/mem0ai/mem0 ; paper https://arxiv.org/abs/2504.19413 ; the v2-to-v3 single-pass redesign is documented in the repo's oss-v2-to-v3 migration notes.
Zep / Graphiti, bi-temporal knowledge graph. Repo https://github.com/getzep/graphiti ; Zep paper https://arxiv.org/abs/2501.13956
Letta / MemGPT, OS-style self-editing memory. Repo https://github.com/letta-ai/letta ; MemGPT paper https://arxiv.org/abs/2310.08560
Microsoft GraphRAG. Repo https://github.com/microsoft/graphrag ; paper https://arxiv.org/abs/2404.16130
HippoRAG / HippoRAG 2. Repo https://github.com/OSU-NLP-Group/HippoRAG ; papers https://arxiv.org/abs/2405.14831 and https://arxiv.org/abs/2502.14802
RAPTOR, recursive summary tree. Repo https://github.com/parthsarthi03/raptor ; paper https://arxiv.org/abs/2401.18059
LightRAG, incremental dual-level graph RAG. Repo https://github.com/HKUDS/LightRAG ; paper https://arxiv.org/abs/2410.05779
Cognee, ontology-grounded ECL pipeline. Repo https://github.com/topoteretes/cognee
LangMem (LangChain), memory primitives SDK. Docs https://langchain-ai.github.io/langmem/
A-MEM, self-organising agentic memory. Repo https://github.com/agiresearch/A-mem ; paper https://arxiv.org/abs/2502.12110
supermemory, memory API and proxy. Repo https://github.com/supermemoryai/supermemory
r/Rag, "Cognee vs Graphiti vs Mem0" and r/LocalLLaMA, "Letta vs Mem0" (use-case consensus, the LoCoMo critique, and benchmark-war context, reported as practitioner discussion).
r/ChatGPT, "I gave an AI agent persistent memory using just markdown files" and r/LocalLLaMA, "persistent second brain" (the DIY markdown floor and the Karpathy "LLM wiki" anchor).