
Frequently Asked Questions

Short, honest answers to the questions builders actually ask about AI memory, each pointing to the page that goes deep.

These are the questions that keep coming up in the threads where people build and argue about memory, answered straight.

#Is AI memory a real problem, or just hype?

It depends entirely on who you are, and the community segments cleanly on this. Casual users who treat AI as a smarter search engine (ask, read, close the tab) rarely feel the need, and for them a decent context window plus good prompting is enough. The wall appears with multi-session work like a codebase or a research project that evolves over weeks, with agents that act over time, and with personalisation that has to persist across tools. One commenter framed the real failure well: it's not that the model forgets mid-sentence, it's that it forgets the last ten sessions ever happened, so it re-asks settled questions and reintroduces old bugs. There is a genuine bear case too, since memory is easy to prototype and looks sophisticated, which has produced a wave of near-identical "memory layer" projects and a fatigued, sceptical audience. For the breakdown by user type, see what AI memory is.

#Won't bigger context windows (1M+ tokens) just solve it?

No, because the size of the window is not what holds you back. Treating context as memory is like treating RAM as a hard drive: it's volatile, expensive, and gets slower the more you fill it. Models also don't read a full window evenly, so stuffing more in can lower the odds the relevant part actually gets used. The binding constraint is salience, picking the few ideas that matter out of everything loaded, not raw capacity. The honest counterpoint is that one camp bets cheaper 10M-plus windows will erode the problem over the next few years, and they may be partly right. Until then, a useful rule of thumb: under roughly 200K tokens, just put it all in context; memory earns its complexity only above that. See context vs memory.
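The 200K rule of thumb can be written down as a tiny check. This is a minimal sketch, not a recommendation: the four-characters-per-token estimate is a rough heuristic for English text, and `estimate_tokens`, `choose_strategy`, and `FULL_CONTEXT_LIMIT` are hypothetical names made up for illustration. Use your model's real tokenizer if the decision is close.

```python
# Threshold from the rule of thumb in the text; treat it as a tunable assumption.
FULL_CONTEXT_LIMIT = 200_000


def estimate_tokens(text: str) -> int:
    """Crude estimate: assume about 4 characters per token (heuristic only)."""
    return len(text) // 4


def choose_strategy(documents: list[str], limit: int = FULL_CONTEXT_LIMIT) -> str:
    """Return "full_context" when everything fits under the limit, else "memory"."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return "full_context" if total < limit else "memory"
```

A project with a few dozen pages of notes lands in `"full_context"`; a multi-year chat export does not.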

#Isn't memory just a vector DB? Isn't this RAG with extra steps?

Retrieval is a component of memory, not the whole of it. A vector database hands back the chunk that looks most like your query, which isn't the same as the fact that's currently true. Take the canonical example: you loved Adidas on day 1, they fell apart on day 30, you switched to Puma on day 31, and on day 45 you ask what sneakers to buy. Pure retrieval returns "I love Adidas" because it scores highest against the literal question, and it's wrong. Memory is that retrieval step plus the state machinery around it: deciding what is allowed to become a fact, when an old fact stops being true, and whether a true fact is even worth surfacing now. See RAG vs memory.
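To make the Adidas-to-Puma example concrete, here is a minimal sketch of the "state machinery" idea: a new fact about the same subject supersedes the old one instead of sitting next to it. The `Fact` and `FactStore` names, the `subject` key, and the day-number timestamps are all assumptions for illustration; a real system also has to decide which statements count as the same subject, which this sketch takes as given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    subject: str                          # what the fact is about
    value: str                            # what is believed about it
    recorded_day: int                     # when it was recorded
    superseded_day: Optional[int] = None  # set once a newer fact replaces it


class FactStore:
    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def assert_fact(self, subject: str, value: str, day: int) -> None:
        """Record a fact and mark any live fact on the same subject as superseded."""
        for fact in self._facts:
            if fact.subject == subject and fact.superseded_day is None:
                fact.superseded_day = day
        self._facts.append(Fact(subject, value, day))

    def current(self, subject: str) -> Optional[str]:
        """Return the value that is true now, ignoring superseded facts."""
        for fact in self._facts:
            if fact.subject == subject and fact.superseded_day is None:
                return fact.value
        return None

    def history(self, subject: str) -> list[Fact]:
        """Return every fact ever recorded for the subject, oldest first."""
        return [f for f in self._facts if f.subject == subject]


store = FactStore()
store.assert_fact("favourite_sneaker_brand", "Adidas", day=1)
store.assert_fact("favourite_sneaker_brand", "Puma", day=31)
print(store.current("favourite_sneaker_brand"))  # Puma
```

The old fact is kept in `history`, so the store can still answer "what did I used to like?", but it no longer competes with the current answer.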

#Why does my agent keep re-answering resolved questions and resurfacing stale facts?

These look like one bug but are two. Stale facts come from a vector store flattening old and new together (Alice moved to SF; later, Alice works at Stripe) with no update mechanism, so new chunks are appended on top of old ones instead of overriding them; the fix is structured or temporal records that supersede rather than pile up. The resolved-versus-relevant bug is different: a closed ticket and an open one look nearly identical to a vector DB, so the agent re-injects settled context, and the fix one builder reported was a resolution-state field rather than a smarter embedding. Watch for recency boosts, which can quietly make things worse in a tool whose whole job is recalling old context. See failure modes and temporal memory.
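The resolution-state idea can be sketched as a filter that runs after similarity search and before anything is injected into the prompt. This is an illustration of the pattern, not the builder's actual code: `MemoryRecord`, `select_for_context`, and the `resolved` flag are hypothetical names, and the `score` is assumed to come from whatever retriever you already use.

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    score: float            # similarity score from an existing retriever (assumed)
    resolved: bool = False  # explicit state the embedding cannot express


def select_for_context(
    candidates: list[MemoryRecord],
    limit: int = 3,
    include_resolved: bool = False,
) -> list[MemoryRecord]:
    """Drop resolved records unless asked for, then keep the best-scoring ones."""
    kept = [r for r in candidates if include_resolved or not r.resolved]
    kept.sort(key=lambda r: r.score, reverse=True)
    return kept[:limit]
```

The point of the design is that the closed ticket can outscore the open one on similarity and still be left out, because the decision is made on state rather than on the embedding. Passing `include_resolved=True` keeps the old context reachable when the user explicitly asks about history.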

#Should I just use markdown files?

Often yes, and starting there is good advice rather than a cop-out. If you have 50 to 100 key facts, plain text files (a MEMORY.md for durable facts plus dated episodic logs, all git-tracked) work fine, and you can read them, debug them, and see how the agent's understanding evolved. It breaks down on four things: portability across different tools, cross-device sync, auto-capture (you have to remember to write things down yourself), and scale past a few hundred facts, where a quick search before each session beats loading everything. See the landscape for where the file floor ends and a product starts to pay off.
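For a sense of how little code the file-based approach needs, here is a minimal sketch using only the standard library. The `MEMORY.md` name comes from the pattern described above; the `logs/` directory, the one-file-per-day naming, and the helper names are assumptions for illustration. Git tracking is left to you.

```python
from datetime import date
from pathlib import Path


def remember_fact(root: Path, fact: str) -> None:
    """Append a durable fact to MEMORY.md as a bullet."""
    root.mkdir(parents=True, exist_ok=True)
    with (root / "MEMORY.md").open("a", encoding="utf-8") as f:
        f.write(f"- {fact}\n")


def log_episode(root: Path, day: date, note: str) -> None:
    """Append a note to that day's episodic log, e.g. logs/2025-01-02.md."""
    log_dir = root / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    with (log_dir / f"{day.isoformat()}.md").open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")


def search(root: Path, term: str) -> list[str]:
    """Case-insensitive substring search across every markdown file under root."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if term.lower() in line.lower():
                hits.append(f"{path.relative_to(root).as_posix()}: {line}")
    return hits
```

Calling `search(root, "postgres")` before a session and pasting the hits into the prompt is the "quick search beats loading everything" step. Note what the sketch does not give you: nothing here captures facts automatically or syncs across devices.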

#How do I move my memory from ChatGPT to Claude or Gemini without losing it?

First-party memory is per-app and effectively locked in, so switching platforms means starting over. The durable path is to own the store yourself and expose it to every client over the Model Context Protocol (MCP), so the same memory mounts into ChatGPT, Claude, or Cursor on demand. For raw history, export it first (a chat-to-markdown converter helps), but be realistic about scale: a 20M-token export can't be pasted back in one shot, so the approach that works is to chunk the conversations, index them, and inject only the relevant pieces into new chats. See memory across tools.
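The chunk, index, inject loop can be sketched in a few lines. To stay self-contained, this version scores chunks by simple keyword counts, which is a stand-in for the embedding search a real setup would more likely use; chunking by word count rather than by tokens is also a simplification. All function names are hypothetical.

```python
import re
from collections import Counter


def chunk(text: str, max_words: int = 200) -> list[str]:
    """Split text into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks: list[str]) -> list[Counter]:
    """One term-count table per chunk, in the same order as the chunks."""
    return [Counter(tokenize(c)) for c in chunks]


def top_chunks(query: str, chunks: list[str], index: list[Counter], k: int = 2) -> list[str]:
    """Return up to k chunks sharing the most terms with the query."""
    terms = tokenize(query)
    scored = [(sum(counts[t] for t in terms), i) for i, counts in enumerate(index)]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [chunks[i] for _, i in ranked[:k]]
```

In use, you would chunk each exported conversation once, build the index once, and call `top_chunks` with the opening message of a new chat, pasting only the returned pieces into the prompt.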

#Mem0 vs Zep vs Letta vs Graphiti vs Cognee: which one?

Pick by use case, not by a leaderboard. Roughly: Mem0 for user-level personalisation and session memory, Zep and its Graphiti layer for temporal, evolving facts and conversational knowledge graphs, Letta (formerly MemGPT) for self-editing agent memory, and Cognee for building a knowledge graph out of documents. Two caveats keep you honest. First, benchmark wars are real: a Letta engineer noted that Gemini 2.5 Flash alone reportedly reaches 72.8% on LoCoMo, so a strong base model already rivals dedicated systems on the popular benchmark. Second, as one practitioner put it, if you can't make this decision yourself, you probably don't need GraphRAG. See the landscape for the full comparison table.

#Why does my D&D bot forget NPCs from five sessions ago?

That is a failure of episodic memory, the record of what happened in earlier sessions. The model only ever sees its current context window, so once those early sessions scroll out the NPCs go with them, unless something persists the events to an external store and pulls them back when they become relevant. The fix is to capture each session's events, store them outside the prompt, and retrieve the ones that matter (the tavern bartender resurfaces when the party walks back into that tavern). For the memory-type taxonomy, see what AI memory is; for keeping the episodic store from bloating as the campaign grows, see forgetting.
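The capture, store, retrieve loop for a campaign can be sketched with plain tags. This is one simple way to do it, assuming you (or the model) tag each event with the places and characters involved; `Episode`, `EpisodicStore`, and the tag-overlap ranking are hypothetical choices for illustration, and an in-memory list stands in for whatever external store you persist to.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    session: int
    summary: str
    tags: set[str] = field(default_factory=set)  # places, NPCs, items involved


class EpisodicStore:
    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def record(self, session: int, summary: str, tags: set[str]) -> None:
        """Capture one event from a session, with lowercased tags."""
        self._episodes.append(Episode(session, summary, {t.lower() for t in tags}))

    def recall(self, cues: set[str], limit: int = 3) -> list[Episode]:
        """Return episodes sharing a tag with the current scene, best match first."""
        wanted = {c.lower() for c in cues}
        matches = [e for e in self._episodes if e.tags & wanted]
        # Rank by number of shared tags, then prefer more recent sessions.
        matches.sort(key=lambda e: (len(e.tags & wanted), e.session), reverse=True)
        return matches[:limit]
```

When the party walks back into the tavern, calling `recall({"tavern"})` pulls the bartender's episode back into the prompt even if it happened many sessions ago. The `limit` argument is the crude lever against bloat; anything smarter belongs to the forgetting discussion.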

#References

r/ArtificialInteligence, "Is AI memory a real problem or just bloated up?" (the casual-vs-power-user segmentation and the bear case, as community opinion).
r/LocalLLaMA, "The 'Infinite Context' Trap" (the RAM-vs-hard-drive framing and the salience bottleneck, with sceptical counter-takes).
Supermemory, "Memory vs RAG" concept docs, github.com/supermemoryai/supermemory (the Adidas-to-Puma illustration).
r/LLMDevs, "RAG has not felt like enough for agent memory" (resolved-vs-relevant and the resolution-state field, reported as practitioner experience).
r/LocalLLaMA, "Are vector databases fundamentally insufficient for long-term LLM memory?" (flat text with no update mechanism; the Alice-to-Stripe example).
r/ChatGPT, "I gave an AI agent persistent memory using just markdown files" (the MEMORY.md plus episodic-logs pattern and where it fits).
r/ChatGPT, "I built ChatGPT a 'Save Game' feature" (the portability and ownership argument; "you are renting your intelligence").
r/LocalLLaMA, "Letta vs Mem0" (the Gemini 2.5 Flash 72.8% LoCoMo data point and the benchmark-integrity subtext, as community claims).
r/Rag, "Cognee vs Graphiti vs Mem0" (the use-case split and the "you probably don't need GraphRAG" gatekeeping).