
The Field Guide to AI Memory

A working guide to how AI products actually remember: the external store, retrieval, curation, and ranking that turn a forgetful chatbot into one that knows you.

Most AI tools forget you the moment you close the tab. The next conversation starts from nothing: who you are, what you decided last week, the way you like your answers written. "Memory" is the set of techniques that fix this, and almost none of it is magic. It's retrieval, a few well-placed model calls, and a lot of unglamorous engineering around what to keep, what to throw away, and what to surface when it matters.

This guide explains how that machinery works, in plain language, with the real tradeoffs. It draws on building two memory-heavy products and on reading closely how the rest of the field does it.

#The whole field in one sentence

Strip away the branding and almost every memory system is the same four moving parts.

An external, editable store. A model's knowledge is frozen in its weights at training time. Memory lives outside the model, in a database you can read, write, correct, and delete between turns.
Retrieval. For each new turn, something decides which slivers of that store are relevant and pulls them into the prompt. This is usually vector search, often paired with keyword search.
Curation. Raw conversation is noisy. Curation is the deciding-what-is-worth-keeping work: extracting durable facts, merging near-duplicates, reconciling "I loved Adidas" with "Adidas broke, switched to Puma," and forgetting what's gone stale.
Ranking. When ten plausible memories compete, ordering decides which one the model actually sees. The closest-looking chunk is frequently the wrong one, and getting this order right is where most systems quietly live or die.
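
To make the retrieval and ranking parts concrete, here is a minimal sketch, not a production design. It assumes a toy bag-of-words similarity in place of a real embedding model, and the `Memory` class, the `retrieve` function, and the 30-day recency half-life are all hypothetical names and values chosen for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    day: int  # day the memory was written (hypothetical timestamp)


def tokens(text: str) -> Counter:
    # Toy stand-in for an embedding: a bag of lowercase words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(store, query, today, k=2, half_life_days=30.0):
    q = tokens(query)
    scored = []
    for m in store:
        similarity = cosine(q, tokens(m.text))
        if similarity == 0.0:
            continue  # retrieval: skip memories with no overlap at all
        # Ranking: decay the score by age (the half-life is an assumed value).
        recency = 0.5 ** ((today - m.day) / half_life_days)
        scored.append((similarity * recency, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]


# The external store: plain data you can read, edit, and delete between turns.
store = [
    Memory("user loved Adidas running shoes", day=1),
    Memory("Adidas shoes broke, user switched to Puma running shoes", day=90),
    Memory("user prefers short answers", day=50),
]

query = "which running shoes does the user like"
closest = max(store, key=lambda m: cosine(tokens(query), tokens(m.text)))
results = retrieve(store, query, today=100)
```

On similarity alone, `closest` is the stale Adidas note. With the recency weight applied, `results` puts the Puma note first. Real systems use learned embeddings and richer ranking signals, but the shape is the same: score candidates, then reorder them before anything reaches the prompt.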

Get these four right and you have memory. Get retrieval right but skip curation and you have a search box that slowly fills with contradictions.
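
The difference between storing and curating can be sketched in a few lines. This example assumes facts have already been extracted into (subject, attribute, value) form, which in practice usually takes a model call. `Fact`, `FactStore`, and the method names are hypothetical, and "latest write wins" is only one possible reconciliation policy.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    turn: int            # conversation turn the fact came from
    active: bool = True  # False once a newer fact supersedes it


class FactStore:
    def __init__(self):
        self.facts = []

    def append_only(self, fact):
        # No curation: every statement is kept as equally true.
        self.facts.append(fact)

    def curated_write(self, fact):
        key = (fact.subject, fact.attribute)
        for old in self.facts:
            if old.active and (old.subject, old.attribute) == key:
                if old.value == fact.value:
                    old.turn = fact.turn  # duplicate: refresh, don't re-add
                    return
                old.active = False  # contradiction: retire the older fact
        self.facts.append(fact)

    def current(self, subject, attribute):
        return [
            f.value
            for f in self.facts
            if f.active and f.subject == subject and f.attribute == attribute
        ]


naive, curated = FactStore(), FactStore()
for fact_args in [("user", "shoe_brand", "Adidas", 1), ("user", "shoe_brand", "Puma", 2)]:
    naive.append_only(Fact(*fact_args))
    curated.curated_write(Fact(*fact_args))

naive_view = naive.current("user", "shoe_brand")      # both brands survive
curated_view = curated.current("user", "shoe_brand")  # only the latest survives
```

The append-only store hands the model two conflicting answers and leaves it to guess. The curated store keeps the superseded fact on record as inactive but surfaces only the current one.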

#Memory is not one thing

The most common mistake in this space is treating "memory" as a single feature you bolt on. A chatbot remembering your name, an agent tracking the state of a long-running task, and a D&D bot recalling an NPC from five sessions ago are three different problems, with different storage, different write triggers, and different ways of going wrong. The field usually splits them into working, episodic, semantic, and procedural memory, with document memory (plain RAG over your files) as a fifth. A system that nails one of these can be useless at another. That's why broad "universal LLM memory" claims deserve a hard look: most products are good at exactly one type and quiet about the rest.

The next page, "What is AI memory," lays out the full taxonomy and the characteristic failure that comes with each type.

One warning to carry through the whole guide: the human-memory analogy ("just give it long-term memory like a person") is a useful intuition and a poor blueprint. Humans have distinct memory systems too, and forcing a one-to-one map onto software tends to hide the engineering that actually decides whether recall works.

#Who this is for

You're building something with an LLM and want it to remember across sessions. Or you're evaluating a memory product and want to know what's actually under the hood. Either way, you want the mechanics and the tradeoffs, not a pitch. Claims here are tied to a primary source or to a war story from a real production system, and the guide is honest about where each technique breaks.

#Two ways to read this

If the topic is new to you, go top to bottom. Part I builds the foundations: what memory is, why a bigger context window isn't a substitute, then RAG, embeddings, and chunking. Part II is where retrieval turns into real memory. Parts III through V cover architectures, building for production, and the tool landscape.

If you already know RAG and want the parts specific to memory, start with RAG vs memory, the distinction the field keeps rediscovering, then jump to whichever architecture you need: knowledge graphs, temporal memory, hierarchical memory, hybrid retrieval, or memory across tools.