
Forgetting

Why a memory store has to forget on purpose, covering decay-based ranking, scheduled expiry, consolidation, OS-style eviction, and read-time deduplication, with the cost of each.

A store that only grows is a store that slowly stops working. Every conversation adds facts, and most of them are low value, many are near-duplicates, and a fair number were only true for a week. Keep all of it and you pay twice. The token budget fills with noise, and the model hits the accuracy cliffs that come from burying the one fact that mattered under fifty that did not. Forgetting is not a defect a memory system tolerates. It is a behaviour it has to implement on purpose, and it shows up in five distinct places: how you rank, when you expire, how you consolidate, what you evict from the context window, and what you suppress at read time.

#Ranking with decay, not just similarity

The most-copied recipe for this comes from the Generative Agents paper, which scored each memory in its stream by a weighted sum of three normalised signals: relevance (cosine similarity to the current query), importance, and recency. Recency is an exponential decay over the time since a memory was last retrieved, with the paper using a factor of 0.995 per sandbox game hour, so a memory's pull fades a little each hour unless a retrieval refreshes it. Importance is assigned at write time by asking the model to rate the memory's poignancy on a 1 to 10 scale (1 for brushing your teeth, 10 for a breakup), which keeps trivia from drowning out the events that actually shaped the user. The point of combining the three is that pure cosine similarity is not a ranking: a memory can be the closest match and still be the wrong one to surface because it is old or because it never mattered.
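The scoring recipe above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the `Memory` class, the `rank` function, and the precomputed `relevance` field are hypothetical names, the three signals are min-max normalised before being summed, and the decay is applied per hour since last access.

```python
from dataclasses import dataclass

DECAY = 0.995  # decay factor reported in the paper


@dataclass
class Memory:
    text: str
    importance: int            # 1..10, assigned at write time
    hours_since_access: float  # time since the memory was last retrieved
    relevance: float           # cosine similarity to the query (assumed precomputed)


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(memories, w_recency=1.0, w_importance=1.0, w_relevance=1.0):
    """Order memories by a weighted sum of recency, importance, and relevance."""
    recency = min_max([DECAY ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([m.relevance for m in memories])
    scored = [
        (w_recency * r + w_importance * i + w_relevance * s, m)
        for r, i, s, m in zip(recency, importance, relevance, memories)
    ]
    return [m for _, m in sorted(scored, key=lambda p: p[0], reverse=True)]


memories = [
    Memory("brushed teeth", importance=1, hours_since_access=200.0, relevance=0.9),
    Memory("breakup", importance=10, hours_since_access=2.0, relevance=0.8),
    Memory("lunch order", importance=3, hours_since_access=50.0, relevance=0.5),
]
top = rank(memories)[0]  # the recent, important memory beats the closest match
```

Keeping the weights as parameters matters: setting `w_recency` and `w_importance` to zero reduces this to plain similarity ranking, which makes it easy to compare the two orderings on your own data.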

Decay is a useful default, but it is also where a recency bias can quietly invert your product. If the whole point of the tool is to surface old context, a multiplier that rewards "newer" will systematically rank a fresh conversation that lacks the answer above the year-old one that holds it. Recency belongs in the score, but its weight should depend on the query and the product rather than being applied unconditionally.

#Forgetting as a scheduled property

Some facts arrive with an expiry date attached. "I have an exam tomorrow" is worth remembering today and worth nothing the day after. Rather than wait for a cleanup pass to notice it has gone stale, supermemory makes forgetting a stored field on the memory itself: forgetAfter (a date), forgetReason (why), and a soft isForgotten flag. You decide at write time when a memory should lapse, and after that date it is dropped from normal recall without anyone running a sweep.
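A rough sketch of expiry as a stored field follows. The three attributes mirror the field names described above, but the `Memory` class, `is_recallable`, and `recall` are hypothetical and are not supermemory's API. The check runs at read time, which is why no sweep is needed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Memory:
    content: str
    forget_after: Optional[datetime] = None  # mirrors forgetAfter
    forget_reason: Optional[str] = None      # mirrors forgetReason
    is_forgotten: bool = False               # mirrors isForgotten (soft delete)


def is_recallable(memory: Memory, now: datetime) -> bool:
    """A memory is visible unless it was soft-deleted or its expiry has passed."""
    if memory.is_forgotten:
        return False
    if memory.forget_after is not None and now >= memory.forget_after:
        return False
    return True


def recall(memories: List[Memory], now: Optional[datetime] = None) -> List[Memory]:
    """Filter at read time; expired rows stay in the store but are not returned."""
    now = now or datetime.now(timezone.utc)
    return [m for m in memories if is_recallable(m, now)]


exam = Memory(
    content="User has an exam on 2 May 2024",
    forget_after=datetime(2024, 5, 3, tzinfo=timezone.utc),
    forget_reason="event will have passed",
)
```

Note that the example memory stores an absolute date in its content as well as in `forget_after`, so the text still makes sense if it is ever read after the fact.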

Two design choices in that scheme are worth copying. First, forgetting is a soft delete, not a hard one: isForgotten=true hides the memory but keeps it, so the decision is reversible and auditable. Second, an explicit "forget this" command is guarded against nuking the wrong thing. Supermemory tries an exact content match first and falls back to a semantic search only if that fails, with a deliberately high 0.85 similarity threshold and a limit of five results. It also refuses to forget raw document chunks through a memory-forget call at all. The lesson generalises past their implementation: a destructive operation driven by fuzzy matching needs a high bar and a narrow blast radius, or "forget my dentist appointment" eventually deletes your dentist.
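The guard can be sketched as below. Only the 0.85 threshold, the limit of five, the exact-match-first order, and the refusal to touch chunks come from the description above. Everything else is an assumption for illustration: the `Record` class, its `kind` field, the injected `similarity` function, and the choice to forget only the single best match.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

SIMILARITY_THRESHOLD = 0.85  # from the text
SEARCH_LIMIT = 5             # from the text


@dataclass
class Record:
    content: str
    kind: str = "memory"              # "memory" or "chunk" (hypothetical field)
    is_forgotten: bool = False
    forget_reason: Optional[str] = None


def forget(records: List[Record], target: str, reason: str,
           similarity: Callable[[str, str], float]) -> Optional[Record]:
    """Soft-delete at most one memory; never touch raw document chunks."""
    live = [r for r in records if r.kind == "memory" and not r.is_forgotten]

    # 1. An exact content match wins outright.
    match = next((r for r in live if r.content == target), None)

    # 2. Otherwise fall back to a strict semantic search.
    if match is None:
        scored = sorted(((similarity(target, r.content), r) for r in live),
                        key=lambda p: p[0], reverse=True)[:SEARCH_LIMIT]
        candidates = [r for s, r in scored if s >= SIMILARITY_THRESHOLD]
        # Assumption: forget only the best candidate to keep the blast radius small.
        match = candidates[0] if candidates else None

    if match is not None:
        match.is_forgotten = True   # soft delete: hidden, not removed
        match.forget_reason = reason
    return match
```

Returning `None` when nothing clears the bar is the important behaviour: the caller can ask the user to be more specific instead of guessing.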

#Consolidate, evolve, forget

Scheduled expiry handles the facts you knew were temporary. The larger problem is the slow accumulation of raw, redundant, half-superseded logs that nobody flagged. A lifecycle the community keeps converging on names three jobs: consolidate noisy logs into durable facts, evolve those facts as they merge and update, and forget what no longer earns its place. The pieces map onto things this guide covers elsewhere, but forgetting is the stage that closes the loop and keeps the store from bloating.

Figure 1. Capture, consolidate, evolve, forget. Raw observations become facts, facts merge and supersede, and a decay curve lowers a memory's weight over time until it is pruned or reinforced.

Consolidation is also straight from Generative Agents, under the name reflection. When the summed importance of recent memories crosses a threshold (around 150 in the paper, roughly two or three times a day), the agent retrieves what it has seen and synthesises higher-level insights, writing those back into the stream as new nodes. That is the move from episodic to semantic memory: raw "did X, then Y" logs become a compact "the user prefers Z," and the originals can decay without losing the conclusion they supported. Letta runs the same idea on a schedule, with a separate background "sleep-time" agent whose only job is to reorganise memory blocks while the main agent is idle. Its instructions carry a hygiene rule worth stealing: never write "today" or "recently," always write absolute dates, because the memory is persisted indefinitely and a relative timestamp rots the moment it is stored.
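The reflection trigger is simple enough to sketch. This is a simplified illustration under stated assumptions: `MemoryStream` and `synthesise` are hypothetical names, `synthesise` stands in for the model call that produces insights, and the sketch passes it only the observations seen since the last reflection rather than running a retrieval step first.

```python
from typing import Callable, List

REFLECTION_THRESHOLD = 150  # approximate value from the paper


class MemoryStream:
    def __init__(self, synthesise: Callable[[List[str]], List[str]]):
        self.nodes: List[str] = []
        self._pending: List[str] = []  # observations since the last reflection
        self._importance_sum = 0
        self._synthesise = synthesise  # stand-in for the LLM call

    def add(self, text: str, importance: int) -> bool:
        """Store an observation; return True if it triggered a reflection."""
        self.nodes.append(text)
        self._pending.append(text)
        self._importance_sum += importance
        if self._importance_sum < REFLECTION_THRESHOLD:
            return False
        # Write the insights back into the same stream as new nodes.
        self.nodes.extend(self._synthesise(self._pending))
        self._pending.clear()
        self._importance_sum = 0
        return True
```

Because insights land in the same stream as raw observations, they are ranked and decayed by the same rules, and a later reflection can build on an earlier one.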

#Evicting working memory

The forgetting above operates on a long-term store. The context window has its own, more urgent version of the problem, and MemGPT's operating-system analogy is the clearest treatment of it. Treat the window as RAM and an external store as disk, then page between them. Two thresholds drive eviction: at 70% of the window a warning fires, and at 100% the system flushes roughly half the messages and folds them, plus the previous summary, into a new recursive summary. The warning is the interesting part. It is a system message to the model itself, telling it that eviction is imminent and that it should save anything important to core or archival memory first, before the raw messages are gone. Forgetting here is explicit and cooperative: the model gets a chance to decide what survives the compaction rather than having it chosen blindly by position.
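The two-threshold policy might look like the sketch below. It is not MemGPT's implementation: `ContextWindow`, `summarise`, and the whitespace token counter are hypothetical stand-ins, the flush evicts the oldest half by message count, and the warning is returned to the caller to inject as a system message rather than stored in the window.

```python
from typing import Callable, List, Optional

WARN_FRACTION = 0.70   # warning threshold from the text
FLUSH_FRACTION = 1.00  # flush threshold from the text
WARNING = "SYSTEM: memory pressure, save anything important before eviction"


class ContextWindow:
    def __init__(self, max_tokens: int,
                 summarise: Callable[[str, List[str]], str],
                 count_tokens: Callable[[str], int] = lambda s: len(s.split())):
        self.max_tokens = max_tokens
        self.summarise = summarise        # stand-in for the LLM summariser
        self.count_tokens = count_tokens  # crude whitespace tokenizer by default
        self.summary = ""
        self.messages: List[str] = []
        self.warned = False

    def used(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def append(self, message: str) -> Optional[str]:
        """Add a message; return a warning for the model when pressure is high."""
        self.messages.append(message)
        used = self.used()
        if used >= FLUSH_FRACTION * self.max_tokens:
            self._flush()
        elif used >= WARN_FRACTION * self.max_tokens and not self.warned:
            self.warned = True
            return WARNING
        return None

    def _flush(self) -> None:
        cut = max(1, len(self.messages) // 2)  # evict the oldest half
        evicted, self.messages = self.messages[:cut], self.messages[cut:]
        # Recursive summary: the previous summary is folded in with the evicted text.
        self.summary = self.summarise(self.summary, evicted)
        self.warned = False
```

The gap between the two thresholds is the budget the model has for saving things. If it is too narrow, the flush arrives before the model has had a turn to act on the warning.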

#Read-time forgetting with MMR

Not everything you forget needs to leave the store. Sometimes you just need to stop returning five versions of the same fact in one response. Maximal Marginal Relevance (MMR) is the standard tool for this: when assembling results, it trades pure relevance against diversity so that each new item has to add something the already-selected ones do not. Graphiti exposes it as a reranker with a default lambda of 0.5, an even split between "most relevant" and "least redundant." A practitioner on r/RAG drew the sharp distinction: MMR suppresses near-duplicate competitors passively, at retrieval time, for one query, while a store-level forgetting mechanism reshapes what exists over time. You generally want both. MMR keeps a single answer clean; consolidation and expiry keep the store from needing MMR to do too much.
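For reference, here is a minimal sketch of the textbook MMR selection loop in plain Python. It is not Graphiti's code. The function names are illustrative, vectors are plain lists of floats, and `lam` plays the role of lambda: 1.0 is pure relevance and lower values penalise redundancy more heavily.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: Sequence[float], candidates: List[Sequence[float]],
        k: int, lam: float = 0.5) -> List[int]:
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance."""
    remaining = list(range(len(candidates)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.10],   # best match
    [1.0, 0.12],   # near-duplicate of the best match
    [0.7, -0.70],  # less relevant, but says something different
]
diverse = mmr(query, candidates, k=2)            # skips the near-duplicate
greedy = mmr(query, candidates, k=2, lam=1.0)    # pure relevance keeps it
```

Nothing is deleted here. The near-duplicate is still in the store and would be returned for a query where it is the best match, which is exactly the per-query, read-time behaviour described above.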

#Borrowing from human memory, carefully

It is tempting to model all of this on human forgetting, and the analogies are genuinely suggestive. One ambitious project implemented reconsolidation, the finding that a recalled memory drifts slightly (the author used about 5% toward current mood) each time you retrieve it, so memories literally change when you remember them. The same project noted that retrieval-induced forgetting, where recalling one memory suppresses its competitors, is the cognitive-science cousin of MMR. The catch, raised in the same threads, is that a property which makes human memory adaptive makes an AI memory unreliable. A system you built for accurate recall should not let stored facts drift 5% per read, and several builders argue plainly that mapping LLM memory onto human memory is a category error: humans have distinct memory systems, and AI memory needs to be intentionally managed rather than left to behave like a brain. Take the mechanisms that serve recall (decay, consolidation, deduplication) and leave the ones that quietly corrupt it.

#References

Park et al., 2023, "Generative Agents: Interactive Simulacra of Human Behavior," arxiv.org/abs/2304.03442 (the recency-importance-relevance ranking, 0.995 decay, 1 to 10 importance, and reflection as consolidation).
Packer et al., 2023, "MemGPT: Towards LLMs as Operating Systems," arxiv.org/abs/2310.08560 (the 70% warning / 100% flush eviction model with recursive summarisation).
Supermemory, memory schema and forgetting logic, github.com/supermemoryai/supermemory (forgetAfter / forgetReason / soft isForgotten, and the exact-then-0.85-semantic forget guard).
Graphiti (Zep), hybrid search and rerankers, github.com/getzep/graphiti (MMR reranker, default lambda 0.5).
r/LocalLLaMA, "The 'Infinite Context' Trap" (the consolidate, evolve, forget lifecycle framing).
r/RAG, "I got tired of RAG and spent a year implementing the neuroscience of memory instead" (reconsolidation drift, retrieval-induced forgetting as MMR's cousin, and the human-memory caution).