Memory Suggestions

How a curator LLM proposes deletes, merges, and reconciliations over your memories, with nothing changing until you accept, and the ID-validation, zero-information-loss, and idempotency guards that make a model safe to point at a user's memory store.

Give an assistant memory, let it run for a month, and the store rots. You end up with four near-identical entries that all say you greeted it with "hi", a "works at Google" that has been wrong since you changed jobs, a stray "[new memory]" saved by accident, and a literal \))_. This is the single most common complaint about AI memory tools: it goes stale, it becomes a mess, contradictions pile up. The obvious fix is to point an LLM at the store and let it tidy up. The obvious fix is also how you lose data, because the same model that merges two duplicates will happily delete the one memory recording "37 million tokens since January 2023" on the grounds that it "doesn't fit with the others."

So the design problem is not "can an LLM clean memory." It can. The problem is doing it without ever silently destroying something the user cared about. MemoryPlugin's answer is a curator that proposes and a human that disposes: an offline model suggests edits, and nothing touches the store until you accept.

#What the curator proposes

The curator works one bucket (one memory folder) at a time. For each memory it pulls the handful of nearest neighbours by vector similarity to form a small cluster of plausibly related memories, then hands that cluster to the model and asks whether anything is worth doing. One honest detail from building this: the raw similarity score turned out to be useless as a gate. Voyage embeddings' score distribution sits so high and so compressed that a 0.7-to-0.95 cutoff (the kind of threshold band mem0's original design used) does not separate duplicates from unrelated text. The threshold was effectively turned off. Recall is governed by rank (the top-k neighbours) and precision by the model's own judgment, not by a number nobody can calibrate.
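As a rough illustration of "rank, not threshold", the sketch below builds a cluster from the top-k neighbours of a seed memory and never compares a score against a cutoff. The function names, the toy three-dimensional vectors, and the default `k` are assumptions made for the example; the real curator uses Voyage embeddings and a vector index rather than a brute-force loop.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_cluster(seed_id, embeddings, k=4):
    """Return the seed plus its k nearest neighbours, chosen by rank only.

    No similarity threshold is applied: the scores are used to order
    candidates, and the LLM decides later whether they are really related.
    """
    seed = embeddings[seed_id]
    scored = [(cosine(seed, vec), mid)
              for mid, vec in embeddings.items() if mid != seed_id]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seed_id] + [mid for _, mid in scored[:k]]

# Toy embeddings (hypothetical values, three dimensions for readability).
embeddings = {
    "m1": [1.0, 0.0, 0.1],
    "m2": [0.9, 0.1, 0.1],
    "m3": [0.0, 1.0, 0.0],
    "m4": [0.8, 0.3, 0.0],
}
cluster = build_cluster("m1", embeddings, k=2)
print(cluster)
```

The unrelated memory `m3` is excluded here only because it ranks last, not because its score fell under some number. With a larger `k` it would enter the cluster and the model would be expected to ignore it.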

The model can return one of three operations, or, just as often, nothing:

  • DELETE removes junk: the empty strings, the accidental "[new memory]", the "greeted with hi" social noise. These are not only clutter. Garbage entries degrade vector search, and worse, they give a safety-trained assistant a concrete reason to refuse the whole memory protocol.
  • DELETE_AND_COMBINE merges two to five same-topic near-duplicates into one. The merge keeps the oldest memory as the anchor and deletes the newer copies, so the created-at timestamp stays stable.
  • DELETE_AND_UPDATE reconciles information that has evolved or contradicts itself. "Works at Google" becomes "worked at Google, now works at Stripe." The contradiction is preserved as a progression ("was X, now Y"), not silently resolved by picking a winner.

[Figure 1 diagram. Human in the loop, nothing changes without approval: similar memories cluster → LLM curator reviews → proposed edits (DELETE, COMBINE, UPDATE; inert, not yet applied) → Accept or Edit: applied, memory updated; Reject: discarded, no change.]
Figure 1. A curator clusters similar memories and proposes DELETE, COMBINE, or UPDATE operations; each proposal stays inert until the user accepts (optionally editing the text), rejects, or dismisses it.
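
One way to picture the three operations is as a small typed record that the model's output is parsed into. This is a sketch, not MemoryPlugin's actual schema: the class and field names are hypothetical, and the rule that DELETE_AND_UPDATE needs at least one memory is an assumption (the text only fixes the two-to-five range for merges).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Op(Enum):
    DELETE = "DELETE"
    DELETE_AND_COMBINE = "DELETE_AND_COMBINE"
    DELETE_AND_UPDATE = "DELETE_AND_UPDATE"

@dataclass
class Memory:
    id: str
    text: str
    created_at: int  # epoch seconds

@dataclass
class Suggestion:
    op: Op
    memory_ids: List[str]
    new_text: Optional[str] = None  # merged or reconciled text, if any
    reason: str = ""

def pick_anchor(memories):
    """The oldest memory survives a merge, so created_at stays stable."""
    return min(memories, key=lambda m: m.created_at)

def has_valid_shape(s):
    """Cheap structural check before a suggestion is queued."""
    if s.op is Op.DELETE:
        return len(s.memory_ids) >= 1 and s.new_text is None
    if s.op is Op.DELETE_AND_COMBINE:
        return 2 <= len(s.memory_ids) <= 5 and bool(s.new_text)
    # DELETE_AND_UPDATE: assumed to need at least one memory and new text.
    return len(s.memory_ids) >= 1 and bool(s.new_text)
```

"Nothing" is then simply an empty list of suggestions for the cluster, which keeps the no-op case cheap to represent.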

This is a deliberate contrast with the school of thought mem0 made famous. mem0's classic design ran a second LLM call at write time that diffed each new fact against existing memory and emitted ADD, UPDATE, DELETE, or NONE. In version 3, mem0 abandoned that design: the pipeline now appends only and lets read-time ranking surface the current fact, reporting roughly +20 points on LoCoMo and +26 on LongMemEval with extraction about twice as fast. Their conclusion: asking a model to maintain a consistent store on every write is harder, slower, and lossier than accumulating everything and ranking well. MemoryPlugin makes a different bet, keeping the explicit store clean but moving cleanup off the write path and behind human approval. Neither is universally right (see updates and conflicts for the full fork).

#Nothing changes until you accept

A suggestion is inert. It is a proposed operation sitting in a queue, not a change. The store is mutated only at the moment you accept, and even then you can edit the merged text first, so the model's wording is a draft you can overrule. Reject or dismiss a suggestion and that decision is remembered, so the curator does not waste your time re-proposing it on the next run. Deletes are soft (a flag, a reason, and a pointer to what a memory was merged into), never destructive, so an accepted merge can be traced and undone.
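
The accept path described above might look roughly like the following. Everything here is a hypothetical in-memory stand-in for a real database: the `Row`, `Pending`, and `Store` names, the `"merged"` reason string, and the use of sorted memory IDs as a rejection fingerprint are all assumptions, and only the merge case is covered.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Row:
    text: str
    created_at: int
    deleted: bool = False              # soft-delete flag
    delete_reason: Optional[str] = None
    merged_into: Optional[str] = None  # pointer to the surviving memory

@dataclass
class Pending:
    memory_ids: List[str]
    draft_text: str  # the model's wording, a draft the user may overrule

class Store:
    def __init__(self, rows):
        self.rows = rows
        self.rejected = set()  # fingerprints of proposals the user declined

    @staticmethod
    def fingerprint(p):
        return tuple(sorted(p.memory_ids))

    def accept(self, p, edited_text=None):
        """The only place the store is mutated."""
        anchor_id = min(p.memory_ids, key=lambda i: self.rows[i].created_at)
        self.rows[anchor_id].text = (
            edited_text if edited_text is not None else p.draft_text
        )
        for mid in p.memory_ids:
            if mid != anchor_id:
                row = self.rows[mid]
                row.deleted = True
                row.delete_reason = "merged"
                row.merged_into = anchor_id
        return anchor_id

    def reject(self, p):
        self.rejected.add(self.fingerprint(p))

    def should_propose(self, p):
        """Skip proposals the user has already turned down."""
        return self.fingerprint(p) not in self.rejected
```

Because the newer copies are flagged rather than removed, undoing a merge in this sketch is a matter of clearing the three soft-delete fields on each row whose `merged_into` points at the anchor.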

This is the storage-versus-admission split from RAG vs memory turned into a product surface. The model is allowed to propose a memory operation; it is not allowed to bless one. The human is the admission control.

#Defending against the model

Approval is the outer safety net, and it is not enough on its own, because a human approving a hundred suggestions in a row will rubber-stamp. Three guards make the suggestions safe to approve in the first place, and each one generalises to any system that lets an LLM touch a user's data.

#Hallucinated IDs

Ask a model to return the IDs of memories to delete and it will, reliably, invent some. The defence is two layers. First, every ID the model returns is validated against the exact set of IDs that went into the prompt; if a suggestion references an ID that was not in its input, the entire suggestion is dropped rather than partially applied. Second, before any write, every ID is re-checked for ownership: it must resolve to a row owned by this user and sitting in this bucket, or the operation throws. mem0 adds a third trick worth copying: never show the model real UUIDs at all. Present the candidates as "0", "1", "2" and map back afterwards, so there is nothing realistic to hallucinate.
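
A minimal sketch of the two validation layers plus the integer-alias trick, assuming memories are rows in a plain dictionary keyed by real ID. The function names and the `user_id` / `bucket_id` row fields are illustrative, not taken from either codebase.

```python
def alias_ids(real_ids):
    """Map real IDs to "0", "1", ... so the prompt never contains a UUID."""
    return {str(i): rid for i, rid in enumerate(real_ids)}

def resolve_suggestion(returned_aliases, alias_map):
    """Layer 1: drop the whole suggestion if any alias was not in the prompt.

    Returns the real IDs, or None when the suggestion must be discarded.
    """
    if not returned_aliases:
        return None
    if any(alias not in alias_map for alias in returned_aliases):
        return None
    return [alias_map[alias] for alias in returned_aliases]

def check_ownership(real_ids, rows, user_id, bucket_id):
    """Layer 2: before any write, every ID must be this user's row in this bucket."""
    for rid in real_ids:
        row = rows.get(rid)
        if row is None or row["user_id"] != user_id or row["bucket_id"] != bucket_id:
            raise PermissionError(f"memory {rid!r} is not in this user's bucket")

# Usage: the model saw aliases "0".."2" and returned one it invented.
alias_map = alias_ids(["uuid-a", "uuid-b", "uuid-c"])
print(resolve_suggestion(["0", "2"], alias_map))  # valid: mapped back to real IDs
print(resolve_suggestion(["0", "7"], alias_map))  # "7" was never shown: dropped
```

Dropping the whole suggestion rather than filtering out the bad ID is the conservative choice: a merge that silently loses one of its inputs is no longer the operation the model reasoned about.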

#Zero information loss

The instruction that matters most is "you are an editor, not a writer." The curator operates under a preservation contract: never drop temporal information (since 2023, durations), quantities (37 million tokens, $10k MRR), specific identifiers (names, versions, URLs), current state (currently broken, in progress), causal context (because of X), or contradictions (encode "was X, now Y", do not choose a side). That list is scar tissue. Early versions deleted memories carrying usage stats and stripped "since 2023" off merges. The fix that stuck was general principles rather than a blocklist of past mistakes, because an overfitted rule for every previous failure makes a brittle prompt that misses the next one. (This is the curation-time mirror of the extraction contract in smart extraction.)
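
The contract above lives in the prompt, and the source does not describe any programmatic check. As an illustration only, a cheap lint could sit alongside it and flag merges whose text has lost a number or year that appeared in the inputs. The regex and function names below are assumptions, and the check covers only the quantity and date items of the contract, not identifiers, state, causal context, or contradictions.

```python
import re

# Matches things like "37", "2023", "1,200", "3.5", "$10k", "40%".
NUMBER = re.compile(r"\$?\d+(?:[.,]\d+)*[kKmM%]?")

def numeric_facts(text):
    """Collect the numeric tokens that appear in a piece of text."""
    return set(NUMBER.findall(text))

def dropped_facts(source_texts, merged_text):
    """Return numeric tokens present in the sources but missing from the merge."""
    wanted = set()
    for text in source_texts:
        wanted |= numeric_facts(text)
    return sorted(wanted - numeric_facts(merged_text))

sources = [
    "Has used 37 million tokens since January 2023",
    "Uses a lot of tokens",
]
print(dropped_facts(sources, "Uses a lot of tokens"))
print(dropped_facts(sources, "Has used 37 million tokens since January 2023."))
```

A non-empty result would be a reason to drop or flag the suggestion before it reaches the queue, not proof that the merge is wrong: a legitimate rewrite may express a number differently, so a check like this is better treated as a warning than as a gate.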

#Idempotency

Run the curator twice and it must not pile up duplicate suggestions or re-litigate settled ones. Offline, a skip set excludes anything already covered by a live pending suggestion, plus memories too large to reason about safely; and within a single run, every memory consumed by one suggestion is barred from appearing in another, so each memory lands in at most one operation. Online, if a memory is edited or moved by any other part of the app, every pending suggestion that references it is invalidated, so you can never accept a merge built on text that has since changed. A per-user concurrency key stops two runs racing on the same store.
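
The offline and online halves can be sketched as two small functions. This is a simplification under stated assumptions: candidate suggestions are represented as bare lists of memory IDs, the size limit `max_chars` is a made-up parameter, and a candidate that touches any skipped memory is dropped outright. The per-user concurrency key is not shown.

```python
def plan_run(candidates, pending_ids, sizes, max_chars=2000):
    """Offline: keep candidates so each memory lands in at most one operation.

    candidates  -- lists of memory IDs, one list per proposed operation
    pending_ids -- IDs already covered by a live pending suggestion
    sizes       -- memory ID -> text length, used to skip oversized memories
    """
    skip = set(pending_ids) | {mid for mid, n in sizes.items() if n > max_chars}
    consumed = set()
    kept = []
    for ids in candidates:
        if any(i in skip or i in consumed for i in ids):
            continue  # would duplicate or re-litigate existing work
        consumed.update(ids)
        kept.append(ids)
    return kept

def invalidate(pending, changed_id):
    """Online: drop every pending suggestion that references a changed memory."""
    return [ids for ids in pending if changed_id not in ids]

candidates = [["a", "b"], ["b", "c"], ["d", "e"], ["f", "g"]]
kept = plan_run(candidates, pending_ids={"d"}, sizes={"g": 5000})
print(kept)                   # only the first candidate survives
print(invalidate(kept, "a"))  # editing "a" voids the suggestion built on it
```

Running `plan_run` again after its output has been queued yields nothing new, because every memory it used is now in `pending_ids`.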

#The honest cost

This is the expensive way to keep memory clean. It runs an LLM over clusters of memories offline, it needs a human in the loop, and the suggestion queue is real work that piles up if you ignore it. If your memory is invisible plumbing the user never inspects, mem0's append-and-rank approach is probably the better trade: cheaper, fully automatic, no queue. The case for suggestions is strongest when the store is something the user sees, trusts, and carries across tools. Then the store being correct and legible matters more than it being effortless, and "the model can propose but only you can commit" is the contract that earns the trust.

#References

  • mem0, DEFAULT_UPDATE_MEMORY_PROMPT (the ADD / UPDATE / DELETE / NONE write-time schema, and the integer-ID indirection that prevents UUID hallucination), github.com/mem0ai/mem0.
  • mem0, OSS v2-to-v3 migration notes (the move from two-phase write-time consolidation to single-pass append-only with read-time ranking; reported LoCoMo +20 and LongMemEval +26 with roughly half the extraction latency), github.com/mem0ai/mem0.
  • MemoryPlugin memory-suggestions curator (the DELETE / DELETE_AND_COMBINE / DELETE_AND_UPDATE operations, the inert-until-accept UX, the zero-information-loss contract, and the ID-validation and idempotency guards), described here as one production system.