AI Memory
II · From retrieval to memory

Updates and Conflicts

What happens when a new fact contradicts an old one, and the central design fork between resolving conflicts at write time and resolving them at read time.

In March a user tells the assistant they take almond milk in their coffee. In June they mention an almond sensitivity and say they have switched to oat. Both statements are now sitting in the store. Both embed close to "what does the user drink." The older one was true first and the newer one is true now, and something has to decide that the June fact wins. That decision is the quietest hard problem in memory systems, and the field is split on where to make it.

#The central fork: write time or read time

There are two places to reconcile a contradiction. The first is at write time: when a new fact arrives, compare it against what you already store and resolve the conflict immediately. Update the old record in place, mark it superseded, or merge a duplicate. The store stays clean, and every later read sees an already-settled view. The second is at read time: store the new fact next to the old one, change nothing on write, and let retrieval ranking surface the right one when somebody asks. The store accumulates, and the work of picking the current fact moves to query time.

[Figure: two pipelines side by side. Resolve at write time: new fact → LLM diffs vs existing memory → ADD / UPDATE / DELETE → one current store (lossy, slower writes). Resolve at read time: new fact → append → store keeps all versions → rank at query → current one surfaces (accumulates, ranking does the work).]
Figure 1. Two ways to reconcile a contradiction: resolve at write time by diffing against existing state, or accept everything and let read-time ranking surface the current fact.

Each side buys something and pays for it elsewhere. Write-time reconciliation keeps the store small and hands you a single current truth, but it spends an LLM call (sometimes several) on every write to diff new against old, and a wrong diff can silently delete or corrupt a fact you needed. Read-time reconciliation makes writes cheap and lossless, but it pushes the entire difficulty onto the ranker, which now has to pull the current fact out of a pile of contradictory ones every single query.
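The fork can be sketched with two toy stores, using the almond-to-oat example from the opening. This is a minimal illustration, not any library's API: the `Fact` shape, the `key` field, and the concrete dates are assumptions, and "newest statement wins" stands in for a real ranker.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    key: str          # hypothetical topic key, e.g. "coffee_milk"
    value: str
    stated_on: date   # when the user said it (illustrative dates below)


class WriteTimeStore:
    """Resolve on write: one current fact per key, the old one is overwritten."""

    def __init__(self) -> None:
        self.current: dict[str, Fact] = {}

    def write(self, fact: Fact) -> None:
        # A real system would spend an LLM call here to diff new against old.
        self.current[fact.key] = fact

    def read(self, key: str) -> Optional[Fact]:
        return self.current.get(key)


class ReadTimeStore:
    """Resolve on read: writes only append, the reader picks the winner."""

    def __init__(self) -> None:
        self.log: list[Fact] = []

    def write(self, fact: Fact) -> None:
        self.log.append(fact)

    def read(self, key: str) -> Optional[Fact]:
        candidates = [f for f in self.log if f.key == key]
        # Stand-in for retrieval ranking: the newest statement wins.
        return max(candidates, key=lambda f: f.stated_on, default=None)


march = Fact("coffee_milk", "almond", date(2024, 3, 1))
june = Fact("coffee_milk", "oat", date(2024, 6, 1))
write_time, read_time = WriteTimeStore(), ReadTimeStore()
for store in (write_time, read_time):
    store.write(march)
    store.write(june)
    print(type(store).__name__, "->", store.read("coffee_milk").value)
```

Both stores answer "oat", but only the read-time store still holds the March statement, and only the write-time store could have lost it to a bad diff.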

#mem0's reversal is the strongest data point

mem0 was known for the write-time design. Its classic pipeline made two LLM calls per turn: one to extract candidate facts, and a second that compared each fact against the top-K similar existing memories and emitted ADD, UPDATE, DELETE, or NONE. That second call was the conflict resolver the project was famous for.
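To make the second call's output concrete, here is a sketch of how resolver decisions might be applied to a store. It is not mem0's code: the `Decision` shape and the integer ids are hypothetical, and the decisions are hard-coded where the classic pipeline would have had an LLM produce them.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Op = Literal["ADD", "UPDATE", "DELETE", "NONE"]


@dataclass
class Decision:
    op: Op
    text: str = ""
    target_id: Optional[int] = None   # existing memory the op refers to


def apply_decisions(store: dict[int, str], decisions: list[Decision]) -> dict[int, str]:
    """Apply resolver output to a copy of the store (id -> memory text)."""
    result = dict(store)
    next_id = max(result, default=0) + 1
    for d in decisions:
        if d.op == "ADD":
            result[next_id] = d.text
            next_id += 1
        elif d.op == "UPDATE" and d.target_id in result:
            result[d.target_id] = d.text      # old text is gone after this
        elif d.op == "DELETE":
            result.pop(d.target_id, None)     # a wrong DELETE loses the fact
        # NONE: the candidate fact changes nothing
    return result


before = {1: "Takes almond milk in coffee"}
after = apply_decisions(before, [
    Decision("UPDATE", "Takes oat milk in coffee", target_id=1),
    Decision("ADD", "Has an almond sensitivity"),
])
print(after)
```

The UPDATE and DELETE branches are where the lossiness lives: whatever the resolver decides, the previous text is no longer in the store afterwards.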

In its V3 rewrite, mem0 deleted it. The add path is now a single pass that only ever ADDs. Their migration notes put the rationale plainly: "The model spends its capacity on understanding the input rather than diffing against existing state." Memories accumulate over time, and when information changes the new fact is simply stored alongside the old one while "retrieval handles ranking." They report a clear win: roughly +20 points on LoCoMo (71.4 to 91.6) and +26 on LongMemEval (67.8 to 93.4), with extraction latency cut about in half.

That's a measured argument that asking an LLM to maintain a conflict-free store on every write is harder, slower, and lower quality than appending everything and investing in retrieval. It's also conditional: read-time resolution only works if your ranking can float the current fact above the stale one. mem0 leans hard on temporal grounding during extraction (relative dates anchored to when the conversation happened) and on multi-signal scoring to make that true. With a weak ranker, append-only just means the stale fact resurfaces on its own schedule.
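The condition is easy to see in a toy ranker. The sketch below blends a similarity score with a recency term; it assumes similarity is already computed and that each memory carries an absolute date. The weights, the half-life, and the scores are invented for illustration and are not mem0's scoring function.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Memory:
    text: str
    similarity: float   # assumed precomputed query/memory similarity in [0, 1]
    stated_on: date     # absolute date, anchored when the fact was extracted


def score(m: Memory, today: date, recency_weight: float = 0.3,
          half_life_days: float = 90.0) -> float:
    """Weighted blend of similarity and an exponentially decaying recency term."""
    age_days = (today - m.stated_on).days
    recency = 0.5 ** (age_days / half_life_days)
    return (1 - recency_weight) * m.similarity + recency_weight * recency


def rank(memories: list[Memory], today: date, recency_weight: float = 0.3) -> list[Memory]:
    return sorted(memories, key=lambda m: score(m, today, recency_weight), reverse=True)


today = date(2024, 7, 1)
almond = Memory("Takes almond milk in coffee", 0.82, date(2024, 3, 10))
oat = Memory("Switched to oat milk (almond sensitivity)", 0.80, date(2024, 6, 15))

with_recency = rank([almond, oat], today)
similarity_only = rank([almond, oat], today, recency_weight=0.0)
print(with_recency[0].text)
print(similarity_only[0].text)
```

With the recency term the June fact ranks first. With similarity alone, the slightly closer March fact wins, which is the stale-fact failure described above.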

#You have to match before you can decide

Before any system can resolve a conflict, it has to notice that two memories are about the same thing. This matching step is where most of the engineering actually lives, and the approaches run from cheap and deterministic to expensive and smart.

  • Exact match. mem0's V3 dedup is an MD5 hash of the memory text: skip the insert if an identical string already exists. MemoryPlugin's write-time dedup is the same idea, an exact string match scoped to a single bucket. This catches only verbatim duplicates and is deliberately conservative. Letting a near-duplicate through is cheaper than risking a merge of two facts that only looked alike.
  • Semantic threshold. The intuitive next step is to embed both memories and merge them if cosine similarity clears a cutoff. The trap is calibration: similarity scores are not portable across embedding models, so a 0.85 cutoff borrowed from someone else's blog post is meaningless against your embedder's distribution (see the war story below).
  • Deterministic fuzzy, then LLM fallback. Graphiti resolves entities with exact normalized-name matching first, then an entropy gate that refuses to fuzzy-match short or low-information names ("the team"), then MinHash/LSH with a Jaccard threshold of 0.9 for the rest. Only genuinely ambiguous cases reach an LLM. Cheap filters clear the easy 90 percent and the expensive call handles the hard remainder.
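The cheap end of that spectrum can be sketched in a few functions. This is an illustration under stated assumptions, not mem0's or Graphiti's code: the character-entropy measure and its 2.5 cutoff are hypothetical, exact Jaccard over character trigrams stands in for MinHash/LSH (which approximates the same quantity), and names that fail the gate or miss the threshold are simply labelled for a fallback step.

```python
import hashlib
import math
from collections import Counter
from typing import Optional


def add_if_new(store: dict[str, str], text: str) -> bool:
    """Exact-match dedup: skip the insert if an identical string is already stored."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    if digest in store:
        return False
    store[digest] = text
    return True


def normalize(name: str) -> str:
    return " ".join(name.lower().split())


def char_entropy(name: str) -> float:
    """Shannon entropy in bits per character, ignoring spaces."""
    chars = normalize(name).replace(" ", "")
    if not chars:
        return 0.0
    counts = Counter(chars)
    return -sum((c / len(chars)) * math.log2(c / len(chars)) for c in counts.values())


def shingles(name: str, n: int = 3) -> set[str]:
    s = normalize(name)
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_entity(name: str, known: list[str], min_entropy: float = 2.5,
                 threshold: float = 0.9) -> tuple[str, Optional[str]]:
    """Return (stage, match). 'gated' and 'ambiguous' go to the expensive fallback."""
    by_norm = {normalize(k): k for k in known}
    if normalize(name) in by_norm:
        return "exact", by_norm[normalize(name)]
    if char_entropy(name) < min_entropy:
        return "gated", None              # too little information to fuzzy-match
    scored = [(jaccard(shingles(name), shingles(k)), k) for k in known]
    best_score, best = max(scored, default=(0.0, None))
    if best_score >= threshold:
        return "fuzzy", best
    return "ambiguous", None


known = ["Acme Robotics Inc", "The Teams"]
for candidate in ["acme  robotics inc", "Acme Robotics Inc.", "the team", "Acme Robots"]:
    print(candidate, "->", match_entity(candidate, known))
```

Only the last two candidates would cost an LLM call in this sketch. The first two are settled by normalization and by the trigram overlap.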

#Invalidate, do not delete

Once two facts are matched and judged to contradict, the safest thing to do with the old one is to keep it. Graphiti is the clearest example. Every fact-edge carries a validity window, and a contradicting fact does not erase the old edge: it sets the old edge's invalid_at to the new fact's valid_at and leaves the record in place. The graph can still answer "what did we believe last Tuesday," and a wrong contradiction judgment stays recoverable, because the old edge was closed rather than destroyed. There is a clean division of labour inside that step: the LLM decides whether two facts semantically contradict, and deterministic code does the interval arithmetic. You never hand date math to the model. The temporal memory chapter goes deep on the bi-temporal model behind this.
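The deterministic half of that step is small enough to sketch. The field names `valid_at` and `invalid_at` come from the text. The `FactEdge` class, the helper names, and the dates are assumptions for illustration, and the contradiction judgment itself is taken as given.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FactEdge:
    fact: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None   # None means "still believed"


def invalidate(old: FactEdge, new: FactEdge) -> None:
    """Close the old edge's window where the new fact's window opens.

    Call this only after something else (an LLM, per the text) has judged
    that the two facts contradict. Nothing is deleted.
    """
    if new.valid_at <= old.valid_at:
        return  # the "new" fact is not later; this sketch leaves that case alone
    if old.invalid_at is None or new.valid_at < old.invalid_at:
        old.invalid_at = new.valid_at


def facts_as_of(edges: list[FactEdge], when: datetime) -> list[str]:
    """Facts whose validity window contains `when`."""
    return [
        e.fact for e in edges
        if e.valid_at <= when and (e.invalid_at is None or when < e.invalid_at)
    ]


almond = FactEdge("drinks almond milk", datetime(2024, 3, 1))
oat = FactEdge("drinks oat milk", datetime(2024, 6, 1))
invalidate(almond, oat)
edges = [almond, oat]

print(facts_as_of(edges, datetime(2024, 4, 15)))
print(facts_as_of(edges, datetime(2024, 7, 1)))
```

The April query still returns the almond fact and the July query returns the oat fact, from the same two records.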

#Versioned chains get history for free

Supermemory reaches the same "return current, preserve history" outcome with a lighter structure. Each memory row carries parentMemoryId, rootMemoryId, and an isLatest flag, so a preference that moves from Vue to React to React-with-TypeScript becomes a three-node chain where only the last node is flagged current. Reads return the latest; the superseded versions stay linked behind it.

[Figure: PREFERENCE · frontend framework. v1 Vue ← updates ← v2 React ← updates ← v3 React + TS (isLatest). v1 and v2 are retained as history.]
Figure 2. A versioned chain: each new fact links to its parent, only the latest is flagged current, and superseded versions stay queryable as history.

Supermemory also types the relationship between versions instead of treating every change as a plain overwrite. Updates means the new fact contradicts and replaces the old (Alex moved from Google to Stripe). Extends means it enriches without replacing, so both stay valid (Alex leads a team of five). Derives means it was inferred from a pattern rather than stated. Distinguishing "replaces" from "enriches" is the difference between losing a fact and keeping one, and it is exactly the judgment a naive overwrite gets wrong.
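A versioned chain with typed relationships can be sketched as follows. The three field names `parentMemoryId`, `rootMemoryId`, and `isLatest` come from the text. Everything else is assumed for illustration, including the `relation` field, the `ChainStore` class, leaving `rootMemoryId` empty on the root row, and the rule that only an "updates" link clears the parent's flag. Supermemory's actual behaviour may differ.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Relation = Literal["updates", "extends", "derives"]


@dataclass
class MemoryRow:
    id: int
    text: str
    parentMemoryId: Optional[int] = None
    rootMemoryId: Optional[int] = None
    isLatest: bool = True
    relation: Optional[Relation] = None   # hypothetical field name


class ChainStore:
    def __init__(self) -> None:
        self.rows: dict[int, MemoryRow] = {}

    def add(self, text: str, parent_id: Optional[int] = None,
            relation: Optional[Relation] = None) -> MemoryRow:
        row = MemoryRow(id=len(self.rows) + 1, text=text)
        if parent_id is not None:
            parent = self.rows[parent_id]
            row.parentMemoryId = parent.id
            row.rootMemoryId = parent.rootMemoryId or parent.id
            row.relation = relation
            if relation == "updates":
                parent.isLatest = False   # replaced, but still stored
            # "extends" / "derives": the parent stays current alongside the new row
        self.rows[row.id] = row
        return row

    def latest(self) -> list[str]:
        return [r.text for r in self.rows.values() if r.isLatest]

    def history(self, root_id: int) -> list[str]:
        return [r.text for r in self.rows.values()
                if r.id == root_id or r.rootMemoryId == root_id]


prefs = ChainStore()
v1 = prefs.add("Prefers Vue")
v2 = prefs.add("Prefers React", parent_id=v1.id, relation="updates")
v3 = prefs.add("Prefers React with TypeScript", parent_id=v2.id, relation="updates")
print(prefs.latest())
print(prefs.history(v1.id))

people = ChainStore()
job = people.add("Alex works at Stripe")
people.add("Alex leads a team of five", parent_id=job.id, relation="extends")
print(people.latest())
```

The preference chain reads back one current row with two rows of history behind it, while the "extends" link leaves both facts about Alex current.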

#A graph is just another store to keep in sync

It's tempting to conclude that the answer to all of this is a graph database. A builder on r/LocalLLaMA pushed back on that directly: "Graph doesn't fix this, it just adds a second store to keep in sync." Their working alternative was typed structured records with explicit fields (entity, attribute, value, timestamp), so that an update is an actual update rather than a new chunk appended on top of an old one, with relationships living as reference fields in the same store. Graph traversal is a real capability, they argued, but you need it far less often than the hype suggests.

The deeper point is not anti-graph. What makes updates work is structure, not the choice of database. A fact stored as a flat blob of text can only be appended to; a fact stored as a typed record with an identity can be changed in place. mem0's entity collection and supermemory's versioned rows are both structure over blobs, and that structure is what lets either system tell a superseded fact from a current one.
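The difference between a blob and a typed record shows up in a few lines. The four fields (entity, attribute, value, timestamp) are the ones the builder listed. The `RecordStore` class, the `(entity, attribute)` identity key, the out-of-order guard, and the `org:stripe` reference convention are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Record:
    entity: str
    attribute: str
    value: str          # may be a reference to another entity, e.g. "org:stripe"
    timestamp: datetime


class RecordStore:
    """Records are identified by (entity, attribute), so an update replaces in place."""

    def __init__(self) -> None:
        self.records: dict[tuple[str, str], Record] = {}

    def upsert(self, rec: Record) -> bool:
        """Store the record unless an equal-or-newer one already holds the key."""
        key = (rec.entity, rec.attribute)
        existing = self.records.get(key)
        if existing is not None and existing.timestamp >= rec.timestamp:
            return False   # a late-arriving old fact must not clobber a newer one
        self.records[key] = rec
        return True

    def get(self, entity: str, attribute: str) -> Optional[str]:
        rec = self.records.get((entity, attribute))
        return rec.value if rec else None


# Blob store for contrast: the only available operation is append.
blobs: list[str] = []
blobs.append("Alex works at Google")
blobs.append("Alex moved to Stripe")

typed = RecordStore()
typed.upsert(Record("person:alex", "employer", "org:google", datetime(2023, 1, 10)))
typed.upsert(Record("person:alex", "employer", "org:stripe", datetime(2024, 5, 2)))
stale = typed.upsert(Record("person:alex", "employer", "org:google", datetime(2023, 1, 10)))

print(len(blobs), "blobs,", len(typed.records), "typed record")
print(typed.get("person:alex", "employer"), stale)
```

The blob list ends up with two competing statements and no way to tell which is current. The typed store holds one employer record, and a replayed old fact is rejected by its timestamp.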

Resolving a conflict is only half the lifecycle. Deciding when a fact should fade out entirely is covered in the chapter on forgetting, and keeping a human in the loop while a curator LLM reconciles contradictions is covered in the chapter on memory suggestions.

#References

  • mem0, "OSS v2 to v3 migration" and repository, github.com/mem0ai/mem0 (the two-phase ADD/UPDATE/DELETE design, the single-pass ADD-only rewrite, the LoCoMo and LongMemEval gains, and "the model spends its capacity on understanding the input rather than diffing against existing state").
  • Rasmussen et al., "Zep: A Temporal Knowledge Graph Architecture for Agent Memory," arXiv 2501.13956, and Graphiti, github.com/getzep/graphiti (bi-temporal edges, invalidate-not-delete, MinHash/LSH entity dedup with an LLM fallback, LLM-judges-contradiction while code-does-interval-math).
  • Supermemory, "Graph Memory" concept docs and repository, github.com/supermemoryai/supermemory (versioned memory chains via parentMemoryId/rootMemoryId/isLatest and the updates/extends/derives relationship types).
  • r/LocalLLaMA, "Are vector databases fundamentally insufficient for long-term LLM memory?" (the "graph just adds a second store to keep in sync" argument and the case for typed structured records, reported as practitioner experience).