Temporal Memory

How memory systems represent time and supersession so they can answer what is true now and what was true then, using bi-temporal facts that invalidate instead of delete.

You tell an assistant in 2022 that Alice moved to San Francisco. Two years later you mention Alice now works at Stripe. Ask a plain vector store where Alice lives and what she does, and both facts come back, ranked only by how closely they embed against your question. Nothing in the store knows the second fact might update the first, or that "moved to SF" could itself be stale by now. The store can append. It cannot supersede.

The same shape shows up in messier forms. A contract starts as one agreement, gets an addendum, then a clause is modified and later replaced. A support ticket is open, then resolved. In each case the current state is the product of an ordered sequence of changes, and a flat embedding store collapses that sequence into a bag of similar-looking text. Retrieval then picks whichever chunk embeds closest to the query, and that is frequently the oldest one, the version stated most confidently.

#Why a vector store can't tell time

The instinct is to blame retrieval and reach for a better embedding model or a reranker. That misreads the problem. As one builder put it on r/LocalLLaMA, the actual problem isn't retrieval; it's that facts are stored as "flat text with no update mechanism." You can't update Alice's location; you can only append new chunks on top of old ones. Time, if it exists at all, exists only as insertion recency, and recency is not the same thing as when a fact was true. A fact recorded yesterday can describe an event from five years ago, and a fact recorded years back can still be the current truth.

What you need is a place to record two things the vector store throws away: when a fact became true, and when it stopped being true. That is the entire idea behind temporal memory.

#Two clocks, not one

The most developed answer in open source is Zep's engine, Graphiti, which models memory as a graph of entity nodes joined by fact-edges (see knowledge graphs for that structure in general). Each fact-edge does not carry a single timestamp. It carries four, on two independent axes. This is a bi-temporal model, an idea borrowed from temporal databases, where the two axes have standard names:

Event time (also "valid time"): valid_at is when the fact became true in the world, invalid_at is when it stopped being true.
System time (also "transaction time"): created_at is when the database learned the fact, expired_at is when the system marked it superseded.

Keeping the two clocks separate is what lets the memory answer two genuinely different questions: what is true now (filter on event time around the present) and what did we believe was true last Tuesday (filter on system time as of that date). It also handles out-of-order ingestion gracefully. A historical fact that arrives late, after a more recent one was already stored, can be born already expired, because the code can see that the newer fact's event time is later.
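The two questions can be made concrete with a minimal sketch. The field names follow the four timestamps listed above; the FactEdge class, the two filter functions, and the dates are hypothetical illustrations, not Graphiti's actual API. The example assumes the store learned about Alice's move to Stripe several months after it happened, which is exactly the case where the two clocks disagree.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FactEdge:
    """One fact between two entities, stamped on both clocks (illustrative)."""
    subject: str
    predicate: str
    obj: str
    valid_at: datetime                      # event time: became true in the world
    created_at: datetime                    # system time: the store learned it
    invalid_at: Optional[datetime] = None   # event time: stopped being true
    expired_at: Optional[datetime] = None   # system time: marked superseded


def true_at(facts: List[FactEdge], when: datetime) -> List[FactEdge]:
    """Event-time filter: facts whose validity window contains `when`."""
    return [
        f for f in facts
        if f.valid_at <= when and (f.invalid_at is None or when < f.invalid_at)
    ]


def believed_at(facts: List[FactEdge], as_of: datetime) -> List[FactEdge]:
    """System-time filter: facts the store treated as current on `as_of`."""
    return [
        f for f in facts
        if f.created_at <= as_of and (f.expired_at is None or as_of < f.expired_at)
    ]


# Hypothetical data: Alice changed jobs in January 2024,
# but the store only heard about it in June 2024.
facts = [
    FactEdge("Alice", "works_at", "Acme",
             valid_at=datetime(2022, 3, 1), created_at=datetime(2022, 3, 5),
             invalid_at=datetime(2024, 1, 10), expired_at=datetime(2024, 6, 1)),
    FactEdge("Alice", "works_at", "Stripe",
             valid_at=datetime(2024, 1, 10), created_at=datetime(2024, 6, 1)),
]

march = datetime(2024, 3, 1)
print([f.obj for f in true_at(facts, march)])      # what was true in March
print([f.obj for f in believed_at(facts, march)])  # what the store believed in March
```

For March 2024 the event-time filter returns Stripe while the system-time filter returns Acme: the world had already changed, but the store did not know yet.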

Figure 1. A fact carries two clocks. A contradiction invalidates the old edge (sets invalid_at and expired_at) rather than deleting it, so both 'true now' and 'believed then' stay answerable.

#Invalidate, don't delete

When a new fact contradicts an old one, Graphiti does not overwrite or remove the old edge. It sets the old edge's invalid_at to the new fact's valid_at and stamps expired_at with the current time. The stale fact stays in the graph, now bounded by an end date. "Alice works in San Francisco" is not erased when "Alice works at Stripe" arrives; it is closed off, with its validity window ending where the new fact's begins.
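A minimal sketch of that closing-off step, assuming a simplified edge with only the four timestamps. The FactEdge class and the supersede function are hypothetical names for illustration, the Acme and Stripe facts and their dates are made up, and this is not Graphiti's own code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FactEdge:
    fact: str
    valid_at: datetime
    created_at: datetime
    invalid_at: Optional[datetime] = None
    expired_at: Optional[datetime] = None


def supersede(old: FactEdge, new: FactEdge, now: datetime) -> None:
    """Close off `old` in place. Nothing is removed from the graph."""
    old.invalid_at = new.valid_at  # validity ends where the new fact begins
    old.expired_at = now           # the system stopped treating it as current


# Hypothetical facts and dates.
old = FactEdge("Alice works at Acme",
               valid_at=datetime(2022, 3, 1), created_at=datetime(2022, 3, 5))
new = FactEdge("Alice works at Stripe",
               valid_at=datetime(2024, 1, 10), created_at=datetime(2024, 1, 12))

graph: List[FactEdge] = [old, new]
supersede(old, new, now=datetime(2024, 1, 12))

# Both edges survive; only the open-ended one counts as current.
current = [e.fact for e in graph if e.invalid_at is None]
```

After the call, the graph still holds two edges. The old one now carries an end date equal to the new one's valid_at, so a query for the current state skips it while a historical query can still find it.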

The reason to keep the old edge is everything you lose by deleting it: time-travel queries, an audit trail, and the ability to undo a contradiction that turns out to be a mistake. Invalidation is the difference between a memory that has history and one that only has a snapshot. Forgetting covers the related case of pruning facts you genuinely want gone.

#Let the model judge meaning, let code do the math

The detail worth copying from Graphiti is how it splits the work. Deciding whether two facts contradict each other is a language judgement, so an LLM does it. Working out which fact came first and what timestamps to set is arithmetic, so deterministic code does it. The two are never mixed.

The LLM sees the new fact alongside existing facts between the same entities and a set of semantically related candidates, and returns which are true duplicates and which are contradictions. The prompt is careful about the distinction. "Alice works at Acme as a software engineer" against a new "Alice works at Acme as a senior engineer" is a contradiction, not a duplicate, because the title changed. "Bob ran 5 miles on Tuesday" against "Bob ran 3 miles on Wednesday" is neither: different events on different days. The instruction is explicit that facts with different numbers, dates, or qualifiers must never be collapsed into duplicates.

Once the LLM has flagged a contradiction, code takes over. If an existing edge became valid before the new edge's valid_at, the old edge is invalidated at exactly that point, and pairs whose validity windows do not overlap are skipped so unrelated facts are never spuriously invalidated. None of this asks the model to compare dates. A separate lightweight call does pull the timestamps out of natural language in the first place, resolving relative expressions like "since last March" against the source message's own date rather than today (the observation-time grounding discussed in smart extraction). But once a date exists as a value, the interval logic is pure code.
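The deterministic half can be sketched in a few lines. This is an illustration of the interval rules described above, not Graphiti's source: the Edge class and the function names are hypothetical, the list of flagged edges stands in for whatever the LLM returned, and the branch that closes a late-arriving historical fact is an assumption based on the out-of-order behaviour described earlier.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Edge:
    fact: str
    valid_at: Optional[datetime]
    invalid_at: Optional[datetime] = None
    expired_at: Optional[datetime] = None


def windows_disjoint(a: Edge, b: Edge) -> bool:
    """True when one edge's validity window closed before the other's opened."""
    a_first = (a.invalid_at is not None and b.valid_at is not None
               and a.invalid_at <= b.valid_at)
    b_first = (b.invalid_at is not None and a.valid_at is not None
               and b.invalid_at <= a.valid_at)
    return a_first or b_first


def apply_contradictions(new: Edge, flagged: List[Edge], now: datetime) -> List[Edge]:
    """Set timestamps for edges the LLM flagged as contradicting `new`.

    No model call happens here: only date comparisons.
    """
    invalidated = []
    for old in flagged:
        if old.valid_at is None or new.valid_at is None:
            continue  # nothing to compare; this sketch leaves the pair alone
        if windows_disjoint(old, new):
            continue  # the facts never overlapped in time, so neither ends the other
        if old.valid_at < new.valid_at:
            # The new fact supersedes the old one at the moment it became true.
            old.invalid_at = new.valid_at
            old.expired_at = old.expired_at or now
            invalidated.append(old)
        elif old.valid_at > new.valid_at:
            # Late-arriving history: the new edge is born already expired.
            if new.invalid_at is None or old.valid_at < new.invalid_at:
                new.invalid_at = old.valid_at
            new.expired_at = now
    return invalidated


now = datetime(2024, 6, 1)  # hypothetical ingestion time

# Case 1: a title change supersedes the older edge.
junior = Edge("Alice works at Acme as a software engineer", datetime(2022, 3, 1))
senior = Edge("Alice works at Acme as a senior engineer", datetime(2024, 1, 10))
case1 = apply_contradictions(senior, [junior], now)

# Case 2: an edge that already ended before the new one began is skipped.
closed = Edge("Alice works at Initech", datetime(2019, 1, 1),
              invalid_at=datetime(2021, 1, 1))
case2 = apply_contradictions(senior, [closed], now)

# Case 3: a historical fact arrives after a newer one is already stored.
stored = Edge("Alice lives in San Francisco", datetime(2022, 3, 1))
late = Edge("Alice lives in Boston", datetime(2020, 5, 1))
case3 = apply_contradictions(late, [stored], now)
```

In the first case the older edge is closed at the new edge's valid_at. In the second, the pair is skipped because the windows never overlapped. In the third, nothing stored is touched, and the late edge itself receives an invalid_at equal to the stored edge's valid_at.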

#The lighter alternative

A full temporal graph is not the only way to get updatable, time-aware facts, and for many products it is more machinery than the job needs. The community's pragmatic counter-proposal is typed structured records: store each fact as an explicit (entity, attribute, value, timestamp) row instead of free text. Now an update is an actual update. You change Alice's location field and keep the prior value in history, rather than appending a new chunk and hoping retrieval prefers it. Relationships live as reference fields in the same store, and you reach for graph traversal only when you genuinely need multi-hop reasoning, which is less often than it seems.
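As a rough sketch of what typed structured records can look like, here is one possible layout using Python's built-in sqlite3 module. The table names, column names, and helper functions are assumptions made for illustration, and the New York value is made up; the point is only that changing Alice's location becomes a real update with the prior value archived.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (
        entity     TEXT NOT NULL,
        attribute  TEXT NOT NULL,
        value      TEXT NOT NULL,
        updated_at TEXT NOT NULL,
        PRIMARY KEY (entity, attribute)
    );
    CREATE TABLE fact_history (
        entity      TEXT NOT NULL,
        attribute   TEXT NOT NULL,
        value       TEXT NOT NULL,
        recorded_at TEXT NOT NULL,
        replaced_at TEXT NOT NULL
    );
""")


def set_fact(entity, attribute, value, ts):
    """Update the current value in place and archive the value it replaces."""
    with conn:  # one transaction: archive and update succeed or fail together
        row = conn.execute(
            "SELECT value, updated_at FROM facts WHERE entity = ? AND attribute = ?",
            (entity, attribute),
        ).fetchone()
        if row is not None:
            if row[0] == value:
                return  # same value restated; nothing to update
            conn.execute(
                "INSERT INTO fact_history VALUES (?, ?, ?, ?, ?)",
                (entity, attribute, row[0], row[1], ts),
            )
        conn.execute(
            "INSERT OR REPLACE INTO facts VALUES (?, ?, ?, ?)",
            (entity, attribute, value, ts),
        )


def current(entity, attribute):
    row = conn.execute(
        "SELECT value FROM facts WHERE entity = ? AND attribute = ?",
        (entity, attribute),
    ).fetchone()
    return row[0] if row else None


def history(entity, attribute):
    return conn.execute(
        "SELECT value, recorded_at, replaced_at FROM fact_history "
        "WHERE entity = ? AND attribute = ? ORDER BY replaced_at",
        (entity, attribute),
    ).fetchall()


set_fact("Alice", "location", "San Francisco", "2022-03-01")
set_fact("Alice", "location", "New York", "2024-01-10")  # hypothetical later move
```

Reading the current location is now a primary-key lookup instead of a similarity search, and the history table keeps the previous value along with the dates between which it was the stored answer.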

Supermemory sits in the middle of this spectrum. Rather than a graph, it keeps memories as versioned chains: each memory points to its parent and root, and an isLatest flag marks the current version, so "return the current fact, preserve the history" comes almost for free. The fork between resolving conflicts at write time versus read time, and the versioning patterns that support it, are covered in updates and conflicts.
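The versioned-chain idea can be illustrated with a small in-memory sketch. The parent_id, root_id, and is_latest fields mirror the parent, root, and isLatest fields mentioned above; the Memory and MemoryStore classes and their methods are hypothetical and do not represent Supermemory's API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Memory:
    id: int
    content: str
    parent_id: Optional[int]  # previous version, None for the first one
    root_id: int              # first version of this chain
    is_latest: bool = True


class MemoryStore:
    def __init__(self) -> None:
        self._by_id: Dict[int, Memory] = {}
        self._next_id = count(1)

    def add(self, content: str) -> Memory:
        """Start a new chain; the first version is its own root."""
        mem_id = next(self._next_id)
        mem = Memory(id=mem_id, content=content, parent_id=None, root_id=mem_id)
        self._by_id[mem_id] = mem
        return mem

    def update(self, memory_id: int, content: str) -> Memory:
        """Append a new version and move the latest flag to it."""
        parent = self._by_id[memory_id]
        if not parent.is_latest:
            raise ValueError("only the latest version of a chain can be updated")
        parent.is_latest = False
        child = Memory(id=next(self._next_id), content=content,
                       parent_id=parent.id, root_id=parent.root_id)
        self._by_id[child.id] = child
        return child

    def latest(self, root_id: int) -> Memory:
        return next(m for m in self._by_id.values()
                    if m.root_id == root_id and m.is_latest)

    def history(self, root_id: int) -> List[Memory]:
        """Walk parent pointers back from the latest version; oldest first."""
        chain: List[Memory] = []
        node: Optional[Memory] = self.latest(root_id)
        while node is not None:
            chain.append(node)
            node = self._by_id[node.parent_id] if node.parent_id is not None else None
        return chain[::-1]


store = MemoryStore()
v1 = store.add("Alice lives in San Francisco")
v2 = store.update(v1.id, "Alice lives in New York")  # hypothetical update
```

Fetching the current fact is a filter on the latest flag, and the full history falls out of following parent pointers, with no graph traversal involved.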

#The honest weakness

Temporal modelling is not a free win, and the benchmarks say so plainly. On LongMemEval, a multi-session test with long histories, Zep reported large gains on exactly the questions it is built for: roughly +38% on temporal-reasoning and +30% on multi-session questions, with overall accuracy up about 18.5% over a long-context baseline using GPT-4o, and around 90% lower response latency (it reads a pre-built graph instead of processing the full 115k-token history at query time). But on single-session-assistant questions, where the answer sits in one recent conversation, the same system scored about 17.7% lower than the baseline. The structure that helps you reason across time gets in the way when there is no time to reason about.

The other cost is on the write path. Building these facts takes several LLM calls per ingested message (extract entities, resolve them, extract edges, detect contradictions, extract timestamps), heavy work that belongs off the critical path. Temporal memory earns its complexity when facts genuinely evolve and old states stay relevant. If your facts rarely change, or only the latest one ever matters, a simpler store will be faster, cheaper, and often more accurate.

#References

Rasmussen et al., "Zep: A Temporal Knowledge Graph Architecture for Agent Memory," arXiv:2501.13956 (the bi-temporal model, invalidate-not-delete, and the LongMemEval and DMR results, including the single-session weakness).
Graphiti, the open-source temporal knowledge-graph engine behind Zep, github.com/getzep/graphiti (fact-edges with valid_at/invalid_at/created_at/expired_at, LLM contradiction detection plus deterministic interval arithmetic, and observation-time timestamp extraction).
r/LocalLLaMA, "Are vector databases fundamentally insufficient for long-term LLM memory?" (the Alice-to-Stripe example, "flat text with no update mechanism," and typed structured records as the lighter alternative, reported as practitioner experience).
Supermemory graph-memory and versioning docs, github.com/supermemoryai/supermemory (versioned memory chains with parentMemoryId/rootMemoryId/isLatest).