AI Memory

Glossary

Plain definitions of the AI-memory terms used across this guide, from embeddings and RAG to BM25, knowledge graphs, and bi-temporal models.

Quick definitions for the terms that recur across this guide. Each one links to the page where it does real work, so you can jump from "what is this" to "how do I use it." Full citations live on the references page.

#A

ANN (Approximate Nearest Neighbor). Search that finds the vectors closest to a query without comparing against every stored vector, trading a small amount of exactness for a large speed gain. It is what makes vector search practical past a few thousand items; the usual implementation is a graph index like HNSW.

#B

Bi-temporal. A data model where every fact carries two independent timelines: when it was true in the world (event time, valid_at / invalid_at) and when the system learned it (transaction time, created_at / expired_at). Zep and Graphiti use this so memory can answer both "what is true now" and "what did we believe last Tuesday." See temporal memory.
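
The two timelines can be sketched in a few lines. This is an illustrative model only: it borrows the four field names from the definition above, but the `Fact` class, the `believed_true` helper, and the sample dates are hypothetical and are not Zep's or Graphiti's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    text: str
    valid_at: datetime              # event time: when it became true in the world
    invalid_at: Optional[datetime]  # event time: when it stopped being true (None = still true)
    created_at: datetime            # transaction time: when the system recorded it
    expired_at: Optional[datetime]  # transaction time: when the system retired this record

def covers(start, end, t):
    """True if t falls in the half-open interval [start, end); end=None means open-ended."""
    return start <= t and (end is None or t < end)

def believed_true(facts, world_time, system_time):
    """Facts that the system, as of system_time, held to be true at world_time."""
    return [f for f in facts
            if covers(f.valid_at, f.invalid_at, world_time)
            and covers(f.created_at, f.expired_at, system_time)]

jan1, jan5 = datetime(2024, 1, 1), datetime(2024, 1, 5)
mar1, mar10 = datetime(2024, 3, 1), datetime(2024, 3, 10)

# Alice changed jobs on Mar 1, but the system only learned about it on Mar 10.
facts = [
    Fact("Alice works at Stripe", jan1, None, jan5, mar10),  # original belief, retired Mar 10
    Fact("Alice works at Stripe", jan1, mar1, mar10, None),  # corrected record: ended Mar 1
    Fact("Alice works at Acme", mar1, None, mar10, None),
]

mar5, mar15 = datetime(2024, 3, 5), datetime(2024, 3, 15)
now_view = believed_true(facts, mar15, mar15)   # what is true now
past_view = believed_true(facts, mar5, mar5)    # what we believed on Mar 5
hindsight = believed_true(facts, mar5, mar15)   # what we now know was true on Mar 5
```

Here `now_view` and `hindsight` both return the Acme fact, while `past_view` still returns the Stripe fact, because on Mar 5 the system had not yet learned of the change.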

BM25. A lexical ranking function that scores text by query-term overlap, weighting rarer terms more heavily (Okapi BM25). It catches exact strings (error codes like TS-999, proper names, IDs) that dense embeddings blur, which is why hybrid search pairs it with vector retrieval.
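
A minimal pure-Python sketch of the scoring function follows. The parameter values `k1=1.5` and `b=0.75` are common defaults rather than anything this guide prescribes, and the IDF uses the smoothed (Lucene-style) variant; production systems would use a search engine or library rather than this loop.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against the query terms with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n               # average document length
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rarer terms get a larger IDF weight.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalised by document length via b.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "build failed with error ts-999 in the compiler".split(),
    "the build failed again with an error".split(),
    "error handling in the compiler".split(),
]
scores = bm25_scores("error ts-999".split(), docs)
best = scores.index(max(scores))
```

Every document contains "error", so that term contributes little; the rare token "ts-999" is what lifts the first document to the top.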

#C

Chunking. Splitting source text into retrievable units before embedding. Size, overlap, and where the boundaries fall (fixed-length, semantic, or AST-aware for code) quietly decide retrieval quality more than the model choice does. See chunking.
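
The simplest of those strategies, fixed-length chunks with overlap, can be sketched as below. The function name and the default `size` and `overlap` values are illustrative assumptions, not recommendations from this guide.

```python
def chunk_fixed(tokens, size=200, overlap=40):
    """Split a token list into windows of `size` tokens, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks

chunks = chunk_fixed(list(range(10)), size=4, overlap=1)
```

With ten tokens, `size=4`, and `overlap=1`, this yields three chunks, and the last token of each chunk reappears as the first token of the next.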

Contextual retrieval. An Anthropic technique that prepends a short, chunk-specific context (usually 50-100 tokens) to each chunk before embedding and BM25 indexing, so a bare "revenue grew 3%" regains its company and quarter. In Anthropic's reported tests it cut the top-20 retrieval failure rate from 5.7% to 3.7% with contextual embeddings alone, to 2.9% with contextual BM25 added, and to 1.9% with reranking on top. See chunking.

Cosine similarity. The cosine of the angle between two vectors, which compares their direction and ignores their magnitude. It is the standard score for vector search. When vectors are pre-normalised to unit length, cosine similarity equals the dot product, a common performance shortcut.
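
A small sketch makes both properties concrete: magnitude is ignored, and on unit-length vectors the dot product gives the same number. The helper names and sample vectors are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / (norm(a) * norm(b))

def normalise(a):
    """Scale a to unit length."""
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice the magnitude
c = [4.0, 3.0]   # different direction

same_direction = cosine(a, b)
via_cosine = cosine(a, c)
via_dot = dot(normalise(a), normalise(c))  # the shortcut on pre-normalised vectors
```

`same_direction` comes out at 1.0 despite the different lengths, and `via_cosine` and `via_dot` agree up to floating-point rounding.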

#D

Dense vs sparse vectors. Dense vectors are learned embeddings with comparatively few dimensions (typically hundreds to a few thousand), where every dimension carries part of a distributed meaning; sparse vectors (TF-IDF, BM25) have one dimension per vocabulary term and are mostly zeros. Dense captures meaning, sparse captures exact words, and hybrid retrieval uses both.

#E

Embedding. A model-produced vector that places text (or other data) in a space where nearby points mean similar things, turning "find similar" into geometry. Dense embeddings have been the retrieval backbone of RAG since the original work (Lewis et al., 2020). See embeddings.

Episodic memory. Memory of specific past experiences and events ("what happened in session 3"), stored as a collection of logged episodes. The D&D bot that forgets NPCs from five sessions ago is failing at episodic memory. One of LangMem's three memory types. See what is AI memory.

#H

HNSW (Hierarchical Navigable Small World). The most common ANN graph index: a layered proximity graph you traverse from coarse to fine, reaching a query's neighbors in roughly logarithmic hops instead of a full scan. It is the engine inside most vector databases.

Hybrid search. Running dense (semantic) and sparse (BM25 keyword) retrieval together, then fusing the two ranked lists, usually with RRF. It beats either alone because each catches what the other misses. See hybrid retrieval.

#K

Knowledge graph. A memory structure of nodes (entities) and edges (relations and facts) that supports multi-hop and whole-corpus questions flat vectors cannot answer. The cost is real: building one needs an LLM extraction pass per chunk (Microsoft GraphRAG used roughly 115M input tokens to index an 11,656-passage corpus). See knowledge graphs.

#L

Leiden. A community-detection algorithm that partitions a graph into hierarchical clusters of densely connected nodes. GraphRAG runs Leiden to group entities, then has an LLM summarise each community so it can answer global "what are the themes" questions by map-reduce over those summaries.

#M

Matryoshka embeddings. Embeddings trained so that truncating the vector to fewer dimensions preserves most of its quality, letting you cut storage and compute (for example, a 1024-dim vector down to 512) at a small accuracy cost. See embeddings.
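
Truncation itself is a one-liner plus re-normalisation, sketched below with a hypothetical helper. This only preserves quality for models trained with a Matryoshka objective; cutting an ordinary embedding the same way will usually hurt far more.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` dimensions and rescale back to unit length."""
    head = vec[:dims]
    n = math.sqrt(sum(x * x for x in head))
    return [x / n for x in head] if n else head

full = [3.0, 4.0, 12.0]           # stand-in for a longer embedding
short = truncate_embedding(full, 2)
```

Re-normalising after the cut keeps cosine and dot-product scores interchangeable on the shortened vectors.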

MCP (Model Context Protocol). An open protocol for exposing tools, data, and memory to any LLM client (Claude Desktop, Cursor, and others) through one interface, so a memory store can follow you across tools instead of being locked to a single app. See memory across tools.

MMR (Maximal Marginal Relevance). A reranking criterion that balances relevance to the query against novelty relative to the results already selected, suppressing near-duplicates so the context window is not filled with the same fact restated. Graphiti offers it as one reranker option. See forgetting.
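
The greedy selection loop can be sketched as follows. It assumes similarities are precomputed; the function signature and the `lam` trade-off weight (1.0 means pure relevance) are illustrative choices and not Graphiti's API.

```python
def mmr(query_sims, pairwise_sims, top_k, lam=0.7):
    """Greedy MMR selection.

    query_sims[i]       similarity of candidate i to the query
    pairwise_sims[i][j] similarity between candidates i and j
    Returns the indices of the selected candidates, in selection order.
    """
    selected = []
    remaining = list(range(len(query_sims)))
    while remaining and len(selected) < top_k:
        def score(i):
            # Penalise a candidate by its closest match among those already chosen.
            redundancy = max((pairwise_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different.
query_sims = [0.90, 0.88, 0.70]
pairwise_sims = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.10],
    [0.10, 0.10, 1.00],
]
diverse = mmr(query_sims, pairwise_sims, top_k=2, lam=0.7)
relevance_only = mmr(query_sims, pairwise_sims, top_k=2, lam=1.0)
```

With `lam=0.7` the second pick is candidate 2 instead of the near-duplicate candidate 1; with `lam=1.0` the penalty disappears and the two near-duplicates are both selected.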

#P

Personalized PageRank. A graph-walk algorithm that ranks nodes by their connectivity to a set of query-seeded "personalization" nodes. HippoRAG runs it over a knowledge graph to do multi-hop retrieval in a single step, reported at 10-30x cheaper than iterative LLM-in-the-loop retrieval. See knowledge graphs.
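
A minimal power-iteration sketch shows the idea: a random walk that keeps restarting at the seed nodes, so rank concentrates in their neighbourhood. The graph, the damping value `alpha=0.85`, and the function itself are illustrative assumptions; this is not HippoRAG's implementation.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """graph maps each node to its list of neighbours (every neighbour must also be a key).
    seeds is the set of query-seeded personalization nodes."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        # With probability (1 - alpha) the walk jumps back to a seed node.
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                # Dangling node: return its mass to the seeds.
                for m in nodes:
                    nxt[m] += alpha * rank[n] * restart[m]
                continue
            share = alpha * rank[n] / len(out)
            for m in out:
                nxt[m] += share
        rank = nxt
    return rank

graph = {
    "alice": ["stripe"],
    "stripe": ["alice", "payments"],
    "payments": ["stripe"],
    "bob": ["acme"],
    "acme": ["bob"],
}
rank = personalized_pagerank(graph, seeds={"alice"})
```

Seeding on "alice" gives "payments" a nonzero score even though it is two hops away, while the disconnected "bob" and "acme" nodes stay at zero.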

Procedural memory. Memory of how to do things: behavioural rules and skills. In LangMem it lives in the system prompt and improves through prompt optimisation rather than being stored as data rows. See what is AI memory.

#R

RAG (Retrieval-Augmented Generation). The pattern from Lewis et al. (2020): embed a corpus, retrieve the top-K most similar passages at query time, and place them in the prompt so the model generates from retrieved, editable (non-parametric) knowledge instead of only its frozen weights. See RAG.

Reranking. A second-pass model, often a cross-encoder, that re-scores an initial candidate set for relevance. It is the cheap last-mile retrieval win: Anthropic's pipeline fetches the top 150 then reranks down to 20, cutting failures from 2.9% to 1.9%. See hybrid retrieval.

RRF (Reciprocal Rank Fusion). A way to merge ranked lists by summing 1/(k + rank) for each item across the lists, where k is a small smoothing constant, so fusion depends on position rather than raw score. This avoids the trap of linearly combining incomparable scales, like cosine (bounded, typically 0 to 1) against BM25 (unbounded, often 5 to 30). See hybrid retrieval.
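
The formula translates directly into code. In this sketch `k=60` is the value commonly used in practice, not one this guide specifies, and the document IDs are made up.

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Fuse ranked lists of document IDs by summing 1 / (k + rank), with rank starting at 1."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_hits = ["a", "b", "c"]    # from vector search
sparse_hits = ["c", "a", "d"]   # from BM25
fused = rrf([dense_hits, sparse_hits])
```

Documents that appear in both lists ("a" and "c") rise above those found by only one retriever, and no raw scores are ever compared.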

#S

Semantic memory. Memory of general facts and knowledge about a user or the world ("Alice works at Stripe"), independent of when you learned it. In LangMem it takes either a profile shape (one continuously updated document) or a collection shape (many searchable facts). See what is AI memory.

#V

Vector database. A store that indexes embeddings for fast similarity search (via ANN/HNSW) plus metadata filtering. It excels at "find similar" but flattens time and state: an outdated fact and its correction look equally relevant to a query, which is why memory systems layer conflict resolution and temporal models on top. See RAG vs memory.

#W

Working memory. The information currently inside the model's context window: fast, volatile, and bounded, the RAM to long-term memory's disk. It resets every session, and that gap is what a persistent memory layer exists to fill. See context vs memory.

#References