
References

The full bibliography behind this guide: the papers, open-source projects, and articles that ground every number and design claim on these pages.

Every benchmark number, threshold, and design decision in this guide traces back to one of the sources below. The papers establish the ideas, the open-source projects show how those ideas survive contact with production, and the articles and docs fill the gaps that academic papers leave out (cost, latency, failure modes). Each entry notes what it established and links to the wiki page where it does the most work, so you can read forward from a citation or backward from a claim.

When a figure on this wiki comes from a single production system rather than a paper (for example, an embedding dimension or a model-routing choice), the page that cites it says so. Community and forum numbers are reported as claims, not measured facts. The entries here are the primary, verifiable sources.

#Papers

#Foundations: retrieval and context

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. The origin of parametric vs non-parametric memory and the embed, store, retrieve, generate loop; grounds RAG. https://arxiv.org/abs/2005.11401
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi. "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." ICLR 2024. Retrieve conditionally, then verify grounding; grounds RAG. https://arxiv.org/abs/2310.11511
Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang. "Lost in the Middle: How Language Models Use Long Contexts." TACL 2024. The U-shaped curve showing that inclusion is not usage, so the best context belongs at the edges of the prompt; grounds context vs memory and hybrid retrieval. https://arxiv.org/abs/2307.03172 (code: https://github.com/nelson-liu/lost-in-the-middle)

#Agent memory architectures

Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez. "MemGPT: Towards LLMs as Operating Systems." 2023. The OS analogy: context as RAM, external stores as disk, the model paging its own memory; grounds hierarchical memory. https://arxiv.org/abs/2310.08560
Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein. "Generative Agents: Interactive Simulacra of Human Behavior." UIST 2023. The memory stream, plus recency, importance, and relevance ranking, with LLM-rated importance and reflection as consolidation; grounds smart extraction and forgetting. https://arxiv.org/abs/2304.03442
Wujiang Xu, et al. "A-Mem: Agentic Memory for LLM Agents." NeurIPS 2025. A Zettelkasten-style self-organising note network where adding a note can rewrite older ones; grounds knowledge graphs and updates and conflicts. https://arxiv.org/abs/2502.12110
Prateek Chhikara, et al. "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory." 2025. The extract-then-decide CRUD framing (ADD, UPDATE, DELETE, NOOP) that the project later reversed; grounds smart extraction and updates and conflicts. https://arxiv.org/abs/2504.19413
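
The recency, importance, and relevance ranking from the Generative Agents entry above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the `Memory` class, the `rank` function, and the field names are hypothetical, while the equal weighting, the min-max normalisation, and the 0.995 hourly decay follow the paper's description.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float          # LLM-rated score, 1 to 10 in the paper
    hours_since_access: float  # time since the memory was last retrieved
    embedding: list[float]     # vector from any embedding model (assumed)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(memories: list[Memory], query_embedding: list[float],
         decay: float = 0.995) -> list[Memory]:
    """Order memories by normalised recency + importance + relevance."""
    if not memories:
        return []
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query_embedding) for m in memories])
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    order = sorted(range(len(memories)), key=lambda k: scores[k], reverse=True)
    return [memories[k] for k in order]
```

Because each component is normalised before summing, no single signal dominates by scale alone; changing the weights is where a production system would tune the trade-off.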

#Graph and hierarchical retrieval

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Jonathan Larson. "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." 2024. Entity graph plus Leiden community summaries for global sensemaking, at a heavy indexing cost; grounds knowledge graphs. https://arxiv.org/abs/2404.16130
Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su. "HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models." NeurIPS 2024. Personalized PageRank over a triple graph performs multi-hop retrieval in a single step, 10 to 30 times cheaper than iterative retrieval; grounds knowledge graphs. https://arxiv.org/abs/2405.14831
Bernal Jiménez Gutiérrez, et al. "From RAG to Memory: Non-Parametric Continual Learning for Large Language Models" (HippoRAG 2). 2025. Adds passage nodes and a recognition-memory filter, and supplies the measured cost and quality comparison across GraphRAG, RAPTOR, and LightRAG. https://arxiv.org/abs/2502.14802
Parth Sarthi, et al. "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval." ICLR 2024. Recursive embed, cluster, and summarise into a tree you can query at any level; grounds hierarchical memory. https://arxiv.org/abs/2401.18059
Zirui Guo, et al. "LightRAG: Simple and Fast Retrieval-Augmented Generation." 2024. Dual-level keyword plus graph retrieval with incremental updates, the answer to the "must re-index the world" objection; grounds landscape. https://arxiv.org/abs/2410.05779
Preston Rasmussen, et al. "Zep: A Temporal Knowledge Graph Architecture for Agent Memory." 2025. The bi-temporal model (event time and system time) where contradictions invalidate rather than delete; grounds temporal memory. https://arxiv.org/abs/2501.13956
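
The "invalidate rather than delete" idea in the Zep entry above is easier to see in code. The sketch below is an assumption-laden toy, not Zep's or Graphiti's implementation: `Fact`, `FactStore`, and every field name are hypothetical, each predicate is treated as single-valued, and new facts are assumed to arrive in event-time order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime                      # event time: when the fact became true
    created_at: datetime                    # system time: when it was recorded
    invalid_at: Optional[datetime] = None   # event time: when it stopped being true
    expired_at: Optional[datetime] = None   # system time: when the invalidation was recorded


class FactStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, predicate: str, obj: str,
                    valid_at: datetime, recorded_at: datetime) -> None:
        """Record a fact; close out contradicting facts instead of deleting them."""
        for old in self.facts:
            contradicts = (old.subject == subject and old.predicate == predicate
                           and old.obj != obj and old.invalid_at is None)
            if contradicts:
                old.invalid_at = valid_at
                old.expired_at = recorded_at
        self.facts.append(Fact(subject, predicate, obj, valid_at, recorded_at))

    def as_of(self, subject: str, predicate: str, when: datetime) -> list[str]:
        """Return the values that were true at event time `when`."""
        return [f.obj for f in self.facts
                if f.subject == subject and f.predicate == predicate
                and f.valid_at <= when
                and (f.invalid_at is None or when < f.invalid_at)]


store = FactStore()
store.assert_fact("user", "prefers_brand", "Adidas",
                  datetime(2024, 1, 1), datetime(2024, 1, 2))
store.assert_fact("user", "prefers_brand", "Puma",
                  datetime(2024, 6, 1), datetime(2024, 6, 3))
print(store.as_of("user", "prefers_brand", datetime(2024, 3, 1)))
print(store.as_of("user", "prefers_brand", datetime(2024, 7, 1)))
```

Both facts stay in the store after the contradiction, so a query about March still answers from the older fact while a query about July answers from the newer one. The dates and brand values here are made-up sample data.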

#Surveys and benchmarks

Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen. "A Survey on the Memory Mechanism of Large Language Model based Agents." 2024. The operation-centric view (write, consolidate, read); grounds what is AI memory. https://arxiv.org/abs/2404.13501 (repo: https://github.com/nuster1128/LLM_Agent_Memory_Survey)
Yaxiong Wu, Sheng Liang, Chen Zhang, Yichao Wang, Yongyue Zhang, Huifeng Guo, Ruiming Tang, Yong Liu. "From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs." 2025. The Object, Form, and Time taxonomy bridging human and AI memory; grounds what is AI memory. https://arxiv.org/abs/2504.15965
Di Wu, et al. "LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory." ICLR 2025. 500 questions across five long-term-memory abilities; commercial assistants lose roughly 30% accuracy over sustained interactions; grounds evaluating. https://arxiv.org/abs/2410.10813

#Open-source projects

mem0 (https://github.com/mem0ai/mem0). A widely used memory layer for agents; the codebase documents the shift from two-phase write-time consolidation to single-pass append-only with read-time ranking. See updates and conflicts.
supermemory (https://github.com/supermemoryai/supermemory). Memory API plus transparent proxy and MCP server, with versioned memory chains and forgetting as a stored property. Its AST-aware code chunker is separate (https://github.com/supermemoryai/code-chunk). See across tools and forgetting.
Letta, formerly MemGPT (https://github.com/letta-ai/letta). Self-editing tiered memory with memory blocks, character budgets the model can see, and sleep-time consolidation. See hierarchical memory.
Graphiti (https://github.com/getzep/graphiti). The open-source temporal knowledge-graph engine behind Zep: bi-temporal edges, MinHash/LSH dedup before LLM fallback, and hybrid search with RRF, MMR, and node-distance rerankers. See temporal memory and hybrid retrieval.
Microsoft GraphRAG (https://github.com/microsoft/graphrag). The reference implementation of community-summary graph RAG with Local, Global, and DRIFT search. See knowledge graphs.
HippoRAG (https://github.com/OSU-NLP-Group/HippoRAG). The PPR-over-triples retriever and its HippoRAG 2 successor. See knowledge graphs.
RAPTOR (https://github.com/parthsarthi03/raptor). Recursive summarisation tree for hierarchical document QA. See hierarchical memory.
LightRAG (https://github.com/HKUDS/LightRAG). Incremental, dual-level graph RAG. See landscape.
Cognee (https://github.com/topoteretes/cognee). An Extract, Cognify, Load pipeline with ontology grounding to constrain graph edges. See knowledge graphs.
LangMem (https://github.com/langchain-ai/langmem, docs https://langchain-ai.github.io/langmem/). Memory primitives, plus the semantic, episodic, and procedural taxonomy this guide adopts. See what is AI memory.
A-MEM (https://github.com/agiresearch/A-mem). The self-evolving Zettelkasten note network. See updates and conflicts.
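
The Graphiti entry above mentions hybrid search with RRF (reciprocal rank fusion). As a rough sketch of that general technique, and not of Graphiti's own code, the function below merges ranked lists by summing `1 / (k + rank)` per document. The function name and the sample document IDs are hypothetical; `k = 60` is the constant commonly used in the RRF literature.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]    # e.g. results from embedding search
keyword_hits = ["b", "d", "a"]   # e.g. results from BM25
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# → ['b', 'a', 'd', 'c']
```

RRF needs only ranks, not raw scores, which is why it suits combining retrievers whose scores are on incompatible scales.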

#Articles, blogs, and documentation

Anthropic. "Introducing Contextual Retrieval." 2024. The most actionable, numbers-backed retrieval recipe: prepend chunk-specific context before embedding, add BM25, and add a reranker, which together cut the top-20-chunk retrieval failure rate from 5.7% to 1.9%. Also the source of the rule that a knowledge base under 200K tokens can skip RAG entirely and go straight into the prompt; grounds chunking and hybrid retrieval. https://www.anthropic.com/news/contextual-retrieval (cookbook: https://platform.claude.com/cookbook/capabilities-contextual-embeddings-guide)
Microsoft Research. "GraphRAG: Unlocking LLM discovery on narrative private data." 2024. The accessible companion to the GraphRAG paper. https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Zep. "Zep Is The New State of the Art In Agent Memory." The vendor framing of the DMR and LongMemEval results, including the honest single-session weakness; read alongside evaluating. https://blog.getzep.com/state-of-the-art-agent-memory/
mem0 documentation, OSS v2 to v3 migration notes. The authoritative narrative of moving conflict resolution from write time to read time, with the reported LoCoMo and LongMemEval gains; grounds updates and conflicts. https://docs.mem0.ai
supermemory documentation, "Memory is not RAG" and the MemScore benchmark. The Adidas-to-Puma illustration and the accuracy, latency, and context-tokens reporting triple; grounds RAG vs memory and evaluating. https://docs.supermemory.ai
Model Context Protocol. The integration substrate for exposing memory as tools any client can call; grounds across tools. https://modelcontextprotocol.io