IV: Building it for real

Privacy, Security, and the Pitfalls of Memory

A memory layer is a database of a user's most personal context that is injected into a model automatically, which makes it a security surface. Five pitfalls follow from that: separating what you store from what you count, the encryption tradeoff, delivery formats shaped like prompt injection, junk memories that provoke refusals, and tenant isolation that rests on a single filter.

A memory layer holds the most personal context a user has ever handed a model, and it sits there ready to be retrieved and injected into the next conversation with nobody watching. Every property that makes it useful is also a liability. It persists, so data you meant to discard sticks around. It follows the user across tools, so a leak follows them too. It gets injected automatically, so a poisoned or malformed entry reaches the model unsupervised. None of this argues against building memory. It argues for treating the memory layer as a security surface from the first commit, not after the first incident.

#Separate what you store from what you count

The most common promise a memory product makes is a negative one: we do not keep your X. That sentence is only worth something if the system can prove it, and proving it means tracking two different numbers. MemoryPlugin counts an export total (every tokenizable thing in an imported conversation: message text, artifacts, model thinking traces, tool payloads, attachments) separately from a counted-text total, which is only the text it actually embeds and stores. For Claude imports, thinking is excluded and artifacts are redacted. For ChatGPT, tool and system messages are dropped, leaving the human and assistant turns. The two numbers are supposed to diverge.

The cheap version of this is to embed the whole blob and decide what to exclude later. By the time "later" arrives, the reasoning traces are already vectors in your index and the promise is already false. When the exclusion is a property of the pipeline (a column you can point at, a measurable gap between two counts), the claim becomes auditable instead of aspirational. Store-vs-count is the difference between a privacy policy you wrote and one you can demonstrate.

#The encryption tradeoff, said plainly

A persistent server-side store of someone's private context is a liability that an ephemeral chat never was. The clean answer is end-to-end encryption, and it collides with nearly everything a memory system does. You cannot embed, rerank, dedup, summarise, or run a curator over ciphertext you are not allowed to read. The work that makes memory smart happens on plaintext, on your servers.

Alara has said this in public rather than papering over it: you can end-to-end encrypt users' memories, but it limits you in many ways, and a plug-and-play setup for non-technical users effectively requires storing their data on your servers without E2E. No clever architecture escapes the tradeoff. What you can do is be honest about which side you picked, then earn back trust the other ways: minimise what you store (above), scope it tightly (below), keep an audit trail, and never claim a privacy property the system does not actually have.

#Your delivery format is an attack surface

On ChatGPT, Claude, and Gemini, safety-trained models started refusing to follow the memory protocol outright. The cause was not the content of the memories. It was their shape. A magic-string syntax like to=plugin&&memory=... pattern-matches as shell command injection, because && is a shell operator, and models trained on injection-defence corpora flag it. System-looking XML tags trip a model's rule to distrust tags in the user turn that claim to come from the system or the vendor. Third-person framing ("the user has enabled...") followed by "please take note for the remainder of this conversation" is textbook injection phrasing. The model was behaving correctly. The memory format simply looked like an attack.
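One cheap guard is to lint the memory delivery template itself before shipping it. The sketch below is a hypothetical check, not a real injection detector: each regex mirrors one of the shapes described above (shell operators, tags claiming to come from the system or a vendor, third-person framing, and persistent "for the remainder of this conversation" directives). The pattern names and the function name are illustrative assumptions.

```python
import re

# Hypothetical heuristics; each one mirrors a failure shape from the text above.
SUSPICIOUS_PATTERNS = {
    # && and || are shell operators; $( is command substitution.
    "shell_operator": re.compile(r"&&|\|\||\$\("),
    # Tags in the user turn that claim to come from the system or a vendor.
    "system_claiming_tag": re.compile(
        r"</?\s*(system|assistant|anthropic|openai|google)\b[^>]*>", re.IGNORECASE
    ),
    # Third-person framing that speaks about the user instead of as the user.
    "third_person_instruction": re.compile(
        r"\bthe user has (enabled|asked|requested)\b", re.IGNORECASE
    ),
    # Directives that try to persist across the whole conversation.
    "persistent_directive": re.compile(
        r"for the remainder of (this|the) conversation", re.IGNORECASE
    ),
}


def lint_delivery_format(template: str) -> list[str]:
    """Return the names of patterns that make a memory template look like an attack."""
    return [
        name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(template)
    ]
```

Run against `to=plugin&&memory=...`, it flags `shell_operator`; against a `<system>` block saying "The user has enabled memory. Please take note for the remainder of this conversation.", it flags three patterns; an annotation like `[memory: I prefer dark mode]` passes clean. A lint like this belongs in CI next to the template, so a format change that starts to look like an attack is caught before a model starts refusing it.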

The fixes are all about not looking like one. Annotation-shaped syntax ([memory: ...], an emoji-prefixed note, or HTML comments that do not render) reads as data rather than a command. First-person user voice reads as the user's own note rather than a third party issuing instructions. And on clients that support it, routing memory writes through an MCP tool call instead of an inline string sidesteps the problem, because a tool call is a sanctioned channel, whereas a magic string in the user turn is a red flag. This is a quieter reason the case for memory across tools runs through MCP: it is portable, and it is shaped like trust.
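A minimal sketch of the inline (non-MCP) side of those fixes, assuming memories are already stored in the user's first-person voice (for example "I prefer dark mode"). The header wording and the function name are hypothetical. The formatter wraps each memory in an annotation and strips the characters that would let a stored memory close its annotation early or start a new line of instructions.

```python
def format_memory_block(memories: list[str]) -> str:
    """Render stored memories as first-person, annotation-shaped notes.

    Assumes each memory is already written in the user's own voice.
    """
    lines = []
    for text in memories:
        # Swap square brackets for parentheses so a memory cannot close the
        # [memory: ...] annotation early, and collapse all whitespace
        # (including newlines) so it cannot start a new instruction line.
        cleaned = " ".join(text.replace("[", "(").replace("]", ")").split())
        if cleaned:
            lines.append(f"[memory: {cleaned}]")
    if not lines:
        return ""  # inject nothing rather than an empty header
    # Hypothetical first-person header: the user's own notes, framed as context.
    header = "My saved notes, for context only:"
    return "\n".join([header, *lines])
```

The design choice is that nothing in the output speaks about the user in the third person or claims system authority; it reads as a note the user left for themselves.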

It helps to draw where the trust actually changes. A memory system has three boundaries, and each is a place where something untrusted tries to become trusted. At admission, model-suggested or user-typed content tries to become a durable memory. At the tenant edge, one user's store has to stay separate from everyone else's. At injection, a stored memory crosses back into the model's context, where it must be read as data and never as a new instruction.

Figure 1. The three trust boundaries of a memory layer: admission (untrusted content becoming durable memory), tenant isolation (one user's store from another's), and injection (stored memory re-entering the model context as data, not instructions).

#Garbage memories make the model refuse

The other half of that refusal had nothing to do with syntax. The injected list of recent memories contained an entry that was just "...", a one-liner claiming "a cat is better than a bat", and four identical copies of the same greeting. The model read its own injected context, judged it as clutter, and used that as a reason to decline the whole recall behaviour. Low-quality memory is not a cosmetic problem. The model forms an opinion about whether following your protocol is worth it, and junk talks it out of cooperating.

So hygiene is a recall feature, not housekeeping. Score quality at save time and tell the extractor not to store greetings, small talk, and ephemera in the first place. Dedup server-side before injecting, collapse near-duplicates, and drop fragments below a length floor. Cap the injected list at a few strong examples instead of a dozen mixed ones. And run a curator over the store so duplicates and contradictions get cleaned up over time instead of piling up. The same garbage that wastes tokens is the garbage that suppresses recall, which earns its own entry in the failure modes catalogue.
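A minimal sketch of two of those hygiene passes: a save-time gate and an injection-time cleanup. The length floor, the small-talk list, the cap, and the near-duplicate threshold are all hypothetical values, and token-set Jaccard similarity stands in for whatever near-duplicate detection (embedding similarity, a curator model) a production system actually uses. The injection pass assumes its input is already sorted best-first, for example by a quality score assigned at save time.

```python
import re

MIN_CHARS = 12            # hypothetical length floor
MAX_INJECTED = 5          # hypothetical cap on injected memories
NEAR_DUP_THRESHOLD = 0.8  # hypothetical Jaccard cutoff for near-duplicates
SMALL_TALK = {"hi", "hello", "hey", "hi there", "hello there",
              "thanks", "thank you", "ok", "okay"}


def normalize(text: str) -> str:
    return " ".join(text.lower().split())


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", normalize(text)))


def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def admit(text: str) -> bool:
    """Save-time gate: reject fragments and bare small talk before storage."""
    cleaned = normalize(text)
    bare = cleaned.strip(" .!?,")
    return len(cleaned) >= MIN_CHARS and bare not in SMALL_TALK


def prepare_for_injection(memories: list[str]) -> list[str]:
    """Injection-time pass: drop fragments, collapse near-duplicates, cap the list.

    Assumes `memories` is sorted best-first.
    """
    kept: list[str] = []
    kept_tokens: list[set[str]] = []
    for text in memories:
        if len(normalize(text)) < MIN_CHARS:
            continue  # fragments like "..." never reach the model
        toks = tokens(text)
        if any(jaccard(toks, other) >= NEAR_DUP_THRESHOLD for other in kept_tokens):
            continue  # a near-copy of something already kept
        kept.append(text)
        kept_tokens.append(toks)
        if len(kept) == MAX_INJECTED:
            break  # a few strong examples beat a dozen mixed ones
    return kept
```

Given `["I prefer concise answers with code", "i prefer concise answers with code!", "...", "I am learning Rust for embedded work"]`, the injection pass returns only the first and last entries.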

#One filter is the whole boundary

Retrieval usually runs under a service-role database client, because the read job is server-side and has to bypass row-level security to do its work. That convenience removes your safety net. With row-level security out of the path, the explicit user-id filter in the query is the only thing keeping user A's memories out of user B's context. One code path that forgets the filter is a cross-tenant leak, and no test of the happy path will surface it, because the happy path returns the right user's data either way.
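The test that catches a missing filter is a negative one: seed a second tenant whose rows would win if the filter were absent, then assert they never appear. The sketch below uses Python's `sqlite3` module, which has no row-level security, so the in-memory connection stands in for a service-role client. `MemoryReader` and the `memories` table are hypothetical. Making `user_id` a required argument with no default means a caller cannot silently skip it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database standing in for the store; there is no RLS here,
    # just like a service-role client that bypasses it.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, text TEXT NOT NULL)"
    )
    return conn


class MemoryReader:
    """Hypothetical read path where the user filter is the only tenant boundary."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def recent(self, user_id: str, limit: int = 5) -> list[str]:
        # No default for user_id, and an empty value is rejected outright.
        if not user_id:
            raise ValueError("user_id is required on every read")
        rows = self._conn.execute(
            "SELECT text FROM memories WHERE user_id = ? ORDER BY id DESC LIMIT ?",
            (user_id, limit),
        ).fetchall()
        return [text for (text,) in rows]
```

Seed one row for `alice`, then two newer rows for `bob`. `reader.recent("alice")` must return only Alice's row. If someone drops the `WHERE` clause, Bob's newer rows come back first and the test fails, which is exactly the case a happy-path test with a single user never exercises.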

Treat the filter as a security control, not a query parameter. Keep row-level security on wherever you can so there is a backstop, filter by user-id explicitly in every path that uses the service-role client, and review those paths the way you would review auth code. Scoping is the same problem seen from the user's side. A widely upvoted thread collected reports from people who found that ChatGPT could surface memories from chats they had set to "project-only". When a user draws a boundary (this bucket, this project, this scope), the read path has to honour it too, or the control is decorative.

#References

MemoryPlugin engineering notes (described here as one production system, no internal paths): the export-total versus counted-text token split that makes "we do not embed your thinking traces or tool output" auditable; safety-trained models refusing memory protocols whose delivery format read as prompt injection, and the fixes (annotation-shaped syntax, first-person voice, routing writes through an MCP tool); low-quality memories suppressing recall; and the service-role read path where an explicit user-id filter is the only tenant boundary.
OWASP Top 10 for LLM Applications, LLM01: Prompt Injection, genai.owasp.org/llmrisk/llm01-prompt-injection (why injected content that the model can mistake for instructions is a first-class risk class).
Anthropic, guidance on mitigating jailbreaks and prompt injection, docs.anthropic.com (the documented behaviour of treating user-turn tags that claim to be system or vendor as suspicious).
Alara (u/dhamaniasad), comment on the end-to-end-encryption tradeoff for consumer memory, r/ClaudeAI, reddit.com/r/ClaudeAI/comments/1ubsely.
r/ChatGPT, "Despite what OpenAI says, ChatGPT can access memories outside projects set to 'project-only'" (community evidence that scope boundaries must be enforced on the read path, reported as user experience).