
Memory Without Tool Calling

The big assistants support third-party tool calling now, but every connector is off until you add and enable it, so injecting memory into the prompt is the one path that works in a default chat with nothing wired up.

Most writing about agent memory quietly assumes you control the agent: you register a tool, the model calls it, you hand back context. Clean. And on the big assistants, that route increasingly exists. Claude speaks MCP, ChatGPT has Custom GPTs with Actions and, more recently, connectors, and the list of clients that can call a registered tool grows every month. When that path is open, take it. It is the clean one, and a memory product should use it wherever it can.

The catch is in the word "register". A third-party tool is opt-in and off by default: someone has to add the connector, link the account over OAuth, and switch it on, and a fresh chat with nothing set up calls no third-party tools at all. (The assistant's own first-party features, such as web search, image generation, and code execution, still run on their own, and Gemini will even reach into Google's own services unprompted. That is the platform's tooling, not the memory you wired up.) So you can never count on a given chat having your memory connected. The moment it is not, a tool-only memory is simply gone, and your only channel left is the prompt.

So you are left with one move, and it is a little bit grubby: put the memory into the conversation yourself, before the model reads the turn. A browser extension watches the chat box, and the moment the user sends a message it fetches the relevant memories and slips them into the prompt. The model has no idea a third party was involved. It just sees text, some of which the user "typed". That is the whole trick, and the rest of this page is everything that goes wrong with it.

#Pre-send interception

The mechanism is mundane. The extension hooks the send path (the button, the Enter key), pauses it, calls out for memories relevant to what was just typed, prepends or appends them, and lets the send continue. A small "fetching context" flicker, then the message goes with your history stapled to it. It should never block the send for long, it should fail open (no memory beats a hung chat box), and it needs an escape hatch for the times you want a clean turn (a keyword, a modifier key) because sometimes you really do not want last month dragged in.

Done well it is invisible. Done badly it is the most annoying thing in your browser. The entire UX budget is latency and restraint.
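The send-path behaviour described above can be sketched in TypeScript. This is a minimal sketch, not MemoryPlugin's code: `prepareOutgoing`, the `FetchMemories` signature, the `!nomem` keyword, and the 800 ms budget are all hypothetical, and the DOM hooking itself is left out, so the function only decides what text should go out. It shows the three properties the section asks for: a bounded wait, failing open, and an escape hatch.

```typescript
// A memory as returned by your backend; this shape is a hypothetical stand-in.
interface Memory {
  text: string;
}

// Hypothetical fetcher: asks your memory service for memories relevant to `query`.
type FetchMemories = (query: string, signal: AbortSignal) => Promise<Memory[]>;

// Hypothetical escape hatch: a message starting with this keyword is sent clean.
const SKIP_KEYWORD = "!nomem";

// Called from the hooked send path while the original send is paused.
// Always resolves to the text that should actually be sent.
async function prepareOutgoing(
  message: string,
  fetchMemories: FetchMemories,
  timeoutMs = 800,
): Promise<string> {
  const trimmed = message.trimStart();
  if (trimmed.startsWith(SKIP_KEYWORD)) {
    // Escape hatch: strip the keyword and send without memory.
    return trimmed.slice(SKIP_KEYWORD.length).trimStart();
  }

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  // Rejects once the deadline passes, so a slow backend cannot hold the send hostage.
  const deadline = new Promise<never>((_, reject) => {
    controller.signal.addEventListener("abort", () => reject(new Error("memory fetch timed out")));
  });

  try {
    const memories = await Promise.race([fetchMemories(message, controller.signal), deadline]);
    if (memories.length === 0) return message;
    // Append a short note after the user's own text.
    const note = memories.map((m) => m.text.trim()).join("; ");
    return `${message}\n\n[📌 ${note}]`;
  } catch {
    // Fail open: any error or timeout sends the message untouched.
    return message;
  } finally {
    clearTimeout(timer);
  }
}
```

Appending after the user's text is one choice; prepending works the same way. Because the function resolves on every path, the caller can always release the paused send.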

#The injection that reads as an attack

Here is the part nobody warns you about. The obvious way to inject memory is a tidy machine-readable control string, something like a directive that says "load these memories, then answer". It feels clean. It is also, to a modern model, indistinguishable from an attack.

These models have been trained hard to resist prompt injection. So they have opinions about text that shows up in the user turn carrying instructions. A control string with shell-flavoured operators reads like command injection. System-looking XML tags read like someone forging a system message, and Claude in particular is told to be suspicious of tags in the user turn claiming to come from the system. Third-person, authority-flavoured framing ("The user has enabled the following plugin. Take note of the following for the remainder of the conversation.") is textbook injection cadence. Put all three together and the model does exactly what it was trained to do: it refuses. You watch your own memory feature get rejected as an unauthorised attempt to modify the model's instructions.

The fix is to stop sounding like an attacker. Make the injected memory look like what it actually is, a note, not a command:

Annotation-shaped, not directive-shaped. A bracketed aside, a pin glyph, an HTML comment that does not even render. Something that reads as a margin note rather than a system override.
First person, the user's voice. "I work at MemoryPlugin and prefer TypeScript" is the user talking. "The user works at MemoryPlugin" is a third party issuing instructions about the user, which is precisely the shape the model is watching for.
Drop the ceremony. No "for the remainder of this conversation", no "you must", no fake authority. The more it postures as a system instruction, the more it gets treated as a forged one.
Use a real tool call wherever you can. This is the deeper point. On a platform that supports MCP or function calling, route memory through the tool, not the prompt text. A tool call is structurally trusted in a way inline text never will be. Reserve the smuggling for the platforms that leave you no choice.
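Part of this advice can be turned into a pre-flight check on the payload. A minimal sketch, assuming memories are already stored in the user's first-person voice; `renderAsNote`, `findRedFlags`, and the pattern list are hypothetical and illustrative, not a complete defence against any particular model's injection training. A clean result means the payload avoids the obvious tells, not that a model will accept it.

```typescript
// Patterns that make injected text look like an attack, per the points above.
// Illustrative only; a real list would be tuned against the models you target.
const RED_FLAGS: RegExp[] = [
  /<\/?\s*system\b/i, // tags claiming to come from the system
  /\bfor the remainder of (this|the) conversation\b/i, // ceremony
  /\byou must\b/i, // commands
  /\bthe user (has|is|works|prefers|wants)\b/i, // third-person framing
];

// Returns the source of every pattern the payload trips; empty means it reads like a note.
function findRedFlags(payload: string): string[] {
  return RED_FLAGS.filter((pattern) => pattern.test(payload)).map((pattern) => pattern.source);
}

// Renders first-person memories as one bracketed margin note rather than a directive.
function renderAsNote(memories: string[]): string {
  const body = memories
    .map((m) => m.trim())
    .filter((m) => m.length > 0)
    .join("; ");
  return body.length > 0 ? `[📌 ${body}]` : "";
}

const note = renderAsNote(["I work at MemoryPlugin", "I prefer TypeScript"]);
console.log(note); // [📌 I work at MemoryPlugin; I prefer TypeScript]
console.log(findRedFlags(note).length); // 0
```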

The lesson generalises past memory: any time you inject text into a safety-trained model on the user's behalf, you are writing on an adversarial surface. Design the payload so it does not pattern-match to the thing the model was taught to refuse. Privacy and pitfalls treats this as the security concern it is.

#Garbage gets you rejected too

The second failure mode is quieter and, honestly, more embarrassing. Inject a clean, relevant memory and the model uses it. Inject a pile of duplicates, empty fragments, and "user said hi" noise, and the model does something revealing: it reads the clutter, decides your memory is junk, and uses that as a reason to ignore the whole thing. The quality of what you inject is not just a tokens problem. It is the difference between the protocol working and the model opting out of it.

Which means hygiene is a delivery feature, not housekeeping. Dedupe before you inject. Drop the fragments and the greetings. Cap the list to a handful of strong, specific memories rather than everything you have. The memory suggestions curator and quality-at-save-time exist partly for this: a tidy store is one the model will actually trust at the moment of recall.
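The hygiene steps (dedupe, drop fragments and greetings, cap the list) fit in one pass over the retrieved memories. A minimal sketch: the `ScoredMemory` shape with a relevance `score`, the noise pattern, the three-word fragment threshold, and the cap of five are all assumptions chosen for illustration, not values from the source.

```typescript
// Hypothetical shape: a retrieved memory plus its relevance score from retrieval.
interface ScoredMemory {
  text: string;
  score: number;
}

// Greeting and filler noise; illustrative, not exhaustive.
const NOISE = /^(the )?(user )?(said )?(hi|hello|hey|thanks|thank you|ok)\b[\s.!]*$/i;

// Case, whitespace and trailing punctuation should not defeat deduplication.
function normalise(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim().replace(/[.!?]+$/, "");
}

function cleanForInjection(memories: ScoredMemory[], cap = 5, minWords = 3): ScoredMemory[] {
  const best = new Map<string, ScoredMemory>();
  for (const m of memories) {
    const text = m.text.trim();
    const words = text.split(/\s+/).filter((w) => w.length > 0);
    if (words.length < minWords) continue; // empty or fragmentary
    if (NOISE.test(text)) continue; // "user said hi" and friends
    const key = normalise(text);
    const seen = best.get(key);
    // Keep only the highest-scoring copy of each duplicate.
    if (seen === undefined || m.score > seen.score) best.set(key, { text, score: m.score });
  }
  // Strongest first, then cap to a handful.
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, cap);
}
```

Ranking before capping matters: the cap should cut the weakest memories, not whichever happened to arrive last.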

#What it costs

None of this is free, and it is worth being blunt about the bill.

The DOM is not your friend. You are hooked into someone else's web app. A redesign, a renamed class, a new send flow, and your injection silently stops firing until you patch it. Every supported platform is a small ongoing maintenance tax.
Placement is a guess. You can put memory into the prompt, but you cannot control where the model attends. Bury it in the middle of a long context and the lost-in-the-middle effect does its damage. The same retrieval and ranking discipline from hybrid retrieval matters more here, not less, because you get one shot at the context and no reranking afterwards.
It is opt-in by nature. No tool call means no clean signal that memory was wanted. You lean on the user activating it, which is friction, and on the model choosing to use what you injected, which is not guaranteed.

#The spectrum of access

Step back and it is really a ladder, from most control to least:

You own the agent. Register tools, call memory directly. Total control, smallest reach.
MCP / function calling on someone else's client. Structured, trusted, but only where the client supports it.
Browser-extension injection. Works in any chat, including one with nothing connected. Fragile and adversarial, but once the extension is installed it needs no per-chat setup.
Manual copy-paste. The floor. Always works, scales to nobody.

The interesting tension is that reach and control point in opposite directions. The cleanest integration is also the one that has to be wired up first. The messiest one, smuggling text past a model trained to distrust smuggled text, is the only thing that works in a chat where nothing has been connected. If you want memory that is there whether or not the user has set up a tool, you do not get to skip the grubby tier. You get to do it well.
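The ladder reads naturally as a fall-through: use the highest rung the current chat actually offers. A sketch with hypothetical capability flags and channel names; how a client would detect each flag is platform-specific and not shown.

```typescript
// Hypothetical capability flags a memory client might detect for the current chat.
interface ChatCapabilities {
  ownsAgent: boolean; // you run the agent loop yourself
  memoryToolConnected: boolean; // an MCP server or function tool for memory is enabled here
  extensionActive: boolean; // the browser extension is installed and hooked into this page
}

type MemoryChannel = "direct" | "tool-call" | "prompt-injection" | "manual-paste";

// Walks the ladder from most control to least and returns the first rung available.
function chooseChannel(c: ChatCapabilities): MemoryChannel {
  if (c.ownsAgent) return "direct";
  if (c.memoryToolConnected) return "tool-call";
  if (c.extensionActive) return "prompt-injection";
  return "manual-paste"; // the floor: always available
}
```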

#References

The pattern here is drawn from building cross-platform memory injection for MemoryPlugin; see how MemoryPlugin works and the portability argument in across tools.
On injected text as an adversarial surface and model prompt-injection defences: privacy and pitfalls; Anthropic's guidance that models should distrust system-claiming tags in the user turn.
Related mechanics elsewhere in this guide: memory suggestions (hygiene before injection), hybrid retrieval (you get one shot at the context), and context vs memory (placement and lost-in-the-middle).