Your agent stores beliefs, not decisions

Why CLAUDE.md, raw integrations, and memory.md all fail the same test, and the seven properties a product context layer must have.

An AI agent on your team just re-proposed the architecture you killed three weeks ago, with a clean argument for why it is right. You could blame the model. You could write a sharper rule in CLAUDE.md. You could connect the agent to Slack so it finally sees the thread where you settled this. Or, the 2026 reflex, you could trust that the agent's own memory.md will catch it next time. Hold that last one, because it is not merely the weakest fix. It is a circular one, and seeing why it is circular hands you the whole answer.

The agent did nothing irrational. It read the code, inferred the most probable intent, and acted on it, which is its job. The decision it broke was never in anything it could read. So letting the agent remember for you walks into a trap: the thing doing the remembering is the same faculty that just got it wrong. You would be curing the agent's unreliable grasp of your decisions by persisting the agent's unreliable grasp of your decisions. The way out is not a better file, a wider integration, or a smarter memory. It is a short list of properties any store of decisions must have, forced by what a decision actually is. Once you hold the list the argument ends, because every shortcut on the table is missing a named item on it.

What the agent can possibly know

An agent generates from two inputs and no others: its weights and the tokens in its context window. A tool call is not a third channel: whatever a tool returns must enter the window to matter, and a fetch returns only what some store already holds. Retrieving at runtime therefore does not create the decision; it relocates the question to the quality of what was queried. This sets a hard floor: a fact present in neither the weights nor the context cannot affect the output, at any model size. It is not a capacity problem either. A larger context window, or a perfect infinite one, would not dissolve it, and neither would baking your decisions into the weights by fine-tuning, because, as we are about to see, both still lack what a store of decisions needs. The 2026 work on context rot only sharpens the point: frontier million-token models degrade well before the window fills, and low-relevance text drags down the part that matters. So the decision has to arrive through the window, per task, from an external store. That much is agreed. The fight is over what the store may be, and that is settled by what it has to hold.

What a decision is, and the properties that follow

A decision is not its conclusion. "The ledger uses eventual consistency" is a sentence. It becomes a decision only when someone with the standing to choose actually chose it, weighing a rationale ("we will not pay two-phase-commit latency on the hot path; a reconciliation job tolerates five minutes of staleness"), over a scope, with a status, superseding what it replaced. Strip those away and you have a claim, a guess in a conclusion's clothes. If you suspect I defined the word to win the argument, test it against the first paragraph. If a decision were merely any sentence the model finds plausible, the agent did nothing wrong, and you can close the tab. That it grated means you already hold the view that a decision is an authorized choice, not a guess. I am not smuggling the premise; I am naming the one you walked in with.

From it, the properties of any decision store are forced, not invented. Five are ordinary database properties: selective retrieval, because most decisions are local and you must fetch only the few that bind the file in front of you; a schema, because you query by scope, filter by status, and follow supersession; concurrency, because a team makes decisions in parallel; durable, append-only history, because supersession is a relation over time; and integrity constraints, because a new decision that contradicts a standing one must be caught, not silently stacked. Two more decide the argument, and both fall straight out of authority: provenance, a record of the authoritative act that gave an entry its force, which can be as small as who decided and on what date; and admission control, a gate an entry passes before it counts as in force. A store with these is what "product context layer" names. "Do I need one" reduces to "do I need those," and each was just shown to follow from what a decision is.
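To make the list concrete, here is a minimal sketch of what a record and a log with some of these properties might look like. Everything in it is an assumption made for illustration: the names `Decision`, `DecisionLog`, `in_force`, and `for_file`, the choice of a path glob as the scope, and the people and dates in the usage note. It shows the schema, provenance fields, append-only history, supersession, and selective retrieval; it does not attempt concurrency, contradiction checks, or an admission gate.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """One record. Field names are illustrative, not a standard."""
    id: str
    conclusion: str
    rationale: str
    scope: str                        # glob over the file paths this decision binds
    decided_by: str                   # provenance: who made the call
    decided_on: date                  # provenance: when
    supersedes: Optional[str] = None  # id of the decision this one replaces


class DecisionLog:
    """Append-only log. Status is derived from supersession, never edited in place."""

    def __init__(self) -> None:
        self._entries: list[Decision] = []

    def append(self, decision: Decision) -> None:
        known = {d.id for d in self._entries}
        if decision.id in known:
            raise ValueError(f"duplicate id: {decision.id}")
        if decision.supersedes is not None and decision.supersedes not in known:
            raise ValueError(f"{decision.id} supersedes unknown id {decision.supersedes}")
        self._entries.append(decision)

    def in_force(self) -> list[Decision]:
        """Every decision that no later entry has superseded."""
        replaced = {d.supersedes for d in self._entries if d.supersedes is not None}
        return [d for d in self._entries if d.id not in replaced]

    def for_file(self, path: str) -> list[Decision]:
        """Selective retrieval: only the standing decisions whose scope covers this path."""
        return [d for d in self.in_force() if fnmatchcase(path, d.scope)]
```

With a hypothetical earlier entry `D1` scoped to `services/ledger/*`, a later `D2` on the same scope that sets `supersedes="D1"`, and an unrelated `D3` scoped to `web/*`, a call to `for_file("services/ledger/writer.py")` would return only `D2`. The older record stays in the log, so the history of what replaced what is still there to read.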

Why every shortcut fails on a named property

A hand-maintained CLAUDE.md has a gate, you, and no schema, no concurrency, no supersession, no constraints. It is a flat file loaded whole every turn, and keeping it current is manual synthesis whose cost climbs with the throughput you adopted agents to gain. If you are one author on a young codebase, it may be all you need; the failure grows with every added author and every month of accumulated history. It fails on five of the seven.

A fully connected agent reading the raw sources has access but does no synthesis, so it re-derives each decision live, against ground truth that sits in no single message. Two agents reading the same contradictory logs reach different answers, and raw read access to chat and mail hands the agent, and any instruction injected into a ticket it reads, far more than the decision it needed. It fails on provenance, on integrity constraints, and on admission control: nothing is ever admitted as in force, so the answer is not even deterministic.

Then the one reached for in 2026, the reason this post exists. Unsupervised agent memory.md looks like it already owns the five database properties and could earn the other two. It cannot earn them, by the argument already built. Provenance and admission require the authoritative act that made a decision binding, and the writer of an auto-memory is the agent, which holds no such authority. Delegating it to execute within your decisions never appointed it to originate them, which is exactly why its overruling you is a grievance and not a mandate. So every auto-written entry is, at best, the model's belief that a decision was made. The belief is often correct, which is the trap, not the defense: without provenance you cannot tell which entries are wrong, and the wrong ones gather on precisely the non-obvious calls you most needed to keep, the killed design among them. Corrections do not rescue it; with no supersession and no constraints they accumulate beside the errors instead of replacing them, so the store sediments rather than converging. That is the circularity made concrete: it persists the very faculty whose unreliability is the problem. The only repair is to inject an authority the model cannot supply, a person who ratifies the entry, or capture from a recorded authoritative act, with conflicts resolved on the way in. Do that and you have not dodged the seven properties. You have built them. A memory whose entries a human approves is not the counterexample to this post. It is this post.
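The repair described above can be sketched as a gate between what the agent writes and what counts as in force. This is a minimal illustration under stated assumptions: the names `Belief`, `Ratified`, `Memory`, `propose`, and `ratify` are hypothetical, authority is modeled as a plain set of names, and a conflict means a different conclusion on the same topic key. For brevity it keeps only the current entry per topic, so it shows admission and provenance, not append-only history.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Belief:
    """What an agent may write: a claim that a decision exists. It has no force."""
    topic: str
    conclusion: str
    source_note: str  # where the agent thinks it saw this


@dataclass(frozen=True)
class Ratified:
    """A belief after someone with standing approved it."""
    topic: str
    conclusion: str
    ratified_by: str
    ratified_on: date


class Memory:
    def __init__(self, authorized: set[str]) -> None:
        self._authorized = set(authorized)       # assumption: authority is a list of names
        self.pending: list[Belief] = []
        self.in_force: dict[str, Ratified] = {}  # one standing entry per topic

    def propose(self, belief: Belief) -> None:
        """Agents can only add to the pending queue."""
        self.pending.append(belief)

    def ratify(self, belief: Belief, person: str, on: date, replace: bool = False) -> Ratified:
        """Admission gate: an authorized person, with conflicts resolved on the way in."""
        if person not in self._authorized:
            raise PermissionError(f"{person} has no standing to ratify decisions")
        if belief not in self.pending:
            raise ValueError("belief was never proposed")
        current = self.in_force.get(belief.topic)
        if current is not None and current.conclusion != belief.conclusion and not replace:
            raise ValueError("conflicts with a standing decision; pass replace=True to supersede it")
        entry = Ratified(belief.topic, belief.conclusion, person, on)
        self.in_force[belief.topic] = entry
        self.pending.remove(belief)
        return entry

    def context_for_agent(self) -> list[str]:
        """Only ratified entries are ever rendered into the agent's context."""
        return [
            f"{e.conclusion} (ratified by {e.ratified_by} on {e.ratified_on.isoformat()})"
            for e in self.in_force.values()
        ]
```

The design choice worth noticing is that `context_for_agent` never reads `pending`. An agent can propose as often as it likes, but until a named person ratifies the entry, the belief cannot come back to it dressed as a decision.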

What this proves, and what it does not

It does not prove you must buy anything. It proves the properties are necessary, because each is forced by what a decision is and how a model reads. So build them yourself. A folder of ADRs in git, a template for the schema, pull-request review as your admission gate and your provenance, a check in CI that flags a record contradicting a standing one: that is a product context layer, built by hand, and for some teams it is the right one. The seven properties never said Postgres; they said structure, governance, and authority. What a product removes is the part that does not fit in a file: synthesizing one clean decision out of a noisy thread, at the moment it is made, and supplying the discipline the velocity treadmill erodes first. We built Brief to be that store, for that reason. The summary is older than all of this. You do not keep a growing, multi-writer, queried dataset in a text file; you do not answer a hot query by scanning the raw source every time; and you do not let an unauthorized writer mint records in your system of record. Your agents have been doing all three, on the one dataset that decides whether they build the thing you actually chose.
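For the build-it-yourself route, the CI check might look something like the sketch below. It rests on assumptions the post does not specify: each ADR file opens with `key: value` header lines ending at the first blank line, the keys are `id`, `topic`, `status`, and an optional `supersedes`, and "contradicting" is approximated as two accepted records on the same topic where neither supersedes the other. A script cannot judge whether two conclusions truly conflict; it can only flag the pair for a person to look at.

```python
from pathlib import Path


def parse_header(text: str) -> dict[str, str]:
    """Read `key: value` lines up to the first blank line (an assumed ADR layout)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def find_conflicts(records: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Pairs of accepted records on one topic where neither supersedes the other."""
    usable = [r for r in records if r.get("id") and r.get("topic")]
    superseded = {r["supersedes"] for r in usable if r.get("supersedes")}
    standing: dict[str, str] = {}  # topic -> id of the accepted record still in force
    conflicts: list[tuple[str, str]] = []
    for record in sorted(usable, key=lambda r: r["id"]):
        if record.get("status") != "accepted" or record["id"] in superseded:
            continue
        topic = record["topic"]
        if topic in standing:
            conflicts.append((standing[topic], record["id"]))
        else:
            standing[topic] = record["id"]
    return conflicts


def main(folder: str) -> int:
    """Return a process exit code: 1 if any conflict is found, else 0."""
    paths = sorted(Path(folder).glob("*.md"))
    records = [parse_header(p.read_text(encoding="utf-8")) for p in paths]
    conflicts = find_conflicts(records)
    for first, second in conflicts:
        print(f"conflict: {first} and {second} are both accepted on the same topic")
    return 1 if conflicts else 0
```

A CI step could pass the ADR folder to `main` and fail the build on a nonzero return. The intended resolution is for the newer record to name the older one in `supersedes`, at which point the check passes and the history stays intact.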
