Context engineering, explained

The discipline of deciding what a model sees, and the harder problem of keeping it true.

Context engineering is the practice of deciding what information an AI model sees before it responds: not just the prompt, but the system instructions, memory, retrieved documents, tool definitions, and conversation history that fill its context window. It treats everything the model reads as a single, deliberately assembled input rather than a one-off question.

The term moved from niche to standard in late 2025. Anthropic's Applied AI team formalized it in September 2025, defining context engineering as the set of strategies for curating and maintaining the optimal set of tokens during a model's inference. Around the same time, Gartner put it more bluntly, framing 2025 as the year context engineering came in and prompt engineering went out, a shift the broader field quickly codified. By April 2026 it was being described across the industry as the defining skill for anyone building with AI agents.

How it differs from prompt engineering

Prompt engineering asks one question: how do I phrase this request to get a good answer? It operates at the single-turn level and works well for straightforward, one-shot interactions.

Context engineering asks a broader one: what is the complete set of information most likely to produce the behavior I want, across many steps and a long task? Anthropic frames it as the natural progression of prompt engineering, not a replacement. Prompt engineering is one component inside it. The prompt still matters; it is now one ingredient in a larger information environment that also includes memory, tools, and retrieved data.

The reason the field shifted is the shift in what people build. A chatbot answers one question and forgets it. An agent runs for fifteen or twenty steps, calls tools, and has to stay coherent the whole way. At that scale, the phrasing of any single instruction matters less than the design of the whole environment the agent is reasoning inside.

Why the context window is a budget

A model's context window is finite, and it does not behave like neutral storage. Performance degrades as the window fills. Researchers call this context rot: every frontier model gets less reliable as more tokens pile up, regardless of how relevant those tokens are. An agent that re-reads the same long file three times in one session is not just wasting money on tokens. It is crowding out the signal it actually needs.

This is why practitioners treat the window as a budget to spend, not a container to fill. The goal is the smallest high-signal set of tokens that lets the model do the job. More context is not better context. The discipline is curation, and the enemy is noise.

The core operations

Most working definitions break context engineering into a handful of repeatable moves:

Offloading. Move information out of the prompt and into external systems (files, databases, APIs) the agent can reach when it needs them.
Reduction. Compress or summarize older information so the window does not fill with stale history.
Retrieval. Pull in the right documents or data at the moment they are relevant, rather than preloading everything up front.
Just-in-time loading. Store lightweight identifiers and fetch the full data only when the task calls for it. Claude Code works this way, loading only what a given step needs to keep the window lean.

The throughline is timing. Good context engineering is less about having the information and more about getting the right slice of it into the window at the right moment.

A concrete example

Ask an agent to "write a quarterly business review for Q1 2026." With prompt engineering alone, you get a generic template with placeholder numbers, because the model has no idea what your business is.

With context engineering, the same request has a system prompt defining your team's report format, tools to query your CRM and pull the actual revenue figures, memory of the last review and the feedback your CEO gave on it, and the awareness that it is April 2026 writing about the quarter that just closed. Same model, same prompt. The difference in output is entirely the difference in context.

Where it gets hard

The hard part of context engineering is not assembling context once. It is keeping it correct as things change.

Anthropic's own guidance points at the central tension: instructions need the right altitude. Too specific, and you hardcode brittle logic that breaks the moment reality deviates from your if-else rules. Too vague, and the model has no real signal and falls back on guesses. The balance, specific enough to guide and loose enough to leave room for judgment, is genuinely difficult to strike, and it shifts as the task shifts.

There is also a deeper problem that tooling alone does not solve. Most of the information an agent needs to act well is institutional: the decisions a team has already made, the constraints it is operating under, the reasons behind past choices. That knowledge is scattered across documents, chat threads, and people's heads, and it goes stale the moment a decision changes. You can engineer the perfect retrieval pipeline and still feed the agent context that was true last month and is wrong today. The mechanics of context engineering assume the underlying context is current. Keeping it current is its own problem.

Where Brief fits

Brief is adjacent to this. Context engineering is the discipline of getting the right information into a model's window; Brief works on the layer underneath, keeping the institutional context (decisions, customer research, work in flight) accurate and current so that whatever you feed an agent reflects what is actually true today. The engineering decides what the model sees. Brief helps make sure what it sees is right.

Stay in the Loop

Get notified when we publish new insights on building better AI products.

Get Updates

← Back to Blog