Context Engineering: The Skill That Replaced Prompt Engineering
written by Stefan Christoph
14 minutes read
TL;DR: Patrick Debois coined DevOps in 2009 by naming what practitioners were already doing. In 2026, he’s doing it again with “Context Engineering” and the CDLC (Context Development Lifecycle): Generate, Evaluate, Distribute, Observe. The core insight: as coding agents get more capable, the bottleneck shifts from writing code to assembling the right context. More context isn’t better; more precise context is. Teams that treat context as a versioned, tested, governed engineering artifact will compound an advantage that’s hard to replicate.
The Same Person, Seventeen Years Apart
In 2009, Patrick Debois organized a small conference in Ghent, Belgium. He needed a name, combined “dev” and “ops” (short for “development” and “operations”), added “days,” and called it DevOpsDays [1]. The term stuck. It named a discipline that practitioners had been practicing without a shared vocabulary. Within a few years, DevOps went from a conference hashtag to a job title, a team structure, and an industry worth billions.
In May 2026, the same Patrick Debois gave a talk at AI Engineer Europe in London called “Context Is the New Code” [2]. The video crossed 60,000 views in ten days. The community translated it into Korean, Japanese, Chinese, and Arabic without being asked. Multiple authors independently rendered his framework in 4-, 5-, and 7-stage variants. Tech Talks Weekly ranked it the #1 software engineering talk of the week.
Same person. Same pattern. Naming the discipline that emerges from practice.
I’ve been watching this unfold from the practitioner side. For the past year, I’ve been building and operating an AI agent system that manages my daily workflow: meeting prep, research, content creation, customer context [3]. Every problem I’ve solved has been a context problem. Not a prompting problem. Not a model selection problem. A context problem: what does the agent know, when does it know it, and how fresh is that knowledge?
Debois gave it a name. And the name matters, because it shifts how we think about the work.
From Prompting to Engineering
Prompt engineering was the first wave. You learned to phrase questions carefully, add examples, specify output formats. It was a useful skill. For simple, single-purpose agents — a FAQ bot, a code formatter — it may be all you need. But for complex, multi-step, multi-tool agents that operate across sessions and domains, prompting alone is a dead end.
Here’s why: a prompt is a single interaction. Context engineering is the system that assembles everything the agent needs before the prompt even arrives. The prompt is one input. The context is the whole picture.
IBM’s Martin Keen frames it precisely [4]: traditional systems have simple context — who is the user, what resource are they accessing, what permissions do they have. Agentic systems explode this into at least six layers:
| Layer | What It Contains | Example |
|---|---|---|
| Prompt context | The user’s request | “Prepare my meeting with Acme Corp” |
| Situational context | Environment state | Which orchestrator, which agents, which tools are available |
| Resource context | What each tool provides | MCP server capabilities, API schemas, data sources |
| User context | Identity and permissions | Role, location, access level, preferences |
| Model context | LLM characteristics | Capabilities, limitations, tuning |
| Task history | Memory | What was tried before, what worked, what failed |
Prompt engineering addresses layer one. Context engineering addresses all six.
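To make the six layers concrete, here is a minimal Python sketch of a container that holds them and reports which ones are still empty. The class name `AgentContext`, its field names, and the `missing_layers` helper are my own illustration, not something from the IBM talk.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical container for the six context layers in the table."""
    prompt: str                                       # the user's request
    situational: dict = field(default_factory=dict)   # orchestrator, agents, available tools
    resources: dict = field(default_factory=dict)     # what each tool provides
    user: dict = field(default_factory=dict)          # identity and permissions
    model: dict = field(default_factory=dict)         # LLM capabilities and limits
    task_history: list = field(default_factory=list)  # what was tried before

    def missing_layers(self) -> list[str]:
        """Return the names of layers that are still empty."""
        return [name for name, value in vars(self).items() if not value]


ctx = AgentContext(prompt="Prepare my meeting with Acme Corp")
print(ctx.missing_layers())
# ['situational', 'resources', 'user', 'model', 'task_history']
```

A prompt-only setup fills one field out of six. The remaining five are the part that has to be assembled by a system rather than typed by a person.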
The shift is analogous to what happened in web development. In 2005, you could build a website by writing HTML in a text editor. By 2015, you needed build systems, package managers, CI/CD pipelines, CDNs, and monitoring. The core skill (writing markup) didn’t disappear. It became one input to a much larger system. That’s where we are with prompting.
The CDLC: Context Development Lifecycle
Debois’s core contribution is the CDLC — Context Development Lifecycle [2]. The thesis is plain: code has version control, review, testing, CI/CD, and observability. Context — the prompts, skills, instructions, and knowledge we feed agents — has none of that. Yet.
The lifecycle is four stages in an infinity loop:
Figure: The Context Development Lifecycle, four stages in an infinity loop.
Generate — Create the context artifacts: steering files, skill definitions, knowledge bases, tool descriptions. This is where most teams start and stop.
Evaluate — Test whether the context produces the desired agent behavior. Not unit tests with pass/fail gates, but error budgets, because LLM non-determinism breaks binary outcomes [5]. Adding instructions doesn’t just add behavior — it changes behavior. The ripple effects are the rule, not the exception.
Distribute — Get the right context to the right agents at the right time. This is the service catalog problem I wrote about recently [6]: you can’t reuse what you can’t find, and you can’t govern what you can’t discover.
Observe — Monitor how agents actually use the context in production. Context staleness is silent. You need scheduled independent eval runs to catch the drift [5].
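The error-budget idea in the Evaluate stage can be sketched in a few lines of Python: run the same scenario many times and compare the failure rate against a tolerated threshold instead of demanding a single pass. Everything here is an assumption for illustration: the function name, the 15% budget, and the `flaky_agent` stub that stands in for a real agent call.

```python
from itertools import count
from typing import Callable


def within_error_budget(
    run_agent: Callable[[str], str],
    check: Callable[[str], bool],
    scenario: str,
    runs: int = 20,
    error_budget: float = 0.15,
) -> tuple[bool, float]:
    """Run one scenario repeatedly and compare the failure rate to a budget."""
    failures = sum(1 for _ in range(runs) if not check(run_agent(scenario)))
    failure_rate = failures / runs
    return failure_rate <= error_budget, failure_rate


_calls = count(1)


def flaky_agent(scenario: str) -> str:
    # Deterministic stand-in for a non-deterministic agent:
    # every 10th call ignores the naming rule.
    return "camelCase" if next(_calls) % 10 == 0 else "snake_case"


ok, rate = within_error_budget(
    flaky_agent, lambda out: out == "snake_case", "rename the helper function"
)
print(ok, rate)  # True 0.1
```

A binary gate would have failed this run twice out of twenty. The budget turns that into a number you can track over time, which is what a scheduled eval run in the Observe stage would compare against.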
The community immediately started extending this. Within two weeks of the talk, people had drawn 7-stage variants (adding Compile, Test, Deliver, Adapt as separate beats), 5-stage variants (pulling Adapt out of Observe), and versions that start with Observe rather than Generate [2]. Debois’s response: “When other people start adding their own stages, the idea has left the building. That’s how you know it’s working.”
Why “Better Context” Is Not “More Context”
There’s a trap in context engineering that catches most teams early: the assumption that more context means better results. The research says the opposite.
Chroma’s “Context Rot” study [7] tested 18 LLMs on retrieval tasks with increasing input lengths. The findings are counterintuitive:
- Performance degrades non-uniformly as input grows, even on trivial tasks
- A single topically-related-but-incorrect distractor reduces performance more than completely unrelated filler
- Shuffled haystacks (destroyed logical flow) actually improve retrieval compared to coherent structured text
- Focused inputs (~300 tokens of relevant context) significantly outperform full inputs (~113K tokens with irrelevant context mixed in)
The IBM team arrives at the same conclusion from the enterprise side: “Better context is not more context, it’s more precise context” [8].
This maps directly to what I’ve observed in practice. My agent system has ~50 steering files, skill definitions, and knowledge sources. The temptation is to load everything into every session. The reality is that selective loading — giving the agent exactly what it needs for the current task and nothing more — produces measurably better results. Context is not a buffet. It’s a prescription.
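Selective loading can be as simple as ranking candidate files against the task and stopping at a budget. The sketch below is not how my system does it; it uses naive word overlap and counts words as a crude proxy for tokens, and the file names and contents are made up. A real setup would more likely use embeddings or explicit task-to-file mappings.

```python
def select_context(task: str, files: dict[str, str], max_tokens: int = 300) -> list[str]:
    """Pick the files most relevant to the task, within a rough token budget."""
    task_words = set(task.lower().split())

    def overlap(text: str) -> int:
        return len(task_words & set(text.lower().split()))

    # Most relevant first; ties keep their original order.
    ranked = sorted(files, key=lambda name: overlap(files[name]), reverse=True)

    selected, used = [], 0
    for name in ranked:
        cost = len(files[name].split())  # crude proxy: one word is roughly one token
        if overlap(files[name]) == 0 or used + cost > max_tokens:
            continue  # skip irrelevant files and anything that would exceed the budget
        selected.append(name)
        used += cost
    return selected


steering = {
    "meeting-prep.md": "before a customer meeting summarize open actions and recent notes",
    "blog-style.md": "write posts in first person using short paragraphs",
    "customer-acme.md": "acme corp customer context renewal due and key contacts",
}
print(select_context("prepare my meeting with acme corp", steering))
# ['customer-acme.md', 'meeting-prep.md']
```

The blog style guide never reaches the agent for a meeting-prep task. That exclusion is the point: what you leave out matters as much as what you load.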
Sally-Ann Delucia from Arize puts it sharply: “Context decides what the model sees. Memory decides what survives” [9]. These are related but separate concerns. Conflating them is how you end up with agents that know everything and understand nothing.
Context as an Engineering Artifact
The most important shift in Debois’s framing is treating context as a first-class engineering artifact. Not a prompt. Not a configuration file. An artifact with properties:
- Source — Where did this context come from? A human author? An automated extraction? A customer conversation?
- Freshness — When was it last validated? Dennis Traub raises the sharpest version of this: “A best practice encoded a few months ago may be wrong today” [10]. If you’ve ever wondered why your agent quietly started doing the wrong thing six weeks after you wrote a “best practice” doc, that’s the diagnosis.
- Assumptions — What’s taken for granted? Every steering file encodes assumptions about the environment, the user, and the task. When those assumptions drift from reality, the context becomes actively harmful.
- Unresolved questions — What’s still unknown? Honest context admits its gaps rather than papering over them with confident-sounding instructions.
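One way to make these four properties checkable is to store them as metadata next to each artifact. The following Python sketch assumes a hypothetical `ContextArtifact` record and a 90-day review window; both are illustrative choices, not part of Debois's framing.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ContextArtifact:
    """Hypothetical metadata record for one steering file or skill."""
    path: str
    source: str                                           # who or what produced it
    last_validated: date                                  # freshness
    assumptions: list[str] = field(default_factory=list)  # what is taken for granted
    unresolved: list[str] = field(default_factory=list)   # known gaps

    def is_stale(self, today: date, max_age_days: int = 90) -> bool:
        """True when the artifact has not been validated within the window."""
        return today - self.last_validated > timedelta(days=max_age_days)


artifact = ContextArtifact(
    path="steering/deploy-best-practices.md",
    source="human author",
    last_validated=date(2026, 1, 15),
    assumptions=["deployments go through a single staging environment"],
    unresolved=["does this apply to the mobile team?"],
)
print(artifact.is_stale(today=date(2026, 6, 1)))  # True
```

With the validation date recorded, "a best practice encoded a few months ago" becomes a query you can run on a schedule instead of a surprise you discover in production.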
This is where the parallel to DevOps becomes precise. In 2009, infrastructure was treated as a manual, artisanal concern. DevOps said: no, infrastructure is code. Version it. Test it. Automate it. Review it. In 2026, context is treated as a manual, artisanal concern. Context engineering says the same thing: version it, test it, automate it, review it.
Vinay Krishna captures the inversion cleanly: “Code was the source of truth. Now context is the source of truth, code is just the output” [11].
A Worked Example: What CDLC Looks Like in Practice
I’ve been running a version of CDLC without calling it that for the past year. My agent system [3] has:
Generate: ~50 steering files that define behavior, constraints, and workflows. Skill files that package domain expertise into reusable units. A vault of 2,000+ notes that serve as the agent’s long-term memory.
Evaluate: Every skill has constraints that are testable. When I add a new steering rule, I run the agent through scenarios that exercise it. Not automated CI yet — but deliberate verification before distribution.
Distribute: Skills are organized in layers (core, core-beta, personal) with a promotion lifecycle. New skills start personal, get tested, then promote to shared. A merge system combines base configuration with local overrides so upstream updates don’t break personal customizations.
Observe: A retrospective analyzer runs after each session, comparing agent behavior against skill definitions and flagging drift. When the agent consistently works around a constraint, that’s a signal the constraint needs updating.
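The merge step in Distribute is the easiest part to show in code. This is a simplified Python sketch of the idea, not my actual merge system: a recursive dictionary merge where local overrides win and the upstream base is never mutated. The configuration keys and skill names are invented for the example.

```python
def merge_config(base: dict, override: dict) -> dict:
    """Recursively merge override into base without mutating either input."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # local value replaces or extends the base
    return merged


base = {
    "skills": {"meeting-prep": {"layer": "core", "enabled": True}},
    "tone": "neutral",
}
local = {
    "skills": {
        "meeting-prep": {"enabled": False},   # personal override of a shared skill
        "draft-post": {"layer": "personal"},  # new skill that starts in the personal layer
    },
}
merged = merge_config(base, local)
print(merged["skills"]["meeting-prep"])  # {'layer': 'core', 'enabled': False}
print(sorted(merged["skills"]))          # ['draft-post', 'meeting-prep']
```

Because the base stays untouched, an upstream update can be pulled in and re-merged without losing the local layer.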
The pattern Debois describes — “every session is a new hire” — is exactly right. Each conversation starts fresh. The agent has no memory of yesterday unless you engineer that memory into the context. The steering files, skill definitions, and vault connections ARE the institutional knowledge. Without them, you’re onboarding a new employee every single time you open a chat window.
This is also why context engineering is harder than it looks. It’s not just writing good instructions. It’s maintaining a living system of instructions that evolve as your needs change, your tools update, and your understanding deepens. At team scale, the challenge multiplies: you need shared registries with approval workflows, context versioning with rollback, and ownership models that prevent the “everyone edits, nobody owns” failure mode. Jaroslaw Wasowski coined “context debt” for this [12] — the Cunningham-style sibling of technical debt. When your context artifacts drift from reality, the agent’s behavior degrades silently. No error messages. No test failures. Just gradually worse outputs that you might not notice until something breaks.
The Vocabulary Problem
Dennis Traub adds a dimension that most context engineering discussions miss: vocabulary [10]. AI coding agents are “the most confident version of Humpty Dumpty” — they interpret words however they see fit and never ask for clarification.
In traditional development, friction catches ambiguity. Code reviews. Pair programming. “What do you mean by ‘order’?” conversations. AI agents remove that friction. Code compiles, tests pass, but the semantic mismatch only surfaces downstream.
The solution is treating vocabulary as context. Matt Pocock built a glossary skill that scans a codebase, extracts terminology into markdown, and feeds it to the agent. The result: improved alignment AND reduced token usage [10]. Every MCP tool name is a vocabulary decision the LLM reasons over. confirm_purchase_intent() and submit_order() produce different agent behavior from the same underlying function.
This connects to a broader principle: context engineering isn’t just about what you tell the agent to do. It’s about the precision of the language you use to tell it. Ambiguity in human communication gets resolved through conversation. Ambiguity in agent context gets amplified into confidently wrong outputs.
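As a rough illustration of vocabulary as context, the Python sketch below renders a term list into a markdown glossary an agent could load. It is not the glossary skill mentioned above and it does not scan a codebase; the `render_glossary` function and the example terms are hypothetical.

```python
def render_glossary(terms: dict[str, str]) -> str:
    """Render term definitions as a markdown glossary, sorted case-insensitively."""
    lines = ["# Glossary", ""]
    for term in sorted(terms, key=str.lower):
        lines.append(f"- **{term}**: {terms[term]}")
    return "\n".join(lines)


terms = {
    "order": "A confirmed purchase with payment captured, not a shopping cart.",
    "Customer": "The billing account, not an individual user login.",
}
print(render_glossary(terms))
```

Two lines like these answer the “What do you mean by ‘order’?” question before the agent has a chance to guess.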
What This Means for Practitioners
If you’re building with AI agents today, context engineering is already your job — whether you call it that or not. Five practices that define the discipline:
Audit your context surface. What does your agent actually receive? Most teams can’t answer precisely. Map every input: system prompts, tool descriptions, retrieved documents, conversation history, user preferences. You can’t improve what you can’t see.
Version and review context changes. If you wouldn’t push a code change without review, don’t push a context change without review either. A single steering file edit can alter agent behavior across dozens of workflows.
Measure precision, not volume. The instinct is to add more context when the agent fails. The research says the opposite: remove irrelevant context first. A focused 300-token input outperforms a 113K-token dump with the answer buried in noise [7].
Schedule freshness reviews. The steering file you wrote three months ago may encode assumptions that are no longer true. Context staleness is the silent killer of agent reliability.
Design for the loop, not the prompt. A single well-crafted prompt is a one-time win. A context system that generates, evaluates, distributes, and observes is a compounding advantage. Debois calls this the Context Flywheel [5]: “Better context produces better agent output. Better agent output generates better signals. Better signals produce better context.”
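The first practice, auditing the context surface, can start with a very small script. This Python sketch assumes you can capture each input as text; it counts words as a stand-in for tokens and reports each input's share of the total. The input names and sizes are synthetic.

```python
def audit_context(inputs: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (name, size, share) per input, largest first. Size is a word count."""
    sizes = {name: len(text.split()) for name, text in inputs.items()}
    total = sum(sizes.values()) or 1  # avoid dividing by zero on empty input
    return sorted(
        ((name, size, round(size / total, 2)) for name, size in sizes.items()),
        key=lambda row: row[1],
        reverse=True,
    )


# Synthetic inputs: repeated filler words stand in for real content.
inputs = {
    "system_prompt": "x " * 200,
    "tool_descriptions": "x " * 600,
    "retrieved_documents": "x " * 1100,
    "conversation_history": "x " * 100,
}
for name, size, share in audit_context(inputs):
    print(f"{name:22} {size:5} {share:.0%}")
```

In this made-up case, retrieved documents account for more than half of what the agent sees. That is usually the first place to look when applying the third practice and trimming for precision.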
The Competitive Moat Nobody Talks About
Models are commoditizing. The gap between frontier models shrinks every quarter. Tools are converging — every IDE has an AI assistant, every cloud has an agent framework. What doesn’t commoditize is the context you’ve accumulated.
Two years of continuously refined steering files, validated against real production behavior, encoding the specific patterns and constraints of your domain. That’s the part nobody else has. It’s not transferable. It’s not downloadable. It’s the institutional knowledge of your AI-augmented workflow, and it compounds over time.
This is why I think context engineering will follow the same trajectory as DevOps. It starts as a practice. It becomes a discipline. It gets a lifecycle, tooling, and eventually dedicated roles. The teams that invest early in treating context as a first-class engineering concern will have a structural advantage that’s hard to replicate — not because the idea is secret, but because the accumulated context is theirs.
Debois drew the infinity loop. The community numbered the stages differently. That’s the point. The loop never ends. And apparently, neither does the debate about which step comes first.
If You’re Running This on AWS
The CDLC maps directly to managed services. Here’s how each stage translates, with one extra row for runtime governance, which cuts across the four stages:
| CDLC Stage | AWS Service | What It Does |
|---|---|---|
| Generate | Bedrock Knowledge Bases | Ingest, chunk, and embed your context artifacts (docs, wikis, APIs) into a searchable vector store. Supports multiple chunking strategies — hierarchical chunking preserves document structure, which matters for context precision [13] |
| Generate | AgentCore Memory | Short-term memory (within session) and long-term memory (cross-session, up to 365 days). Extracts key insights automatically so agents remember user preferences and prior decisions without manual summarization [14] |
| Evaluate | Bedrock Evaluations | Test whether your context produces the desired agent behavior. Supports both model-as-judge and human evaluation workflows. The error-budget approach Debois advocates maps to evaluation scores rather than binary pass/fail |
| Distribute | AgentCore Registry | The service catalog for agent skills. Register MCP servers, agents, and skills with rich metadata. Semantic discovery via natural language queries. Built-in approval workflows before resources become discoverable [6] |
| Observe | CloudWatch + CloudTrail | Trace every agent invocation, tool call, and context retrieval. CloudTrail audits all registry access. The observability layer that catches context drift before users report degraded outputs |
| Govern | Bedrock Guardrails | Runtime governance at the context boundary. Filter what goes in (prompt injection via retrieved documents, off-topic inputs) and what comes out (hallucinations, policy violations). This is also your adversarial defense layer — context injection attacks exploit the trust boundary between retrieved content and system instructions [15] |
The key architectural insight: these aren’t separate tools you bolt together. They’re layers of the same context pipeline. A Bedrock Agent retrieves from a Knowledge Base (Generate), uses Memory to maintain session continuity (Generate), gets discovered through the Registry (Distribute), runs through Guardrails at inference time (Govern), and logs everything to CloudWatch (Observe).
The piece most teams miss is the Evaluate stage. You can build the pipeline without it — agents will work. But without systematic evaluation of how context changes affect agent behavior, you’re flying blind. Context drift accumulates silently until something breaks. The teams that close this loop first will have the compounding advantage Debois describes.
Sources
[1] New Relic, “The Incredible True Story of How DevOps Got Its Name” — https://newrelic.com/blog/nerd-life/devops-name
[2] Patrick Debois, “Two Weeks After ‘Context Is the New Code’ at AIE London” (May 2026) — https://jedi.be/blog/2026/two-weeks-after-context-is-the-new-code/
[3] Stefan Christoph, “The AI Content Pipeline: How I Publish 3x a Week Without a Content Team” — https://schristoph.online/blog/ai-content-pipeline/
[4] IBM Technology, “How to Pass Context in an Agentic AI Flow” — https://www.youtube.com/watch?v=UC4vDpSJCkM
[5] Patrick Debois / Tessl, “CI/CD for Context in Agentic Coding: Same Pipeline, Different Rules” — https://tessl.io/blog/cicd-for-context-in-agentic-coding-same-pipeline-different-rules/
[6] Stefan Christoph, “The Service Catalog Pattern for AI Agents” — https://schristoph.online/blog/service-catalog-pattern-ai-agents/
[7] Chroma Research, “Context Rot: How Increasing Input Tokens Impacts LLM Performance” — https://www.trychroma.com/research/context-rot
[8] IBM, “How RAG, GraphRAG, and Context Engineering Improve AI Performance” — https://www.youtube.com/watch?v=pN-LfxNFiTc
[9] AI Engineer / Sally-Ann Delucia (Arize), “Hierarchical Memory & Context for Agents” — https://www.youtube.com/watch?v=esY99nYXxR4
[10] Dennis Traub, “Your Agent Keeps Using That Word” — https://dev.to/aws/your-agent-keeps-using-that-word–4g36
[11] Vinay Krishna on LinkedIn (May 2026) — https://www.linkedin.com/feed/update/urn:li:activity:7454386374171967488/
[12] Jaroslaw Wasowski, “Managing Agent Context at Every Stage of the SDLC” — https://medium.com/@wasowski.jarek/managing-agent-context-at-every-stage-of-the-sdlc-cdlc-sdd-cecd0d575064
[13] AWS Documentation, “Knowledge Bases for Amazon Bedrock” — https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html
[14] AWS Blog, “Amazon Bedrock AgentCore Memory: Building context-aware agents” — https://aws.amazon.com/blogs/machine-learning/amazon-bedrock-agentcore-memory-building-context-aware-agents/
[15] AWS Documentation, “Amazon Bedrock Guardrails” — https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-use.html
About the Author
Stefan Christoph is a Principal Solutions Architect at AWS, focused on agentic AI, media & entertainment, and helping builders move from demo to production. He writes about AI architecture, developer productivity, and the future of software.
This is a personal blog. Opinions expressed here are my own and do not represent the views or positions of my employer.
❤️ Created with the support of AI (Kiro)