This post describes how software engineering will shift in 2026. [Like many others](https://www.modelanalysis.ai/p/software-development-in-the-age-of), I believe 2026 is the year coding agents become the de facto standard for writing code. I will discuss why I am so confident of that, and how to work with coding agents through [context engineering](https://www.llamaindex.ai/blog/context-engineering-what-it-is-and-techniques-to-consider). [Tobi Lutke](https://x.com/tobi/status/1935533422589399127) defines context engineering as "the art of providing all the context for the task to be plausibly solvable by the large language model", which seems pretty spot on.
Your [context engineering skills will matter](https://lumberjack.so/my-predictions-for-2026-in-ai/) more than picking the best model/agent. My goal is to outline what is possible today, show how to work with coding agents, and spark your curiosity to go deeper. Even Andrej Karpathy has recently [expressed his surprise](https://x.com/karpathy/status/2004607146781278521) at the progress AI coding agents have made. I also highlight where research is going: toward better attention mechanisms, smarter routing, and memory systems that let agents act more autonomously.
If you only want my coding agent tips, you can jump directly to the section describing how to [feed context to your agent](#Feeding%20context%20to%20your%20agent).
## Context engineering combats context rot

A key realization is that it matters much more [how you instruct agents and what context you provide](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) than what model you use. A weak model fed with clear context produces far better outcomes than a strong model fed with noisy information. Gathering and evaluating the context is the real challenge. Humans do this subconsciously. We intuit what matters, what we can ignore, and how to structure information. Emulating that process to feed your large language model (LLM) is the soul of context engineering. I once used a weaker LLaMA‑3.1 model for hours while thinking it was GPT‑5.1. That experience made me aware of how much context engineering trumps model choice.
### Why context engineering will matter for a while
Here is another datapoint to support the "context engineering is king" claim: The top models all perform similarly when reasoning over large context windows. (The **context window** is the entire span of tokens sent to a large language model; the **prompt** is the latest input or chat message you send to the model.) [LongBench v2](https://longbench2.github.io/) shows this clearly. The o1‑preview model from mid‑2024 performs about as well as Qwen3‑235B, released a year later. Even Gemini 2.5 Pro does not really stand out that much. Despite big differences in model size and age, their long‑context reasoning ability is fairly similar.
In case you are asking yourself: what is long-context reasoning? **Long-context reasoning** is the ability of an LLM to process and reason over extended input sequences, often spanning thousands or even millions of tokens, while maintaining accuracy and coherence. It is what lets a model piece together relational facts scattered across a lengthy document, or plan over many steps without losing key details.
In other words, models are only slowly getting better at piecing together information that is scattered throughout their growing context windows. Hence, context engineering is our bridge to help agents achieve better results, at least until LLMs develop much stronger long-range reasoning skills. And as the LongBench results suggest, that does not seem to be just around the corner.
### Scaling long-range context reasoning
Why does long‑context reasoning not scale linearly with context window size? Classical [multi‑head attention](https://magazine.sebastianraschka.com/p/understanding-and-coding-self-attention) compares all tokens pairwise (see image below), so the number of comparisons grows quadratically with the size of the context window. That imposes computational limits on models using full multi‑head attention. Instead, models use heuristics or simpler approaches to get around that limitation. Sparse attention restricts which tokens each query can see. [Grouped or multi‑query attention](https://machinelearningmastery.com/a-gentle-introduction-to-multi-head-attention-and-grouped-query-attention/) forces heads to share parameters and limits specialization. [Linear attention](https://manifestai.com/blogposts/faster-after-all/) approximations trade accuracy for speed. These tricks let models handle large context windows but weaken fine‑grained reasoning.

The basic principle of attention: how strongly words connect to each other within a layer of the Transformer network (Source: [Visualizing attention in Transformers](https://www.comet.com/site/blog/explainable-ai-for-transformers/) with BertViz).
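To make the quadratic cost concrete, here is a toy NumPy sketch (not how production models implement attention): every token is scored against every other token, so doubling the context length quadruples the work.

```python
import numpy as np

def attention_weights(tokens: np.ndarray) -> np.ndarray:
    """Naive scaled dot-product attention over an (n, d) matrix of embeddings.

    Queries and keys are the same tokens here; the point is the (n, n) score
    matrix: compute and memory grow quadratically with context length n.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)        # n * n pairwise comparisons
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

print(attention_weights(np.random.randn(8, 16)).shape)  # (8, 8)
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {n * n:>15,} pairwise comparisons")
```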
### Context rot and new architectures
The Chroma team has coined a name for all of this: [context rot](https://research.trychroma.com/context-rot). They showed that models do not use their context uniformly; instead, their performance becomes increasingly unreliable as input length grows. In particular, [their results on LongMemEval](https://research.trychroma.com/context-rot#longmemeval) match exactly what I'd expect given how we are scaling attention. Their conclusion: "**In practice, long context applications are often far more complex, requiring synthesis or multi-step reasoning. Based on our findings, we would expect performance degradation to be even more severe under those conditions.**"
Even if full attention could scale infinitely, matching every token pair might drown the signal in noise, unless you could also scale data infinitely. This is why model sizes are no longer simply doubled every generation; better architectures now matter more than brute‑force scaling. [Qwen's attention gating](https://towardsdatascience.com/neurips-2025-best-paper-review-qwens-systematic-exploration-of-attention-gating/) and [DeepSeek's manifold-constrained hyper‑connections](https://deepseek.ai/blog/deepseek-mhc-manifold-constrained-hyper-connections) point in this direction. (Residual connections forward the input signal into later network layers, providing a strong gradient signal for those deep layers. [Hyper‑connections](https://arxiv.org/abs/2409.19606) do the same while learning how much of the signal to carry forward.) Given the central role of the attention mechanism in the Transformer, I am not very bullish that we will solve context rot in 2026. Maybe, one day, we will be happy to no longer need attention...
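To make that parenthetical a bit more tangible, here is a deliberately simplified PyTorch toy. The actual hyper-connections paper uses several parallel residual streams with learned mixing between them; this sketch only illustrates the core intuition of learning how much of the input to carry forward instead of hard-coding it.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Plain residual connection: the input is always carried forward unchanged."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)  # fixed rule: carry 100% of x into the next layer

class GatedResidualBlock(nn.Module):
    """Toy hyper-connection-flavoured block: the carry strength is learned."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.carry = nn.Parameter(torch.ones(1))  # how much of x to keep
        self.mix = nn.Parameter(torch.ones(1))    # how much of f(x) to add

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.carry * x + self.mix * self.f(x)

x = torch.randn(2, 16)
print(ResidualBlock(16)(x).shape, GatedResidualBlock(16)(x).shape)
```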
And [the bitter lesson](https://en.wikipedia.org/wiki/Bitter_lesson) reminds us: compute‑driven, self‑supervised methods eventually win. Long‑context reasoning will emerge from even more scalable architectures than today's quadratic attention. That mechanism will connect related data more effectively than pairwise attention, [maybe even with humans in the loop](https://pmc.ncbi.nlm.nih.gov/articles/PMC12546433/). For most of us, this simply means: You’ll get value faster from better context engineering than from waiting for newer models or architectures.
## Feeding context to your agent
Let’s take a closer look at software agents. By now, I mostly use [Claude Code](https://www.claude.com/product/claude-code) and [GitHub Copilot](https://github.com/features/copilot/cli) to write software, including my own AI agents. Claude Code can produce complex, production‑ready software without you writing any code yourself. If you use [Gemini CLI](https://geminicli.com/) or similar tools, you can achieve the same results. Other agents might lag a bit behind Claude, but in my experience, not that much. Working with a coding assistant feels like guiding a very smart junior engineer purely through chat, documents, and tools.
LLMs already write [production‑quality software](https://www.indragie.com/blog/i-shipped-a-macos-app-built-entirely-by-claude-code). But they don’t follow best practices on their own. And they don’t create good software architecture or API designs ad hoc. They need direction and guardrails.
### Instructions
[Anthropic](https://www.anthropic.com/engineering/claude-code-best-practices) has shown the path forward. [You feed the coding agent strong context](https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/?utm_source=perplexity): best practices, design docs, DevOps guidelines, and constraints. You provide those documents, often working with an LLM to generate them. For example, I keep evolving [my personal instructions for Python](python.md) and [[TypeScript]] development. And you should define the API and architecture before the agent writes any code. Your documents should be easy for an LLM to absorb. [Mermaid](https://mermaid.ai/open-source/index.html) works well for sequence diagrams, C4 diagrams, and flowcharts. Because these diagrams are plain text inside your Markdown, the agent can update them as the project evolves and keep them in sync with the code.
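For illustration, here is the kind of small Mermaid sequence diagram I mean; the service names and endpoint are made up:

```mermaid
sequenceDiagram
    participant User
    participant API as Order API
    participant DB as Orders DB
    User->>API: POST /orders
    API->>DB: INSERT new order
    DB-->>API: order id
    API-->>User: 201 Created (order id)
```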
### Test-Driven Development
Second, [I ask my agents to use test‑driven development](testing.md), [Kent Beck style](https://tidyfirst.substack.com/p/canon-tdd). Tests stabilize the codebase as it grows. They also make the agent’s reasoning easier to review. I still read the tests, especially early on, which slows things down at first. Over time, you learn where the agent is reliable and where you need to step in. Sometimes I ask for tests before letting the agent make any code changes. Good instructions, for example user stories expressed through [Gherkin language](https://medium.com/@nic/writing-user-stories-with-gherkin-dda63461b1d2), reduce surprises. Overall, iterating with TDD cycles helps you learn your agent’s habits faster.
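As a sketch of what that looks like in practice, I might hand the agent a failing test like this before it writes any implementation; the `apply_discount` function and its discount rules are hypothetical:

```python
# test_pricing.py is written *before* the implementation exists (red phase).
# The agent's job is to make these tests pass without breaking earlier ones.
import pytest

from pricing import apply_discount  # hypothetical module the agent will create


def test_ten_percent_discount_over_100_euros():
    assert apply_discount(total=120.0) == pytest.approx(108.0)


def test_no_discount_below_threshold():
    assert apply_discount(total=80.0) == pytest.approx(80.0)


def test_negative_totals_are_rejected():
    with pytest.raises(ValueError):
        apply_discount(total=-1.0)
```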
### Tool use & the Model Context Protocol
Tool use unlocks a whole new level. [Tools give agents like Claude Code eyes and hands](https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview). Today’s coding agents run formatters, linters, security scanners, and UI explorers. Tools are functional interfaces the agent can choose to call. A prominent interface you have probably heard of is the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/docs/getting-started/intro). A SQL database, for example, can be exposed through an MCP server acting as a proxy, so an agent can explore the database it is working on using natural language. [Playwright’s MCP server](https://github.com/microsoft/playwright-mcp) lets the agent inspect the website you are developing or even browse the web. (Playwright is an automation framework for controlling browsers.)
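To give a feel for how small such a tool can be, here is a rough sketch using the MCP Python SDK's `FastMCP` helper; the SQLite file name and the read-only rule are my own assumptions, not part of any particular project:

```python
# A minimal MCP server exposing one read-only SQL tool, using the MCP Python
# SDK's FastMCP helper (pip install "mcp[cli]"). The dev.db path is hypothetical.
import sqlite3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sqlite-explorer")


@mcp.tool()
def run_query(sql: str) -> list[tuple]:
    """Run a read-only SELECT statement against the local dev database."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed.")
    with sqlite3.connect("dev.db") as conn:
        return conn.execute(sql).fetchall()


if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio so an agent can discover and call run_query
```

Once registered with your coding agent, a tool like `run_query` lets it inspect the schema and data instead of guessing at them.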
Many MCP servers are available as Docker containers on the [MCP Catalog](https://hub.docker.com/mcp). With Docker Desktop, you can therefore [set up your agents with almost any tool](https://blog.agentailor.com/posts/docker-mcp-catalog-and-toolkit) very quickly. If you want better work from your agent, let it see the outcomes of its actions!
A useful rule: [never write instructions for things a tool can handle](https://www.humanlayer.dev/blog/writing-a-good-claude-md). Do not describe style rules; instead, tell your coding agent when to run a formatter. I let my agents run a formatter, static analyzer, and code complexity checker after completing a coding task, and a dependency auditor after adding a dependency. They fix the issues those tools report on their own, no further instructions needed. Outcome: lots of context saved, because the rules live in the tools rather than in your instructions.
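One low-effort way to set this up is a single entry point the agent can run after every task; the specific tools below (ruff, mypy, radon) are just the ones I would reach for, not a required stack:

```python
# checks.py: one command the agent runs after every task to see all findings.
# Tool choice (ruff, mypy, radon) is illustrative; swap in your own stack.
import subprocess
import sys

CHECKS = [
    ["ruff", "format", "."],          # formatter
    ["ruff", "check", "--fix", "."],  # linter / static analysis
    ["mypy", "."],                    # type checker
    ["radon", "cc", "-n", "C", "."],  # flag functions with high complexity
]

def main() -> int:
    failed = False
    for cmd in CHECKS:
        print(f"\n$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed = True  # keep going so the agent sees every finding at once
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```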
### Multi-agent setups & Beads
With clear guidelines and the relevant tools, agents can produce great software: Safe, tested, and production‑ready, with only light supervision. Once your setup works, you can scale it horizontally. Your bottleneck then becomes your attention and code review capacity, and that depends on the quality of the context you gave your agent. You can even have [several agents work together](https://simonwillison.net/2025/Oct/5/parallel-coding-agents/).
To let multiple software agents work on the same codebase, you need synchronization. [Steve Yegge’s Beads](https://github.com/steveyegge/beads) provides exactly that capability. Beads works like a small JIRA built into Git. Each agent picks tasks, pulls changes, and pushes updates. Agents see what others are doing and what work is ready. The LLM can generate most of the tickets; you only supervise. There is even a [VS Code extension](https://marketplace.visualstudio.com/items?itemName=planet57.vscode-beads) to visualize and edit Beads.
Give it a try; it is a fantastic tool to guide your agents. Even with only one agent running, Beads has given me a better experience than directing the agent with Markdown task lists.
## Conclusion
With a bunch of agents, you will create software far faster than human‑only teams. Manual coding will fade to the margins. I haven't written code in a while, except for occasional one-line fixes. (And even then, I sometimes regret it when it turns out the fix needs more than that one change...) The new limits are your usage plan and your ability to orchestrate agents. Costs are also low: in my experience, an LLM can build a small backend or simple frontend for 40 euros or less. No human team can match that speed or price.
AI grows more powerful with better models, larger contexts, and tool use. Today’s coding agents are already strong enough that you should delegate software coding to them. If you have not worked with a coding agent like Claude, now is the time. Most importantly, [as Anthropic keeps reminding us](https://open.substack.com/pub/post/p/the-ai-revolution-is-here-will-the), remember that **what we have today is the worst it will ever be**. A coding agent might produce bad code without guidance, but with proper instructions the results are astonishing. Who knows how much less guidance the next iteration of Claude Code will need…