This post describes how software engineering will shift in 2026. [Like many others](https://www.modelanalysis.ai/p/software-development-in-the-age-of), I believe 2026 is the year coding agents become the de facto standard for writing code. I will discuss why I am so confident of that, and how to work with coding agents through [context engineering](https://www.llamaindex.ai/blog/context-engineering-what-it-is-and-techniques-to-consider). [Tobi Lutke](https://x.com/tobi/status/1935533422589399127) defines context engineering as "the art of providing all the context for the task to be plausibly solvable by the large language model", which seems pretty spot on.
Your [context engineering skills will matter](https://lumberjack.so/my-predictions-for-2026-in-ai/) more than picking the best model/agent. My goal is to outline what is possible today, show how to work with coding agents, and spark your curiosity to go deeper. Even Andrej Karpathy has recently [expressed his surprise](https://x.com/karpathy/status/2004607146781278521) at the progress AI coding agents have made. I also highlight where research is going: toward better attention mechanisms, smarter routing, and memory systems that let agents act more autonomously.
If you only want my coding agent tips, you can jump directly to the section describing how to [feed context to your agent](#Feeding%20context%20to%20your%20agent).
## Context engineering combats context rot

A key realization is that it matters much more [how you instruct agents and what context you provide](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) than what model you use. A weak model fed with clear context produces far better outcomes than a strong model fed with noisy information. Gathering and evaluating the context is the real challenge. Humans do this subconsciously. We intuit what matters, what we can ignore, and how to structure information. Emulating that process to feed your large language model (LLM) is the soul of context engineering. I once used a weaker LLaMA‑3.1 model for hours while thinking it was GPT‑5.1. That experience made me aware of how much context engineering trumps model choice.
### Why context engineering will matter for a while
Here is another datapoint to support the "context engineering is king" claim: The top models all perform similarly when reasoning over large context windows. (The **context window** is the entire span of tokens sent to a large language model; the **prompt** is the latest input or chat message you send to the model.) [LongBench v2](https://longbench2.github.io/) shows this clearly. The o1‑preview model from mid‑2024 performs about as well as Qwen3‑235B, released a year later. Even Gemini 2.5 Pro does not really stand out that much. Despite big differences in model size and age, their long‑context reasoning ability is fairly similar.
In case you are asking yourself: what is long-context reasoning? **Long-context reasoning** is the ability of an LLM to process and reason over extended input sequences, often spanning thousands or even millions of tokens, while maintaining accuracy and coherence. It is what lets a model piece together relational facts scattered across a lengthy document, or plan over many steps without losing key details.
In other words, models are only slowly getting better at piecing together information that is scattered throughout their growing context windows. Hence, context engineering is our bridge to help agents achieve better results, at least until LLMs develop much stronger long-range reasoning skills. And as the LongBench results suggest, that does not seem to be just around the corner.
### Scaling long-range context reasoning
Why does long‑context reasoning not scale linearly with context window size? Classical [multi‑head attention](https://magazine.sebastianraschka.com/p/understanding-and-coding-self-attention) compares all tokens pairwise (see image below), so the number of comparisons grows quadratically with the size of the context window. That imposes computational limits on models using full multi‑head attention. Instead, models use heuristics or simpler approaches to get around that limitation. Sparse attention restricts which tokens each query can see. [Grouped or multi‑query attention](https://machinelearningmastery.com/a-gentle-introduction-to-multi-head-attention-and-grouped-query-attention/) forces heads to share parameters and limits specialization. [Linear attention](https://manifestai.com/blogposts/faster-after-all/) approximations trade accuracy for speed. These tricks let models handle large context windows but weaken fine‑grained reasoning.

The basic principle of attention: how strongly words connect to each other within a layer of the Transformer network (Source: [Visualizing attention in Transformers](https://www.comet.com/site/blog/explainable-ai-for-transformers/) with BertViz).
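To make the quadratic cost concrete, here is a toy NumPy sketch (not how production models implement attention): every token is scored against every other token, so doubling the context length quadruples the work.

```python
import numpy as np

def attention_weights(tokens: np.ndarray) -> np.ndarray:
    """Naive scaled dot-product attention over an (n, d) matrix of embeddings.

    Queries and keys are the same tokens here; the point is the (n, n) score
    matrix: compute and memory grow quadratically with context length n.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)        # n * n pairwise comparisons
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

print(attention_weights(np.random.randn(8, 16)).shape)  # (8, 8)
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {n * n:>15,} pairwise comparisons")
```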
### Context rot and new architectures
The Chroma team has coined a name for all of this: [context rot](https://research.trychroma.com/context-rot). They showed that models do not use their context uniformly; instead, their performance becomes increasingly unreliable as input length grows. In particular, [their results on LongMemEval](https://research.trychroma.com/context-rot#longmemeval) match exactly what I'd expect given how we are scaling attention. Their conclusion: "**In practice, long context applications are often far more complex, requiring synthesis or multi-step reasoning. Based on our findings, we would expect performance degradation to be even more severe under those conditions.**"
Even if full attention could scale infinitely, matching every token pair might drown the signal in noise, unless you could also scale data infinitely. This is why model sizes are no longer simply doubled every generation; better architectures now matter more than brute‑force scaling. [Qwen's attention gating](https://towardsdatascience.com/neurips-2025-best-paper-review-qwens-systematic-exploration-of-attention-gating/) and [DeepSeek's manifold-constrained hyper‑connections](https://deepseek.ai/blog/deepseek-mhc-manifold-constrained-hyper-connections) point in this direction. (Residual connections forward the input signal into later network layers, providing a strong gradient signal for those deep layers. [Hyper‑connections](https://arxiv.org/abs/2409.19606) do the same while learning how much of the signal to carry forward.) Given the central role of the attention mechanism in the Transformer, I am not very bullish that we will solve context rot in 2026. Maybe, one day, we will be happy to no longer need attention...
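To make that parenthetical a bit more tangible, here is a deliberately simplified PyTorch toy. The actual hyper-connections paper uses several parallel residual streams with learned mixing between them; this sketch only illustrates the core intuition of learning how much of the input to carry forward instead of hard-coding it.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Plain residual connection: the input is always carried forward unchanged."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)  # fixed rule: carry 100% of x into the next layer

class GatedResidualBlock(nn.Module):
    """Toy hyper-connection-flavoured block: the carry strength is learned."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.carry = nn.Parameter(torch.ones(1))  # how much of x to keep
        self.mix = nn.Parameter(torch.ones(1))    # how much of f(x) to add

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.carry * x + self.mix * self.f(x)

x = torch.randn(2, 16)
print(ResidualBlock(16)(x).shape, GatedResidualBlock(16)(x).shape)
```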
And [the bitter lesson](https://en.wikipedia.org/wiki/Bitter_lesson) reminds us: compute‑driven, self‑supervised methods eventually win. Long‑context reasoning will emerge from even more scalable architectures than today's quadratic attention. That mechanism will connect related data more effectively than pairwise attention, [maybe even with humans in the loop](https://pmc.ncbi.nlm.nih.gov/articles/PMC12546433/). For most of us, this simply means: You’ll get value faster from better context engineering than from waiting for newer models or architectures.
## Feeding context to your agent
Let’s take a closer look at software agents. By now, I mostly use [Claude Code](https://www.claude.com/product/claude-code) and [GitHub Copilot](https://github.com/features/copilot/cli) to write software, including my own AI agents. Claude Code can produce complex, production‑ready software without you writing any code yourself. If you use [Gemini CLI](https://geminicli.com/) or similar tools, you can achieve the same results. Other agents might lag a bit behind Claude, but in my experience, not that much. Working with a coding assistant feels like guiding a very smart junior engineer purely through chat, documents, and tools.
LLMs already write [production‑quality software](https://www.indragie.com/blog/i-shipped-a-macos-app-built-entirely-by-claude-code). But they don’t follow best practices on their own. And they don’t create good software architecture or API designs ad hoc. They need direction and guardrails.
### Instructions
[Anthropic](https://www.anthropic.com/engineering/claude-code-best-practices) has shown the path forward. [You feed the coding agent strong context](https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/?utm_source=perplexity): best practices, design docs, DevOps guidelines, and constraints. You provide those documents, often working with an LLM to generate them. For example, I keep evolving [my personal instructions for Python](python.md) and [[TypeScript]] development. And you should define the API and architecture before the agent writes any code. Your documents should be easy for an LLM to absorb. [Mermaid](https://mermaid.ai/open-source/index.html) works well for sequence diagrams, C4 diagrams, and flowcharts. Because these diagrams are plain text inside your Markdown, the agent can update them as the project evolves and keep them in sync with the code.
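For illustration, here is the kind of small Mermaid sequence diagram I mean; the service names and endpoint are made up:

```mermaid
sequenceDiagram
    participant User
    participant API as Order API
    participant DB as Orders DB
    User->>API: POST /orders
    API->>DB: INSERT new order
    DB-->>API: order id
    API-->>User: 201 Created (order id)
```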
### Test-Driven Development
Second, [I ask my agents to use test‑driven development](testing.md), [Kent Beck style](https://tidyfirst.substack.com/p/canon-tdd). Tests stabilize the codebase as it grows. They also make the agent’s reasoning easier to review. I still read the tests, especially early on, which slows things down at first. Over time, you learn where the agent is reliable and where you need to step in. Sometimes I ask for tests before letting the agent make any code changes. Good instructions, for example user stories expressed through [Gherkin language](https://medium.com/@nic/writing-user-stories-with-gherkin-dda63461b1d2), reduce surprises. Overall, iterating with TDD cycles helps you learn your agent’s habits faster.
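As a sketch of what that looks like in practice, I might hand the agent a failing test like this before it writes any implementation; the `apply_discount` function and its discount rules are hypothetical:

```python
# test_pricing.py is written *before* the implementation exists (red phase).
# The agent's job is to make these tests pass without breaking earlier ones.
import pytest

from pricing import apply_discount  # hypothetical module the agent will create


def test_ten_percent_discount_over_100_euros():
    assert apply_discount(total=120.0) == pytest.approx(108.0)


def test_no_discount_below_threshold():
    assert apply_discount(total=80.0) == pytest.approx(80.0)


def test_negative_totals_are_rejected():
    with pytest.raises(ValueError):
        apply_discount(total=-1.0)
```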
### Tool use & the Model Context Protocol
Tool use unlocks a whole new level. [Tools give agents like Claude Code eyes and hands](https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview). Today’s coding agents run formatters, linters, security scanners, and UI explorers. Tools are functional interfaces the agent can choose to call. A prominent interface you have probably heard of is the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/docs/getting-started/intro). A SQL database, for example, can be exposed through an MCP server acting as a proxy, so an agent can explore the database it is working on using natural language. [Playwright’s MCP server](https://github.com/microsoft/playwright-mcp) lets the agent inspect the website you are developing or even browse the web. (Playwright is an automation framework for controlling browsers.)
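To give a feel for how small such a tool can be, here is a rough sketch using the MCP Python SDK's `FastMCP` helper; the SQLite file name and the read-only rule are my own assumptions, not part of any particular project:

```python
# A minimal MCP server exposing one read-only SQL tool, using the MCP Python
# SDK's FastMCP helper (pip install "mcp[cli]"). The dev.db path is hypothetical.
import sqlite3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sqlite-explorer")


@mcp.tool()
def run_query(sql: str) -> list[tuple]:
    """Run a read-only SELECT statement against the local dev database."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed.")
    with sqlite3.connect("dev.db") as conn:
        return conn.execute(sql).fetchall()


if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio so an agent can discover and call run_query
```

Once registered with your coding agent, a tool like `run_query` lets it inspect the schema and data instead of guessing at them.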
Many MCP servers are available as Docker containers on the [MCP Catalog](https://hub.docker.com/mcp). With Docker Desktop, you can therefore [set up your agents with almost any tool](https://blog.agentailor.com/posts/docker-mcp-catalog-and-toolkit) very quickly. If you want better work from your agent, let it see the outcomes of its actions!
A useful rule: [never write instructions for things a tool can handle](https://www.humanlayer.dev/blog/writing-a-good-claude-md). Do not describe style rules; instead, tell your coding agent when to run a formatter. I let my agents run a formatter, static analyzer, and code complexity checker after completing a coding task, and a dependency auditor after adding a dependency. They fix the issues those tools report on their own, no further instructions needed. Outcome: lots of context saved, because the rules live in the tools rather than in your instructions.
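One low-effort way to set this up is a single entry point the agent can run after every task; the specific tools below (ruff, mypy, radon) are just the ones I would reach for, not a required stack:

```python
# checks.py: one command the agent runs after every task to see all findings.
# Tool choice (ruff, mypy, radon) is illustrative; swap in your own stack.
import subprocess
import sys

CHECKS = [
    ["ruff", "format", "."],          # formatter
    ["ruff", "check", "--fix", "."],  # linter / static analysis
    ["mypy", "."],                    # type checker
    ["radon", "cc", "-n", "C", "."],  # flag functions with high complexity
]

def main() -> int:
    failed = False
    for cmd in CHECKS:
        print(f"\n$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed = True  # keep going so the agent sees every finding at once
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```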
### Multi-agent setups & Beads
With clear guidelines and the relevant tools, agents can produce great software: Safe, tested, and production‑ready, with only light supervision. Once your setup works, you can scale it horizontally. Your bottleneck then becomes your attention and code review capacity, and that depends on the quality of the context you gave your agent. You can even have [several agents work together](https://simonwillison.net/2025/Oct/5/parallel-coding-agents/).
To let multiple software agents work on the same codebase, you need synchronization. [Steve Yegge’s Beads](https://github.com/steveyegge/beads) provides exactly that capability. Beads works like a small JIRA built into Git. Each agent picks tasks, pulls changes, and pushes updates. Agents see what others are doing and what work is ready. The LLM can generate most of the tickets; you only supervise. There is even a [VS Code extension](https://marketplace.visualstudio.com/items?itemName=planet57.vscode-beads) to visualize and edit Beads.
Give it a try; it is a fantastic tool to guide your agents. Even with only one agent running, Beads has given me a better experience than directing the agent with Markdown task lists.
## Conclusion
With a bunch of agents, you will create software far faster than human‑only teams. Manual coding will fade to the margins. I haven't written code in a while, except for occasional one-line fixes. (And even then, I sometimes regret it when it turns out the fix needs more than that one change...) The new limits are your usage plan and your ability to orchestrate agents. Costs are also low: in my experience, an LLM can build a small backend or simple frontend for 40 euros or less. No human team can match that speed or price.
AI grows more powerful with better models, larger contexts, and tool use. Today’s coding agents are already strong enough that you should delegate software coding to them. If you have not worked with a coding agent like Claude, now is the time. Most importantly, [as Anthropic keeps reminding us](https://open.substack.com/pub/post/p/the-ai-revolution-is-here-will-the), remember that **what we have today is the worst it will ever be**. A coding agent might produce bad code without guidance, but with proper instructions the results are astonishing. Who knows how much less guidance the next iteration of Claude Code will need…