# Context engineering combats context rot
The goal of this post is to explain why context engineering will be around for a while. We will discuss long-context reasoning, why it is so challenging, what context rot is, and why context engineering will eventually become just another instance of the [Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html). But probably not this year (2026), at least...
This post is part of a 3-part series:
1. [The rise of Agentic Context Engineering](2026-01-10%20The%20rise%20of%20Agentic%20Context%20Engineering.md)
2. This post.
3. [A primer on RAG (2026 edition)](2026-01-10%20A%20primer%20on%20RAG,%202026%20edition.md)
---
A key realization of AI engineering is that it matters much more [how you instruct agents and what context you provide](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) than what model you use. A weak model fed with clear context produces far better outcomes than a strong model with noisy information. Gathering and evaluating the context is the real challenge. Humans do this subconsciously. We intuit what matters, what we can ignore, and how to structure information. Emulating that process to feed your large language model (LLM) is the soul of context engineering.
I once used a weaker LLaMA‑3.1 model for hours while thinking it was GPT‑5.1: someone had reconfigured the connection and pointed my flow at LLaMA‑3.1. I only noticed the change later, because even the evals for that particular flow were pretty similar. That experience made me aware of how much context engineering trumps model choice. In my work, I often compare different models on the same flows, and a newer model rarely outperforms the existing setup out of the box.
### Why context engineering will matter for a while
Here is another datapoint to support the idea that "context engineering is king": The top models all perform similarly when reasoning over large context windows. (The **context window** is the entire span of tokens sent to a large language model; the **prompt** is the latest input or chat message you send to the model.) [LongBench v2](https://longbench2.github.io/) shows this clearly. The o1‑preview model from mid‑2024 performs about as well as Qwen3‑235B, released a year later. Even Gemini 2.5 Pro does not really stand out that much. Despite big differences in model size and age, their long‑context reasoning ability is fairly similar.
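As a purely illustrative aside, here is roughly how that context window gets assembled in an agentic flow. The variable names and documents are made up for the example; the point is only that the model receives far more than the latest prompt:

```python
# Illustrative only: how an agent's context window is typically assembled.
# The "prompt" is just the latest user message; the context window is
# everything the model actually receives.
system_prompt = "You are a coding assistant. Prefer small, reviewed diffs."
retrieved_docs = ["<doc: repo README>", "<doc: CONTRIBUTING.md>"]  # e.g. from RAG
chat_history = [
    {"role": "user", "content": "Summarize the build pipeline."},
    {"role": "assistant", "content": "It uses a two-stage Docker build..."},
]
latest_prompt = {"role": "user", "content": "Now add a caching layer to stage one."}

context_window = (
    [{"role": "system", "content": system_prompt}]
    + [{"role": "system", "content": d} for d in retrieved_docs]
    + chat_history
    + [latest_prompt]
)

# Every one of these tokens competes for the model's attention,
# not just the latest prompt.
print(sum(len(m["content"]) for m in context_window), "characters of context")
```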
In case you are asking yourself: what is long-context reasoning? **Long-context reasoning** is the ability of LLMs to process and reason over extended input sequences, often spanning thousands or even millions of tokens, while maintaining accuracy and coherence. It is what lets a model piece together facts scattered across a lengthy document, or plan over many steps, without losing key details.
In other words, models are only slowly getting better at piecing together information that is scattered throughout their growing context windows. Hence, context engineering is our bridge to better agent results, at least until LLMs develop much stronger long-range reasoning skills. And as the LongBench results suggest, that does not feel like it is around the corner.
### Scaling long-range context reasoning
Why does long‑context reasoning not scale linearly with context window size? Classical [multi‑head attention](https://magazine.sebastianraschka.com/p/understanding-and-coding-self-attention) compares all tokens pairwise (see image below). The number of comparisons therefore grows quadratically with the size of the context window, which imposes computational limits on models using full multi‑head attention. Instead, models rely on heuristics or simpler approximations to get around that limitation. Sparse attention restricts which tokens each query can see. [Grouped or multi‑query attention](https://machinelearningmastery.com/a-gentle-introduction-to-multi-head-attention-and-grouped-query-attention/) forces heads to share parameters and limits specialization. [Linear attention](https://manifestai.com/blogposts/faster-after-all/) approximations trade accuracy for speed. These tricks let models handle large context windows but weaken fine‑grained reasoning.

The basic principle of Attention: How strongly words connect to each other within a layer of the Transformer network (Source: [Visualizing attention in Transformers](https://www.comet.com/site/blog/explainable-ai-for-transformers/) with BertViz).
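To make that quadratic growth concrete, here is a minimal NumPy sketch of full scaled dot-product attention (a single head, no masking). The shapes and numbers are illustrative, not taken from any particular model:

```python
# Illustrative sketch: naive full attention in NumPy, to show why cost
# grows quadratically with sequence length n.
import numpy as np

def naive_attention(Q, K, V):
    """Full attention: every query attends to every key.

    Q, K, V have shape (n, d). The score matrix below has shape (n, n),
    so memory and compute scale with n^2 -- the bottleneck discussed above.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (n, n) pairwise comparisons
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # (n, d)

n, d = 4096, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)                                         # (4096, 64)
print(f"score matrix holds {n * n:,} entries")           # 16,777,216, grows as n^2
```

Doubling the context window from 4,096 to 8,192 tokens quadruples that score matrix, which is exactly the scaling pressure the sparse, grouped, and linear variants above try to relieve.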
### Context rot and new architectures
The Chroma team coined the term [context rot](https://research.trychroma.com/context-rot) for all of this. They showed that models do not use their context uniformly; instead, their performance becomes increasingly unreliable as input length grows. In particular, [their results on LongMemEval](https://research.trychroma.com/context-rot#longmemeval) match exactly what I'd expect given how we are scaling attention. Their conclusion: "**In practice, long context applications are often far more complex, requiring synthesis or multi-step reasoning. Based on our findings, we would expect performance degradation to be even more severe under those conditions.**"
Even if full attention could scale infinitely, matching every token pair might drown the signal in noise, unless you could also scale training data infinitely. This is why models are no longer simply doubling in size with every generation. Better architectures now matter more than brute‑force scaling. [Qwen’s attention gating](https://towardsdatascience.com/neurips-2025-best-paper-review-qwens-systematic-exploration-of-attention-gating/) and [DeepSeek’s manifold-constrained hyper‑connections](https://deepseek.ai/blog/deepseek-mhc-manifold-constrained-hyper-connections) point in this direction. (Residual connections forward the input signal into later network layers, providing a strong gradient for those deep layers. [Hyper‑connections](https://arxiv.org/abs/2409.19606) do the same while learning how much of the signal to carry forward.)
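As a toy illustration of that last point, here is a sketch of a plain residual block next to one that learns how much of the input to carry forward. This is my own simplification of the intuition, not the exact formulation from the hyper-connections paper, which expands the residual stream into several learnable copies:

```python
# Toy sketch (a simplification, not the papers' exact formulations):
# a plain residual block vs. a block that learns how much of the input
# signal to carry forward, the core intuition behind hyper-connections.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Standard residual connection: always carry the input forward unchanged."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)                    # fixed carry: x is added back with weight 1

class LearnedCarryBlock(nn.Module):
    """Learns scalar gates for how much input vs. transformed signal to keep."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.carry = nn.Parameter(torch.ones(1))   # learned weight on the input
        self.gain = nn.Parameter(torch.ones(1))    # learned weight on f(x)

    def forward(self, x):
        return self.carry * x + self.gain * self.f(x)

x = torch.randn(2, 16, 128)                      # (batch, tokens, dim)
print(ResidualBlock(128)(x).shape, LearnedCarryBlock(128)(x).shape)
```

The learned `carry` and `gain` gates stand in for the idea that the network decides how much signal flows past each layer, rather than always adding the input back with weight one.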
Given the central role of the attention mechanism in the Transformer, I am not very bullish that we will solve context rot in 2026. Maybe, one day, we will be happy to no longer need attention... While I am confident that context rot will eventually be solved somehow, you’ll get value faster from better context engineering than from waiting for newer models or architectures.