2026-06-06 Harness Engineering out of the box

# Harness Engineering, Out of the Box

## What happened when we introduced Matt Pocock's AI Skills across our engineering function

Over the last few weeks we started using [Matt Pocock's AI skills](https://github.com/mattpocock/skills) across various teams. I went in expecting modest engagement and a bit of a productivity bump. What I found was something more interesting: Matt's skill setup creates a fast path to serious *harness engineering*. It provides a foundational skillset for your coding agent on any project.

If you're not familiar with the term, [harness engineering](https://openai.com/index/harness-engineering/) is the practice of building the scaffolding around a coding agent (the workflows, context, conventions, and guardrails) so the agent produces work that fits how your team actually builds software. It's the difference between an agent that randomly autocompletes and an agent that collaborates the way you expect it to.

The hard part is setting up a robust, well-balanced harness. It takes effort, taste, and time. That's the gap Matt Pocock's AI skills close.

## The real surprise wasn't adoption speed

I expected some skepticism and resistance. Engineers rightly question anything that proposes to change how they work. Instead, adoption was fast, but that's not the headline.

The headline is *what* adoption got us. By virtue of using the same set of skills, we are building up a more uniform, sophisticated harness engineering practice for our projects, in a fraction of the time it would have taken us to assemble our own baseline harness.

The skills don't just make individuals more effective. They bring every team up to the **same** level in **how** we work with agents. Let me explain how and why that happens.

## What's the actual magic?

It takes real effort and time to consistently apply good AI-assisted development principles. You have to know them, agree on them, and then enforce them.
Pocock's engineering skills solve a lot of this out of the box. They bake in strong principles that don't have to be rediscovered on every project:

- **Domain-driven design** as the default way of modeling the software project.
- **User stories** as the unit of work, so the agent stays anchored to outcomes rather than abstractions.
- Alignment on **architectural decisions** and **API design** before a single line of code is written.
- A **ubiquitous language** defining the vocabulary of your product, so you and the agent mean the same thing.
- **Tracer bullets** as the approach to work, so the agent delivers fully functional vertical slices instead of anemic horizontal layers.

None of this is novel software engineering wisdom. What's novel is getting the whole package for "free", every time, with concise instructions, and without having to reiterate best practices with the coding agent on each new project.

### A shared vocabulary provides clarity

Here's the thing that clicked the most for me: you and your agent start building and sharing a vocabulary (a `CONTEXT.md` file). That sounds soft, but it's the foundation everything else sits on.

When you and the agent mean the exact same thing by "active user," "alerting system," or "usage tracker," the back-and-forth collapses. You stop translating; the agent stops guessing. Alignment stops being something you fight for, because it is baked into the process through conversations with your agent. I think this kind of alignment is where most of the productivity actually comes from, as it saves you many corrective steps later.

Practically speaking, it means the agent creates and maintains a proper [context dictionary](CONTEXT-FORMAT) (as a `CONTEXT.md` file).

### The "Grill Me" skill changes the AI engineering paradigm

If I had to point to one thing that shifts how AI engineering *feels*, it's the "[Grill me](Tech/Agents/skills/grill-me/SKILL.md)" skill.
I recommend the Grill me [with docs](Tech/Software%20Development/Skills/grill-with-docs/SKILL.md) version for software engineering projects.

My previous pattern was to write a multi-page spec with one or more LLMs and then hand it over to the agent. "Grill Me" makes that an interactive process. The agent interviews you. It pulls out requirements, defines vocabulary, and builds up architectural decision records through a conversation with you.

The effect is twofold. First, having the agent ask critical questions surfaces requirements you might have overlooked in your spec. Second, because the agent records decisions as you go, you end up with architectural decision records and user stories as a natural by-product of this conversation.

It flips the dynamic in a way that's both more aligned and more productive. You do less upfront writing and end up with better-defined work. The agent has more clarity about what you mean than ever before.

You might be thinking this will only create more "AI slop": documents that nobody needs. To my surprise, it turns out the agent takes these documents quite seriously. For example, if at some later point during implementation I give instructions that contradict an existing architectural decision record (ADR), the agent points out the discrepancy and challenges me to review whether the ADR still holds.

### Vertical slices, not horizontal layers

Another principle that the skills enforce: vertical slices, not horizontal layers, via "tracer bullets".

Left to their own devices, coding agents love to build capability layers. They create a data layer here, a service layer there, or just stamp out a collection of unit tests. That looks like progress, but delivers nothing a user can touch. The skills push agents to deliver slices of end-to-end working functionality instead: "tracer bullets", thin, complete paths from entry point to result that actually run.
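
To make the idea concrete, here is a minimal sketch of what a first tracer bullet might look like for a "usage tracker". It is not taken from the skills themselves. Every name in it (`InMemoryUsageStore`, `recordUsage`, `handleRecordUsage`) is a hypothetical stand-in, and the in-memory store is an assumption chosen to keep the slice thin.

```typescript
// Hypothetical tracer bullet for a "usage tracker": one thin path from
// entry point to result. All names here are illustrative assumptions.

type UsageEvent = { userId: string; feature: string; at: Date };

// Storage: the simplest thing that works end to end (in-memory for now).
class InMemoryUsageStore {
  private events: UsageEvent[] = [];

  save(event: UsageEvent): void {
    this.events.push(event);
  }

  countFor(userId: string): number {
    return this.events.filter((e) => e.userId === userId).length;
  }
}

// Domain: the only rule this slice needs is that both fields are present.
function recordUsage(
  store: InMemoryUsageStore,
  userId: string,
  feature: string,
): number {
  if (userId === "" || feature === "") {
    throw new Error("userId and feature are required");
  }
  store.save({ userId, feature, at: new Date() });
  return store.countFor(userId);
}

// Entry point: what a caller actually touches (stand-in for an HTTP handler).
function handleRecordUsage(
  store: InMemoryUsageStore,
  body: { userId?: string; feature?: string },
): { status: number; totalEvents?: number; error?: string } {
  try {
    const totalEvents = recordUsage(store, body.userId ?? "", body.feature ?? "");
    return { status: 201, totalEvents };
  } catch (err) {
    return { status: 400, error: (err as Error).message };
  }
}

const store = new InMemoryUsageStore();
const first = handleRecordUsage(store, { userId: "u1", feature: "export" });
const second = handleRecordUsage(store, { userId: "u1", feature: "search" });
const invalid = handleRecordUsage(store, { userId: "u1" });
console.log(first.status, second.totalEvents, invalid.status); // prints: 201 2 400
```

The point of the sketch is the shape, not the details: entry point, domain rule, and storage all exist from the first slice, each as thin as possible. Later slices would thicken the same path (for example, swapping the in-memory store for a real database) instead of completing one layer at a time.
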

(If you are familiar with [Alistair Cockburn's elephant carpaccio](https://docs.google.com/document/d/1TCuuu-8Mm14oxsOnlk8DqfZAA1cvtYu9WGv67Yj_sSk/pub), tracer bullets are the same idea.)

This saves you from two classic AI-assisted anti-patterns at once: the big-bang waterfall delivery that integrates everything at the very end (and breaks), and anemic horizontal slices that pile up inventory without producing working software. Vertical slices keep the project demonstrable and de-risked at every step.

### The strongest signal: generalizability

Of everything, this is maybe the most surprising part: we adopted most of these skills with minimal changes.

That's the real signal. Tools that only work after heavy customization are usually telling you they don't fit your context. These skills are general enough to plug into an existing project, but opinionated enough to actually change agent behavior. That balance is hard to strike.

In my personal version of Pocock's skills, I've only polished up the [TDD skill](Tech/Software%20Development/Skills/tdd/SKILL.md) and made sure that the [To Issues skill](Tech/Software%20Development/Skills/to-issues/SKILL.md) breaks user story requirements down into Gherkin *scenarios* when that is a good fit.

And of course, you can easily integrate these skills with your own harness. For example, I use [roborev](https://www.roborev.io/) for CI, and [Headroom](https://github.com/chopratejas/headroom) with its [agent wrapper](https://headroom-docs.vercel.app/docs/proxy#agent-wrapping) to make my coding sessions more token-efficient. Nothing in these skills broke my personal coding agent harness workflows or setup.

## My take (conclusion)

These skills aren't just productivity tools. They're the quickest on-ramp I've seen to building out your own harness engineering practice: a shared way of working between humans and agents that compounds over time.
Every project that uses them creates its own context dictionary, evolves around explicit, shared architectural decision records, and follows the same delivery discipline.

This lightweight "harness" can easily be extended to cover your organization's specific needs, and it integrates well with any existing tooling you have come to like. Pure, essential software engineering best practice, right out of the box.

If you're still figuring out how to move your teams toward more principled harness engineering, [Matt Pocock's AI Hero](https://www.aihero.dev/) engineering practice is a great place to start.