The Harness Eats the Coding
The most valuable thing I do as an engineer right now isn't writing code. It isn't even reviewing code. It's building the harness that lets the agent verify its own work before it asks me to look at it.
Where the Time Actually Goes
If I track where my hours actually land on any non-trivial piece of work, they fall into three roughly equal buckets:
- One third on the front end. Preparation, architecting, ticketing, building the spec the agent will work from.
- One third on the back end. Verification, reviewing what the agent produced, catching the parts it got wrong.
- One third on the process itself. The harness. What skills, hooks, tests, reviewers, PR checks, and instrumentation does this codebase need so the agent can verify its own output before it surfaces anything to me?
That last third is the one most teams don't budget for. It feels like overhead. It isn't. It's the part that determines whether the other two thirds compound or stay flat.
The cost of skipping it shows up in the data. Faros AI's 2026 analysis of 10,000+ developers across 1,255 teams found that as teams moved from low to high AI adoption, code churn (lines deleted relative to lines added) jumped 861%, and the incidents-to-PR ratio rose 242.7%. That's the harness gap, rendered in production failures. Teams sped up the writing without speeding up the verification, and the difference came due in incidents.
The First Three Iterations Won't Land
The first three iterations of any agent run aren't going to land cleanly. That's just the rate. So the question isn't "did the agent get it right the first time?" The question is "did the agent have the tools to know it got it wrong, and try again, before I had to look?"
This isn't a guess about how agents fail. METR's randomized study of experienced developers on real open-source issues found developers were actually 19% slower when AI was allowed, even though those same developers believed they were 20% faster. The perception gap is the tell, and it's the same reason your AI metrics are lying to you. Without a harness that forces verification, you're shipping the early iterations and calling it done.
That's what the harness is for. GitClear's analysis of 211 million lines of code reinforces the same point from the other side: refactored ("moved") lines collapsed from 24.1% in 2020 to 9.5% in 2024, and code clones grew eight times in a single year. When the harness isn't there to push back, agents default to copy-paste and skip the cleanup. The harness is what forces that work back into the loop. Concretely, on the projects I work on, that means:
- The right tests in place. Not just unit tests, but the integration and end-to-end coverage that catches the kinds of regressions an agent is prone to introducing. Tests stop being ceremony the moment an agent depends on them to know it's done.
- Multi-agent review hooks. A security pass, a code-cleanliness pass, an architecture pass, all running before the PR even gets opened.
- PR checks that exercise the actual success criteria from the ticket, not just CI green or red.
- Instrumentation good enough that "it works" can be verified without me booting up the project and clicking around.
Test Isolation Becomes a Budget Line Item
This is where hexagonal architecture is key. I want the agent to run the project locally with as many third-party dependencies mocked out as possible. No database boot. No dev environment. No manual setup. Just spin up, exercise the workflow, verify the result. This was always valuable for me. This is now critical for agents.
When the agent can do that, I get another loop back. "Can you execute this workflow through the UI and get the expected data?" becomes something the agent answers itself, not something I have to do manually as the fourth iteration's reviewer.
I've also wired up the remote dev instance so that when the agent needs to test something end-to-end in a real browser, it can. Claude connects to a Chrome instance that opens on my local screen via x11. The agent drives the browser. I watch the result. That's a verification loop I'm no longer in the middle of.
Where the Time Savings Actually Come From
If the agent has the tools it needs to verify its own work, I can take another step back. The first three passes happen without me. By the fourth iteration, when it tells me it's done, there's a much better chance it actually is.
That's where the time savings come from. Not from typing faster. Not from getting code generated faster. From being the reviewer of the fourth attempt instead of the first.
The work shifted. The leverage moved from writing code to building the conditions under which an agent can know whether the code it wrote is correct.

