The Harness Eats the Coding

June 23, 2026

The most valuable thing I do as an engineer right now isn't writing code. It isn't even reviewing code. It's building the harness that lets the agent verify its own work before it asks me to look at it.

Where the Time Actually Goes

If I track where my hours actually land on any non-trivial piece of work, they fall into three roughly equal buckets:

One third on the front end. Preparation, architecting, ticketing, building the spec the agent will work from.
One third on the back end. Verification, reviewing what the agent produced, catching the parts it got wrong.
One third on the process itself. The harness. What skills, hooks, tests, reviewers, PR checks, and instrumentation does this codebase need so the agent can verify its own output before it surfaces anything to me?

That last third is the one most teams don't budget for. It feels like overhead. It isn't. It's the part that determines whether the other two thirds compound or stay flat.

The cost of skipping it shows up in the data. Faros AI's 2026 analysis of 10,000+ developers across 1,255 teams found that as teams moved from low to high AI adoption, code churn (lines deleted relative to lines added) jumped 861%, and the incidents-to-PR ratio rose 242.7%. That's the harness gap, rendered in production failures. Teams sped up the writing without speeding up the verification, and the difference came due in incidents.
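To make those percentages concrete: a jump of 861% is an increase on top of the baseline, so the new value is roughly 9.6 times the old one, not 8.6 times. Here is a small sketch of that arithmetic, using the churn definition given above. The function names and the sample line counts are illustrative and are not taken from the Faros analysis.

```python
def churn_ratio(lines_deleted: int, lines_added: int) -> float:
    """Churn as described above: lines deleted relative to lines added."""
    if lines_added <= 0:
        raise ValueError("lines_added must be positive")
    return lines_deleted / lines_added


def multiplier_from_percent_increase(percent_increase: float) -> float:
    """Convert 'rose by N%' into 'is now X times the baseline'."""
    return 1 + percent_increase / 100


# Hypothetical line counts, for illustration only.
example_churn = churn_ratio(lines_deleted=50, lines_added=200)  # 0.25

# The two figures quoted above, expressed as multipliers of the baseline.
churn_multiplier = multiplier_from_percent_increase(861)       # roughly 9.6x
incident_multiplier = multiplier_from_percent_increase(242.7)  # roughly 3.4x

print(round(churn_multiplier, 2), round(incident_multiplier, 2))
```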

The First Three Iterations Won't Land

The first three iterations of any agent run aren't going to land cleanly. That's just the rate. So the question isn't "did the agent get it right the first time?" The question is "did the agent have the tools to know it got it wrong, and try again, before I had to look?"
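The shape of that loop can be sketched in a few lines. This is a minimal illustration, not a real agent integration: `run_with_self_verification`, `fake_agent`, and `fake_verifier` are hypothetical names, and the verifier stands in for whatever tests and checks the codebase actually provides.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types: an attempt receives the previous failures and returns a
# candidate change; a verifier returns failure messages (empty list = pass).
Attempt = Callable[[List[str]], str]
Verifier = Callable[[str], List[str]]


@dataclass
class RunResult:
    passed: bool
    attempts: int
    failures: List[str] = field(default_factory=list)


def run_with_self_verification(
    attempt: Attempt, verify: Verifier, max_attempts: int = 4
) -> RunResult:
    """Retry until the verifier reports no failures or the budget runs out."""
    failures: List[str] = []
    for n in range(1, max_attempts + 1):
        candidate = attempt(failures)  # the agent sees what failed last time
        failures = verify(candidate)
        if not failures:
            return RunResult(passed=True, attempts=n)
    # Budget exhausted: surface the remaining failures to the human.
    return RunResult(passed=False, attempts=max_attempts, failures=failures)


# Stand-in agent for illustration: it just produces numbered patches.
def fake_agent(previous_failures: List[str]) -> str:
    fake_agent.calls += 1
    return f"patch-v{fake_agent.calls}"


fake_agent.calls = 0


# Stand-in verifier: pretend the first three patches each fail a check.
def fake_verifier(candidate: str) -> List[str]:
    version = int(candidate.rsplit("v", 1)[1])
    return [] if version >= 4 else [f"check failed on {candidate}"]


result = run_with_self_verification(fake_agent, fake_verifier)
print(result.passed, result.attempts)
```

The human only enters the picture when the function returns: either with a result that already passed verification, or with the specific failures that survived every retry.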

This isn't a guess about how agents fail. METR's randomized study of experienced developers on real open-source issues found developers were actually 19% slower when AI was allowed, even though those same developers believed they were 20% faster. The perception gap is the tell, and it's the same reason your AI metrics are lying to you. Without a harness that forces verification, you're shipping the early iterations and calling it done.

That's what the harness is for. GitClear's analysis of 211 million lines of code reinforces the same point from the other side: refactored ("moved") lines collapsed from 24.1% in 2020 to 9.5% in 2024, and code clones grew eightfold in a single year. When the harness isn't there to push back, agents default to copy-paste and skip the cleanup. The harness is what forces that work back into the loop. Concretely, on the projects I work on, that means:

The right tests in place. Not just unit tests, but the integration and end-to-end coverage that catches the kinds of regressions an agent is prone to introducing. Tests stop being ceremony the moment an agent depends on them to know it's done.
Multi-agent review hooks. A security pass, a code-cleanliness pass, an architecture pass, all running before the PR even gets opened.
PR checks that exercise the actual success criteria from the ticket, not just CI green or red.
Instrumentation good enough that "it works" can be verified without me booting up the project and clicking around.
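As one way to picture the review hooks and ticket-driven PR checks from the list above, here is a rough sketch of a gate that runs before a PR is opened. Everything in it is an assumption made for illustration: the `Change` shape, the check names, and the toy secret-detection heuristic are hypothetical, and a real setup would call actual reviewers and test runners.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Change:
    diff: str                       # the proposed diff as text
    passing_tests: Set[str]         # names of tests that passed locally
    criteria_tests: Dict[str, str]  # ticket criterion -> test that proves it


Check = Callable[[Change], List[str]]  # returns human-readable findings


def security_pass(change: Change) -> List[str]:
    # Toy heuristic only: flag added lines that look like hard-coded secrets.
    return [
        f"possible hard-coded secret: {line[1:].strip()}"
        for line in change.diff.splitlines()
        if line.startswith("+") and "password =" in line.lower()
    ]


def success_criteria_pass(change: Change) -> List[str]:
    # Every criterion from the ticket must map to a test that actually passed.
    return [
        f"criterion not verified: {criterion!r} (needs {test})"
        for criterion, test in change.criteria_tests.items()
        if test not in change.passing_tests
    ]


def run_pre_pr_gate(change: Change, checks: Dict[str, Check]) -> Dict[str, List[str]]:
    """Run every check and return only the ones that produced findings."""
    report = {name: check(change) for name, check in checks.items()}
    return {name: findings for name, findings in report.items() if findings}


change = Change(
    diff='+ password = "hunter2"\n+ total = price * qty',
    passing_tests={"test_total_is_computed"},
    criteria_tests={
        "order total is computed": "test_total_is_computed",
        "export returns CSV": "test_export_csv",
    },
)
blocking = run_pre_pr_gate(
    change,
    {"security": security_pass, "success-criteria": success_criteria_pass},
)
for name, findings in blocking.items():
    print(name, findings)
```

An empty result means the PR can be opened. Anything else goes back to the agent as feedback rather than to a human reviewer.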

Test Isolation Becomes a Budget Line Item

This is where hexagonal architecture (ports and adapters) is key: with third-party dependencies behind interfaces, each one can be swapped for a mock. I want the agent to run the project locally with as many of those dependencies mocked out as possible. No database boot. No dev environment. No manual setup. Just spin up, exercise the workflow, verify the result. This was always valuable for me. Now it's critical for agents.
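A minimal sketch of what that isolation can look like, assuming a hypothetical order workflow. `PaymentGateway`, `OrderRepository`, and `place_order` are invented for this example. The point is only that the core logic depends on interfaces (ports), so in-memory fakes can stand in for the payment provider and the database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class PaymentGateway(Protocol):
    """Port: what the core needs from any payment provider."""

    def charge(self, customer_id: str, amount_cents: int) -> str: ...


class OrderRepository(Protocol):
    """Port: what the core needs from any persistence layer."""

    def save(self, order_id: str, record: Dict[str, object]) -> None: ...


@dataclass
class FakePaymentGateway:
    """Adapter for tests: records charges instead of calling a real API."""

    charges: List[Tuple[str, int]] = field(default_factory=list)

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.charges.append((customer_id, amount_cents))
        return f"fake-charge-{len(self.charges)}"


@dataclass
class InMemoryOrderRepository:
    """Adapter for tests: a dict instead of a database."""

    orders: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def save(self, order_id: str, record: Dict[str, object]) -> None:
        self.orders[order_id] = record


def place_order(
    order_id: str,
    customer_id: str,
    amount_cents: int,
    payments: PaymentGateway,
    orders: OrderRepository,
) -> str:
    """Core workflow: knows only the ports, never a concrete vendor."""
    if amount_cents <= 0:
        raise ValueError("amount_cents must be positive")
    charge_id = payments.charge(customer_id, amount_cents)
    orders.save(order_id, {"customer_id": customer_id, "charge_id": charge_id})
    return charge_id


# Exercising the workflow end to end with no network and no database.
payments = FakePaymentGateway()
orders = InMemoryOrderRepository()
charge_id = place_order("order-1", "cust-42", 1999, payments, orders)
print(charge_id, orders.orders["order-1"])
```

Because the fakes record what happened, the agent can assert on the outcome directly instead of inspecting a live system.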

When the agent can do that, I get another loop back. "Can you execute this workflow through the UI and get the expected data?" becomes something the agent answers itself, not something I have to do manually as the fourth iteration's reviewer.

I've also wired up the remote dev instance so that when the agent needs to test something end-to-end in a real browser, it can. Claude connects to a Chrome instance that opens on my local screen via X11. The agent drives the browser. I watch the result. That's a verification loop I'm no longer in the middle of.

Where the Time Savings Actually Come From

If the agent has the tools it needs to verify its own work, I can take another step back. The first three passes happen without me. By the fourth iteration, when it tells me it's done, there's a much better chance it actually is.

That's where the time savings come from. Not from typing faster. Not from getting code generated faster. From being the reviewer of the fourth attempt instead of the first.

The work shifted. The leverage moved from writing code to building the conditions under which an agent can know whether the code it wrote is correct.

Brian Conn https://connsulting.io
