Human Review Is Intent Review, Not Diff Review

Jun 30

We still assign a human reviewer to every pull request. The human opens the diff, scrolls, approves. That ritual is already dead. Most teams are just propping up the corpse.

The question underneath it, the one nobody wants to say out loud: what is a human code review even for once agents write the code?

Four Levels of What a Review Catches

A review can catch four kinds of problems, and three of them no longer need a person.

1. Linting and formatting. Tooling owns this. It always should have.
2. Line-level logic bugs. The agent catches these locally, before the code ever reaches you. It runs the tests, reads the error, fixes the line.
3. Macro bugs across files. The AI reviewer handles these reasonably well now. A pass that reads the whole diff and flags the cross-file break is cheap and getting cheaper.
4. Whether the PR should exist at all. Does it fit the business context? Will it have some macro consequence nobody asked for?

That last level is the whole remaining human surface. Everything below it is automated or about to be. When you review a diff line by line, you are doing the machine's job and skipping your own.

The Team Already Lives This

Ask engineers what they actually do when a review lands in their queue and they will tell you the truth if you let them. One told me there is no value in reading the diff: he reads the AI review on the PR and nothing else. Another admitted the team assigns reviewers and nobody actually reviews. These are not complaints. They are accurate descriptions of where the work went.

The old loop (assign a human, human reads the diff, human approves) collapsed the moment agents started writing most of the lines. Pretending otherwise produces a zero-signal approval that everyone treats as a gate.

The Regression That Bites

Here is the failure mode that actually hurts. An agent implements feature B. Along the way it quietly changes a line that belonged to feature A. To make CI green, it rewrites feature A's test to match the new, wrong behavior. The suite passes. The diff looks clean. Feature A is now broken in production and nothing flagged it.

This is not a freak event. An agent will write a confident, terrible test and convince itself of anything across a few hundred thousand tokens of context. Which means tests cannot be the source of truth for what the system is supposed to do. They are mutable, and the agent will mutate them to make red turn green. I have made this argument about tests as ceremony before: a test that only mirrors the implementation proves nothing.
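One cheap guard against this failure mode is to treat edits to pre-existing tests as a signal in their own right. The sketch below is an illustration, not an existing tool. It takes the text that `git diff --name-status` prints and separates test files that were modified, renamed, or deleted from test files that were newly added. The `tests/` directory and `test_` prefix are a hypothetical naming convention, and the function names are made up for this example.

```python
from pathlib import PurePosixPath


def is_test_path(path: str) -> bool:
    """Hypothetical convention: anything under tests/ or named test_*."""
    p = PurePosixPath(path)
    return "tests" in p.parts or p.name.startswith("test_")


def mutated_existing_tests(name_status: str) -> list[str]:
    """Return test files that existed before the change and were altered.

    `name_status` is the text produced by `git diff --name-status`:
    one line per file, a status code, then tab-separated path(s).
    """
    flagged = []
    for line in name_status.split("\n"):
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        if not paths:
            continue
        # A (added) and C (copied) create new files and leave old ones alone.
        # Everything else (M, D, R...) touches a file that already existed.
        if status[:1] in ("A", "C"):
            continue
        old_path = paths[0]
        if is_test_path(old_path):
            flagged.append(old_path)
    return flagged


sample = (
    "M\tsrc/feature_b.py\n"
    "M\ttests/test_feature_a.py\n"
    "A\ttests/test_feature_b.py\n"
)
print(mutated_existing_tests(sample))  # ['tests/test_feature_a.py']
```

A flagged file is not proof of a regression. It is a prompt to ask whether the ticket behind the PR gave the agent any reason to touch that test.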

Intent Is the Thing That Isn't Mutable

If the tests are mutable and the diff is just the output, what is fixed? Intent. And intent lives in the ticket, the commit, the PR description. That is why the review surface has to move off the diff and onto the intent.

Tickets are prompts now. When I review your ticket, I am reviewing your agent's plan before it runs, which is the cheap place to catch "this should not be built this way." Catching it after the code exists is the expensive place.

The diff is lossy compression of the intent. You cannot reliably back out what someone meant from the lines they changed, which is exactly why reviewing the diff feels like guesswork. So stop trying to decompress it. Review the compression source.

Two Things to Change This Week

Move review earlier. I want architectural feedback before the code exists, not a rubber stamp after it ships. The expensive mistakes are decided at the planning step, and that is where attention is cheapest to spend.

Say what you want reviewed. "Please review my PR" produces the zero-signal approval everyone already gives. "Check the shape of this schema, I am worried about the migration path" produces an actual review. Name the thing you are unsure about, because that is the only part where a second set of eyes earns its cost.

You Cannot Hold the Platform in Your Head Anymore

There is a second problem hiding behind the first. No one holds the whole platform in their head now, and we do not even hold the parts the agents wrote. The answer is to be T-shaped: deep in your one area, shallow everywhere else, with a shared floor of context on the domains that hurt the most when they break.

Take billing. The fix for billing breaking is not more billing code review. It is a written, immutable record of what billing is supposed to do (the use cases, the edge cases, the user journeys) that is durable enough to feed the agent so it knows what not to break. The record is the intent, made legible to the machine. You protect a domain by writing down what it means, not by watching diffs scroll past.

A Mechanical Version Worth Trying

There is a version of intent review you can partly automate. For the lines a change touches, blame back to the commit that introduced them, follow that commit's reference to its ticket, and check whether the new change still respects the old intent. That is feature A defending itself against feature B, automatically. Whether it is tractable or just too heavy is an open question, but the shape is right: the system carries its own intent forward instead of relying on a human to remember it.
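The blame-to-ticket walk can be sketched in a few lines, with assumptions. The inputs are assumed to come from two real git commands: `git blame --porcelain -L <start>,<end> <base-revision> -- <file>` for the touched lines, and `git log -1 --format=%B <sha>` for each commit message. The ticket-key pattern (`BILL-142` style) and the function names are hypothetical, and the last step, judging whether the new change respects the old ticket, is deliberately left out because that is the hard part.

```python
import re

# Header of a `git blame --porcelain` entry:
#   <40-hex sha> <original line> <final line> [<lines in group>]
BLAME_HEADER = re.compile(r"^([0-9a-f]{40}) \d+ (\d+)(?: \d+)?$")
# Hypothetical ticket-key convention, e.g. "BILL-142".
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def commits_by_line(blame_porcelain: str) -> dict[int, str]:
    """Map each blamed line number to the commit that last changed it."""
    result = {}
    for line in blame_porcelain.split("\n"):
        # Source lines in porcelain output start with a tab, so they never match.
        match = BLAME_HEADER.match(line)
        if match:
            result[int(match.group(2))] = match.group(1)
    return result


def tickets_behind_lines(
    blame_porcelain: str, commit_messages: dict[str, str]
) -> dict[str, list[int]]:
    """Return {ticket key: line numbers whose last commit cites that ticket}."""
    tickets: dict[str, list[int]] = {}
    for line_no, sha in commits_by_line(blame_porcelain).items():
        message = commit_messages.get(sha, "")
        for key in TICKET_KEY.findall(message):
            tickets.setdefault(key, []).append(line_no)
    return {key: sorted(lines) for key, lines in tickets.items()}


# Made-up data standing in for real git output.
sha_a, sha_b = "a" * 40, "b" * 40
blame = (
    f"{sha_a} 10 10 2\nauthor Someone\n\tcharge = compute_total(cart)\n"
    f"{sha_a} 11 11\n\tapply_tax(charge)\n"
    f"{sha_b} 3 12 1\nauthor Someone Else\n\tsend_receipt(charge)\n"
)
messages = {
    sha_a: "BILL-142: charge tax after discounts",
    sha_b: "Fix typo in receipt email",
}
print(tickets_behind_lines(blame, messages))  # {'BILL-142': [10, 11]}
```

Lines whose commit cites no ticket, like line 12 here, carry no recoverable intent. How many of those a codebase has is a rough measure of how tractable this approach would be.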

Review Effort Should Match Blast Radius

None of this means review disappears. It means review stops being uniform. Pre-AI, pull request volume was self-limiting, so "review everything" was affordable. Now volume explodes, and a flat "review everything" policy guarantees the queue congests and people start bypassing it. Seventeen-second self-merges are not laziness. They are an un-tiered review policy collapsing under load.

Progressive review, three tiers:

Low-risk: auto-merge on checks passing. No human. The checks are the gate.
Medium-risk: human review available, not required. It merges without a person if no one picks it up.
High-risk: human review required. Hard gate, no self-merge.
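As a merge rule, the three tiers might look like the sketch below. Every name in it is hypothetical, and it assumes the risk tier has already been assigned by some classifier. It also reads "available, not required" as: a medium-risk change waits for a reviewer only if one has claimed it.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class PullRequest:
    risk: Risk
    checks_passed: bool
    review_claimed: bool = False     # a human picked the review up
    approved_by_other: bool = False  # approved by someone who is not the author


def may_merge(pr: PullRequest) -> bool:
    """Apply the tiered policy; failing checks block every tier."""
    if not pr.checks_passed:
        return False
    if pr.risk is Risk.LOW:
        # The checks are the gate.
        return True
    if pr.risk is Risk.MEDIUM:
        # Merges on its own unless a reviewer claimed it and has not approved.
        return pr.approved_by_other or not pr.review_claimed
    # HIGH: hard gate, no self-merge.
    return pr.approved_by_other


print(may_merge(PullRequest(Risk.LOW, checks_passed=True)))     # True
print(may_merge(PullRequest(Risk.MEDIUM, checks_passed=True)))  # True
print(may_merge(PullRequest(Risk.HIGH, checks_passed=True)))    # False
```

A real medium tier would probably also want a waiting window before the unclaimed merge goes through. That detail is omitted here.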

This is a trade, not a tax. The team experiences "review required" as friction and routes around it. Progressive review gives them auto-merge on the roughly sixty percent of changes that are low-risk, in exchange for a credible hard gate on the ten percent that actually ship bugs to production. You buy enforcement on the dangerous changes by removing it from the safe ones. The same shape shows up wherever I look at blast radius: ceremony should be calibrated to consequence, not applied flat.

One thing to validate before you turn on auto-merge for the low-risk tier: if your impact classifier mislabels a high-blast-radius change as low-risk, auto-merge automates the bypass instead of closing it. Measure that false-negative rate first.
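A minimal way to measure that rate, assuming you can label past changes after the fact (for example from incidents or reverts, which is an assumption about your data): of the changes that turned out to be high-impact, count the share the classifier had put in the low tier. The function name and the history format are hypothetical.

```python
from typing import Optional


def false_negative_rate(history: list[tuple[str, bool]]) -> Optional[float]:
    """Share of truly high-impact changes that were labeled "low".

    `history` holds (predicted_tier, was_actually_high_impact) pairs.
    Returns None when there are no high-impact changes to measure against.
    """
    predictions_for_high = [tier for tier, was_high in history if was_high]
    if not predictions_for_high:
        return None
    missed = sum(1 for tier in predictions_for_high if tier == "low")
    return missed / len(predictions_for_high)


# Made-up history: four changes were high-impact, one of them was labeled low.
history = [
    ("low", False),
    ("low", False),
    ("low", True),
    ("high", True),
    ("medium", True),
    ("high", True),
]
print(false_negative_rate(history))  # 0.25
```

The denominator is the set of high-impact changes, not all changes. A classifier can look accurate overall and still wave most of the dangerous changes through.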

Stop Reviewing Diffs

Open your last ten merged pull requests. For each one, ask what the human reviewer actually needed to see, and what they were supposed to catch that the tooling did not. If the honest answer for most of them is "nothing," your review process is ceremony.

The diff is the output. Intent is the input, and the input is the only thing worth a human's attention. Stop reviewing diffs. Review intent.

June 30, 2026

Brian Conn https://connsulting.io
