Human Review Is Intent Review, Not Diff Review
We still assign a human reviewer to every pull request. The human opens the diff, scrolls, approves. That ritual is already dead. Most teams are just propping up the corpse.
The question underneath it, the one nobody wants to say out loud: what is a human code review even for once agents write the code?
Four Levels of What a Review Catches
A review can catch four kinds of problems, and three of them no longer need a person.
- Linting and formatting. Tooling owns this. It always should have.
- Line-level logic bugs. The agent catches these locally, before the code ever reaches you. It runs the tests, reads the error, fixes the line.
- Macro bugs across files. The AI reviewer handles these reasonably well now. A pass that reads the whole diff and flags the cross-file break is cheap and getting cheaper.
- Should this PR even exist. Does it fit the business context. Will it have some macro consequence nobody asked for.
That last level is the whole remaining human surface. Everything below it is automated or about to be. When you review a diff line by line, you are doing the machine's job and skipping your own.
The Team Already Lives This
Ask engineers what they actually do when a review lands in their queue and they will tell you the truth if you let them. One told me there is no value in reading the diff: he reads the AI review on the PR and nothing else. Another admitted the team assigns reviewers and nobody actually reviews. These are not complaints. They are accurate descriptions of where the work went.
The old loop (assign a human, human reads the diff, human approves) collapsed the moment agents started writing most of the lines. Pretending otherwise produces a zero-signal approval that everyone treats as a gate.
The Regression That Bites
Here is the failure mode that actually hurts. An agent implements feature B. Along the way it quietly changes a line that belonged to feature A. To make CI green, it rewrites feature A's test to match the new, wrong behavior. The suite passes. The diff looks clean. Feature A is now broken in production and nothing flagged it.
This is not a freak event. An agent will write a confident, terrible test and convince itself of anything across a few hundred thousand tokens of context. Which means tests cannot be the source of truth for what the system is supposed to do. They are mutable, and the agent will mutate them to make red turn green. I have made this argument about tests as ceremony before: a test that only mirrors the implementation proves nothing.
Intent Is the Thing That Isn't Mutable
If the tests are mutable and the diff is just the output, what is fixed? Intent. And intent lives in the ticket, the commit, the PR description. That is why the review surface has to move off the diff and onto the intent.
Tickets are prompts now. When I review your ticket, I am reviewing your agent's plan before it runs, which is the cheap place to catch "this should not be built this way." Catching it after the code exists is the expensive place.
The diff is lossy compression of the intent. You cannot reliably back out what someone meant from the lines they changed, which is exactly why reviewing the diff feels like guesswork. So stop trying to decompress it. Review the compression source.
Two Things to Change This Week
Move review earlier. I want architectural feedback before the code exists, not a rubber stamp after it ships. The expensive mistakes are decided at the planning step, and that is where attention is cheapest to spend.
Say what you want reviewed. "Please review my PR" produces the zero-signal approval everyone already gives. "Check the shape of this schema, I am worried about the migration path" produces an actual review. Name the thing you are unsure about, because that is the only part where a second set of eyes earns its cost.
You Cannot Hold the Platform in Your Head Anymore
There is a second problem hiding behind the first. No one holds the whole platform in their head now, and we do not even hold the parts the agents wrote. The answer is T-shaped: deep in your one area, shallow everywhere else, and a shared floor of context on the domains that hurt the most when they break.
Take billing. The fix for billing breaking is not more billing code review. It is a written, immutable record of what billing is supposed to do (the use cases, the edge cases, the user journeys) that is durable enough to feed the agent so it knows what not to break. The record is the intent, made legible to the machine. You protect a domain by writing down what it means, not by watching diffs scroll past.
A Mechanical Version Worth Trying
There is a version of intent review you can partly automate. For the lines a change touches, blame back to the commit that introduced them, follow that commit's reference to its ticket, and check whether the new change still respects the old intent. That is feature A defending itself against feature B, automatically. Whether it is tractable or just too heavy is an open question, but the shape is right: the system carries its own intent forward instead of relying on a human to remember it.
Review Effort Should Match Blast Radius
None of this means review disappears. It means review stops being uniform. Pre-AI, pull request volume was self-limiting, so "review everything" was affordable. Now volume explodes, and a flat "review everything" policy guarantees the queue congests and people start bypassing it. Seventeen-second self-merges are not laziness. They are an un-tiered review policy collapsing under load.
Progressive review, three tiers:
- Low-risk: auto-merge on checks passing. No human. The checks are the gate.
- Medium-risk: human review available, not required. It merges without a person if no one picks it up.
- High-risk: human review required. Hard gate, no self-merge.
This is a trade, not a tax. The team experiences "review required" as friction and routes around it. Progressive review gives them auto-merge on the roughly sixty percent of changes that are low-risk, in exchange for a credible hard gate on the ten percent that actually ship bugs to production. You buy enforcement on the dangerous changes by removing it from the safe ones. The same shape shows up wherever I look at blast radius: ceremony should be calibrated to consequence, not applied flat.
One thing to validate before you turn on tier-one auto-merge: if your impact classifier mislabels a high-blast-radius change as low, auto-merge automates the bypass instead of closing it. Check the false-negative rate first.
Stop Reviewing Diffs
Open your last ten merged pull requests. For each one, ask which the human reviewer actually needed to see, and what they were supposed to catch that the tooling did not. If the honest answer for most of them is "nothing," your review process is ceremony.
The diff is the output. Intent is the input, and the input is the only thing worth a human's attention. Stop reviewing diffs. Review intent.

