Start Fresh - Why Fixing AI Agents Mid-Chat Never Works

You're four steps into an AI agent workflow. Steps one through three went perfectly. Step four goes sideways. So you start correcting. "No, do it this way." "Try again." "No, like this." Five corrections later, the output is worse than when the problem first appeared.

The instinct to fix things in place is deeply human. It's also exactly wrong when working with AI agents.

The Context Window Remembers Everything

Everything that happens in a conversation stays in the context window. Every response, every correction, every failed attempt. When an agent fails at step four and you say "no, do it this way," the context now contains: good work on steps one through three, a failed attempt at step four, a correction, another failed attempt, another correction.

That's the foundation the agent builds on for every subsequent response. It's not starting from a clean understanding of what you need. It's trying to reconcile good early work with a growing pile of corrections and bad output.

You're not going to get back to good from there. Chroma's research on what they call "context rot" tested 18 frontier LLMs and found that every single one degrades in performance as context length increases, with accuracy dropping 30% or more when relevant information sits in the middle of a long context. Even a single piece of conflicting information reduces accuracy.

Failure Compounds in Context

I tell teams this bluntly: once an agent is off track in a chat, it's going to stay off track in that chat. The bad performance is baked into the context window.

This might surprise people. They assume AI agents learn from feedback the way a colleague would. Tell a person "you're doing that wrong, do it this way instead," and they adjust. They have persistent memory. They genuinely course-correct.

AI agents don't work that way. They process the entire conversation as a single block of text every time they generate a response. An agent may save a memory that corrects the behavior in later chats, but the current session is permanently tainted. A conversation full of corrections reads like a conversation full of confusion. The signal-to-noise ratio degrades with every failed attempt.

The agent isn't ignoring your feedback. It's weighting your feedback against the entire history of the conversation, including all the wrong outputs that preceded it. A Microsoft and Salesforce study simulating over 200,000 conversations across 15 leading LLMs found that multi-turn performance dropped an average of 39% compared to single-turn, with even the most capable models showing 30-40% degradation. Their core finding: when LLMs take a wrong turn in a conversation, they get lost and do not recover.
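
To make that concrete, here's a minimal sketch of the mechanics. `call_model` is a stand-in for whatever chat-completion API you actually use; the detail that matters is that the full history goes back to the model on every single turn:

```python
def call_model(messages: list[dict]) -> str:
    """Stand-in for a real chat-completion call; returns a canned reply."""
    return f"(model reply after reading {len(messages)} messages)"

# The context window, as the model sees it: one growing list.
history = [{"role": "system", "content": "You are a data-cleanup agent."}]

def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the model re-reads the entire history here
    history.append({"role": "assistant", "content": reply})
    return reply

turn("Step 4: normalize the dates in this file.")  # bad output lands in history
turn("No, use ISO 8601.")                          # correction AND bad output remain
turn("No, like 2024-01-31, not 31/01/2024.")       # history is now mostly failure

print(len(history))  # 7 messages, none of which can be removed mid-chat
```

Nothing in this loop forgets. Each correction is one more message weighed against every failed attempt that came before it.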

The Right Pattern: Diagnose, Then Reset

When an agent goes off track, here's what actually works:

Stop the task immediately. Don't let it keep trying. Don't feed it more corrections while it's still attempting the work.

Ask it to diagnose the problem. This is where the current conversation is still valuable. Say "stop, tell me why you're struggling with this." The agent can often articulate what it's confused about, what constraints are conflicting, or what part of your instructions is ambiguous. That diagnostic output is genuinely useful.
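
In code, the diagnostic step is just one more prompt to the tainted session, and it's the last one worth sending. A rough sketch, reusing the `call_model` stand-in from above:

```python
DIAGNOSTIC_PROMPT = (
    "Stop working on the task. Do not try again. "
    "Tell me what about my instructions is ambiguous, conflicting, "
    "or hard to follow. Be specific."
)

def diagnose(history: list[dict]) -> str:
    """Extract a post-mortem from the failed session before abandoning it."""
    history.append({"role": "user", "content": DIAGNOSTIC_PROMPT})
    return call_model(history)  # the session's one remaining useful output
```

The returned text is what you carry forward; the `history` list itself gets thrown away.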

Take the feedback and start a new session. This is the critical step. Don't say "great, now try again." Take what you learned from the diagnostic conversation and open a completely fresh chat. New session. Clean context window.

In that new session, incorporate the lesson. If the agent was confused about step four because your instructions were vague, write clearer instructions. If it was tripping over a conflicting constraint, remove the conflict. Now the agent attempts the full sequence with corrected instructions and zero history of failure.

The results are dramatically better.

Start Loose, Then Tighten

A related principle helps prevent these situations in the first place: don't be overly prescriptive from the start.

When setting up a new agent workflow, describe the task the way you'd describe it to a new employee. Give it your checklist. Give it links to relevant resources. Let it surprise you. The models have come far enough that they'll often figure out a reasonable approach on their own. Understanding where your team sits on the AI adoption ladder helps calibrate how much prescription is appropriate at each stage.

You'll still see the agent struggle in unexpected places. It might handle the complex parts easily and trip over something you thought was simple. Those surprises are valuable. They tell you where your instructions need refinement.

But the refinement happens in the next session, not this one.

The Feedback Loop That Actually Works

The effective cycle looks like this, with a code sketch after the list:

  1. Start a session with your current best instructions
  2. Let the agent work until it succeeds or fails
  3. If it fails, diagnose why within that session
  4. Take the diagnosis to a new session with updated instructions
  5. Repeat
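
Put together with the helpers sketched earlier, the loop looks something like this. `attempt_task`, `looks_good`, and `revise` are placeholders for your own task runner, acceptance check, and instruction editing:

```python
def attempt_task(history: list[dict]) -> str:
    """Placeholder: run the agent until it finishes or stalls."""
    return call_model(history)

def looks_good(result: str) -> bool:
    """Placeholder: your own acceptance check."""
    return "error" not in result

def revise(instructions: str, diagnosis: str) -> str:
    """Fold the post-mortem into the standing instructions."""
    return instructions + "\n\nClarification from last attempt: " + diagnosis

def run_workflow(instructions: str, max_rounds: int = 3) -> str | None:
    """Iterate across sessions, not within one."""
    for _ in range(max_rounds):
        history = [{"role": "system", "content": instructions}]  # clean context window
        result = attempt_task(history)                  # step 2: let it work
        if looks_good(result):
            return result
        diagnosis = diagnose(history)                   # step 3: in-session post-mortem
        instructions = revise(instructions, diagnosis)  # step 4: update instructions
        # `history` is discarded here; the next round never sees the failure.
    return None
```

The instructions improve every round, while each round's context window starts empty.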

This is fundamentally different from the cycle most people fall into: start a session, hit a problem, correct in place, hit another problem, correct again, get frustrated, abandon the whole approach. There are exceptions: sometimes a single correction lands, the answer is right, and the chat is over. But for any long, complex workflow, it's better to start fresh.

The first pattern produces steadily improving results across sessions. The second produces degrading results within each session and no improvement across sessions, because the lessons never get incorporated into a clean starting point. This iterative refinement across sessions maps to the same principle behind risk evaluation in AI-aided development: calibrate your process to what the tools actually do well, not what you wish they did.

Make the Reset Easy

One practical tip: make it easy to start fresh. If starting a new session requires re-entering a bunch of context, you'll resist doing it. Set up your projects, templates, or system prompts so that spinning up a new conversation with full context takes seconds, not minutes.

The easier the reset, the more willing you'll be to do it at the first sign of trouble instead of burning twenty minutes trying to salvage a session that's already contaminated. I've written about building this kind of reusable context into your AI workflow before: the goal is making each session self-contained and disposable.
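
One way to do that, sticking with the same sketch: keep the standing context in a template so a clean session is a single call. The checklist content here is purely illustrative:

```python
BASE_CONTEXT = """You are our release-notes agent.
Checklist: 1) gather merged changes, 2) group by area, 3) draft summaries.
Output: a markdown table, one row per change.
"""

def new_session(lesson_learned: str = "") -> list[dict]:
    """Spin up a fresh context window with the reusable instructions baked in."""
    return [{"role": "system", "content": BASE_CONTEXT + lesson_learned}]

# After a failed session, resetting costs one line:
history = new_session("Dates must be ISO 8601, e.g. 2024-01-31.")
```

Because the template carries the context, abandoning a contaminated session costs you nothing but the reset itself.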

Clean Slate, Better Work

The models are capable and getting better fast. But they have a fundamental architectural property that isn't changing anytime soon: the context window is their entire working reality. Keep it clean, and they'll do good work. Let it fill up with failure and corrections, and no amount of additional correction gets you back on track.

Every significant correction is a signal to start over. Diagnose what went wrong, update your instructions, and begin a clean session. The overhead of starting fresh is trivially small compared to the time wasted wrestling with a contaminated context.

Start fresh. It's almost always the right call.

