Start Fresh - Why Fixing AI Agents Mid-Chat Never Works

You're four steps into an AI agent workflow. Steps one through three went perfectly. Step four goes sideways. So you start correcting. "No, do it this way." "Try again." "No, like this." Five corrections later, the output is worse than when the problem first appeared.

The instinct to fix things in place is deeply human. It's also exactly wrong when working with AI agents.

The Context Window Remembers Everything

Everything that happens in a conversation stays in the context window. Every response, every correction, every failed attempt. When an agent fails at step four and you say "no, do it this way," the context now contains: good work on steps one through three, a failed attempt at step four, a correction, another failed attempt, another correction.

That's the foundation the agent builds on for every subsequent response. It's not starting from a clean understanding of what you need. It's trying to reconcile good early work with a growing pile of corrections and bad output.

You're not going to get back to good from there. Chroma's research on what they call "context rot" tested 18 frontier LLMs and found that every single one degrades in performance as context length increases, with accuracy dropping 30% or more when relevant information sits in the middle of a long context. Even a single piece of conflicting information reduces accuracy.

Failure Compounds in Context

I tell teams this bluntly: once an agent is off track in a chat, it's going to stay off track in that chat. The bad performance is baked into the context window.

This might surprise people. They assume AI agents learn from feedback the way a colleague would. Tell a person "you're doing that wrong, do it this way instead," and they adjust. They have persistent memory. They genuinely course-correct.

AI agents don't work that way. They process the entire conversation as a single block of text every time they generate a response. An agent may save a memory that corrects the behavior in later chats, but the current session is permanently tainted. A conversation full of corrections reads like a conversation full of confusion. The signal-to-noise ratio degrades with every failed attempt.

The agent isn't ignoring your feedback. It's weighting your feedback against the entire history of the conversation, including all the wrong outputs that preceded it. A Microsoft and Salesforce study simulating over 200,000 conversations across 15 leading LLMs found that multi-turn performance dropped an average of 39% compared to single-turn, with even the most capable models showing 30-40% degradation. Their core finding: when LLMs take a wrong turn in a conversation, they get lost and do not recover.
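
To make that concrete, here's a minimal sketch of the mechanics. `call_model` is a stand-in for whatever chat-completion API you actually use; the detail that matters is that the full history goes back to the model on every single turn:

```python
def call_model(messages: list[dict]) -> str:
    """Stand-in for a real chat-completion call; returns a canned reply."""
    return f"(model reply after reading {len(messages)} messages)"

# The context window, as the model sees it: one growing list.
history = [{"role": "system", "content": "You are a data-cleanup agent."}]

def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the model re-reads the entire history here
    history.append({"role": "assistant", "content": reply})
    return reply

turn("Step 4: normalize the dates in this file.")  # bad output lands in history
turn("No, use ISO 8601.")                          # correction AND bad output remain
turn("No, like 2024-01-31, not 31/01/2024.")       # history is now mostly failure

print(len(history))  # 7 messages, none of which can be removed mid-chat
```

Nothing in this loop forgets. Each correction is one more message weighed against every failed attempt that came before it.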

The Right Pattern: Diagnose, Then Reset

When an agent goes off track, here's what actually works:

Stop the task immediately. Don't let it keep trying. Don't feed it more corrections while it's still attempting the work.

Ask it to diagnose the problem. This is where the current conversation is still valuable. Say "stop, tell me why you're struggling with this." The agent can often articulate what it's confused about, what constraints are conflicting, or what part of your instructions is ambiguous. That diagnostic output is genuinely useful.
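
In code, the diagnostic step is just one more prompt to the tainted session, and it's the last one worth sending. A rough sketch, reusing the `call_model` stand-in from above:

```python
DIAGNOSTIC_PROMPT = (
    "Stop working on the task. Do not try again. "
    "Tell me what about my instructions is ambiguous, conflicting, "
    "or hard to follow. Be specific."
)

def diagnose(history: list[dict]) -> str:
    """Extract a post-mortem from the failed session before abandoning it."""
    history.append({"role": "user", "content": DIAGNOSTIC_PROMPT})
    return call_model(history)  # the session's one remaining useful output
```

The returned text is what you carry forward; the `history` list itself gets thrown away.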

Take the feedback and start a new session. This is the critical step. Don't say "great, now try again." Take what you learned from the diagnostic conversation and open a completely fresh chat. New session. Clean context window.

In that new session, incorporate the lesson. If the agent was confused about step four because your instructions were vague, write clearer instructions. If it was tripping over a conflicting constraint, remove the conflict. Now the agent attempts the full sequence with corrected instructions and zero history of failure.

The results are dramatically better.

Start Loose, Then Tighten

A related principle helps prevent these situations in the first place: don't be overly prescriptive from the start.

When setting up a new agent workflow, describe the task the way you'd describe it to a new employee. Give it your checklist. Give it links to relevant resources. Let it surprise you. The models have come far enough that they'll often figure out a reasonable approach on their own. Understanding where your team sits on the AI adoption ladder helps calibrate how much prescription is appropriate at each stage.

You'll still see the agent struggle in unexpected places. It might handle the complex parts easily and trip over something you thought was simple. Those surprises are valuable. They tell you where your instructions need refinement.

But the refinement happens in the next session, not this one.

The Feedback Loop That Actually Works

The effective cycle looks like this, with a code sketch after the list:

  1. Start a session with your current best instructions
  2. Let the agent work until it succeeds or fails
  3. If it fails, diagnose why within that session
  4. Take the diagnosis to a new session with updated instructions
  5. Repeat
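
Put together with the helpers sketched earlier, the loop looks something like this. `attempt_task`, `looks_good`, and `revise` are placeholders for your own task runner, acceptance check, and instruction editing:

```python
def attempt_task(history: list[dict]) -> str:
    """Placeholder: run the agent until it finishes or stalls."""
    return call_model(history)

def looks_good(result: str) -> bool:
    """Placeholder: your own acceptance check."""
    return "error" not in result

def revise(instructions: str, diagnosis: str) -> str:
    """Fold the post-mortem into the standing instructions."""
    return instructions + "\n\nClarification from last attempt: " + diagnosis

def run_workflow(instructions: str, max_rounds: int = 3) -> str | None:
    """Iterate across sessions, not within one."""
    for _ in range(max_rounds):
        history = [{"role": "system", "content": instructions}]  # clean context window
        result = attempt_task(history)                  # step 2: let it work
        if looks_good(result):
            return result
        diagnosis = diagnose(history)                   # step 3: in-session post-mortem
        instructions = revise(instructions, diagnosis)  # step 4: update instructions
        # `history` is discarded here; the next round never sees the failure.
    return None
```

The instructions improve every round, while each round's context window starts empty.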

This is fundamentally different from the cycle most people fall into: start a session, hit a problem, correct in place, hit another problem, correct again, get frustrated, abandon the whole approach. There are exceptions: sometimes a single correction lands, the answer is right, and the chat is over. But for any long, complex workflow, it's better to start fresh.

The first pattern produces steadily improving results across sessions. The second produces degrading results within each session and no improvement across sessions, because the lessons never get incorporated into a clean starting point. This iterative refinement across sessions maps to the same principle behind risk evaluation in AI-aided development: calibrate your process to what the tools actually do well, not what you wish they did.

Make the Reset Easy

One practical tip: make it easy to start fresh. If starting a new session requires re-entering a bunch of context, you'll resist doing it. Set up your projects, templates, or system prompts so that spinning up a new conversation with full context takes seconds, not minutes.

The easier the reset, the more willing you'll be to do it at the first sign of trouble instead of burning twenty minutes trying to salvage a session that's already contaminated. I've written about building this kind of reusable context into your AI workflow before: the goal is making each session self-contained and disposable.
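
One way to do that, sticking with the same sketch: keep the standing context in a template so a clean session is a single call. The checklist content here is purely illustrative:

```python
BASE_CONTEXT = """You are our release-notes agent.
Checklist: 1) gather merged changes, 2) group by area, 3) draft summaries.
Output: a markdown table, one row per change.
"""

def new_session(lesson_learned: str = "") -> list[dict]:
    """Spin up a fresh context window with the reusable instructions baked in."""
    return [{"role": "system", "content": BASE_CONTEXT + lesson_learned}]

# After a failed session, resetting costs one line:
history = new_session("Dates must be ISO 8601, e.g. 2024-01-31.")
```

Because the template carries the context, abandoning a contaminated session costs you nothing but the reset itself.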

Clean Slate, Better Work

The models are capable and getting better fast. But they have a fundamental architectural property that isn't changing anytime soon: the context window is their entire working reality. Keep it clean, and they'll do good work. Let it fill up with failure and corrections, and no amount of additional correction gets you back on track.

Every significant correction is a signal to start over. Diagnose what went wrong, update your instructions, and begin a clean session. The overhead of starting fresh is trivially small compared to the time wasted wrestling with a contaminated context.

Start fresh. It's almost always the right call.

