Claude Code as an Operational Partner for DevOps

People are building incredible things with AI coding tools. Custom applications, full-stack prototypes, automation pipelines. But there's a quieter, equally powerful use case: using Claude Code as an operational partner.

I've written before about using Claude Code for personalized software, and I still think that's a great use case. But some of the highest-value work I do with it never results in committed code at all.

DevOps Work Is Half Investigation

Here's a reality about DevOps that people outside the discipline rarely appreciate: most of the work never results in checked-in code. Research from Garden.io found that engineers spend over 15 hours per week on non-coding tasks such as maintaining tooling, debugging pipelines, and setting up environments, compared to just 4.5 hours writing application code. Kubernetes operations, Argo deployments, metric crawling, cloud bill investigation, diagnostic scripts, environment analysis: all of these are small tasks that need to get done, discussed, and iterated on.

Some of this work does result in code changes. Helm chart modifications, Argo configurations, infrastructure-as-code updates. That's great because it lives in a repo. But much of DevOps work is investigative. The work requires analysis, scripts, and data, but the end result is a decision or an insight, not a pull request.

Or consider another pattern: analysis that, combined with knowledge of the codebase, leads to very targeted code changes. The investigation itself is the heavy lift. The resulting code change might be a single line.

This is exactly where Claude Code excels as a partner.

Read-Only Access Changes Everything

The biggest unlock for operational AI partnership is giving Claude Code read-only access to diagnostic information. Whether you're using AWS CloudWatch, GCP monitoring, Prometheus, or similar tools, connecting Claude to those data sources lets it investigate independently. This is also why monitoring platform design matters: tools built around the operator's actual workflow are far easier to wire into this kind of partnership.

The goal is building larger closed feedback loops. Every additional read-only data source you connect expands the scope of what Claude can investigate without you manually copying and pasting results.

When Claude can query metrics directly, it can form a hypothesis, check the data, refine the hypothesis, and check again. That loop, which would otherwise require you to run each query and relay the results, happens automatically.
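The shape of that loop can be sketched in a few lines. Here the metric names, thresholds, and data source are all illustrative assumptions; in practice the query callable would hit a read-only endpoint like the Prometheus HTTP API rather than a canned dictionary.

```python
def investigate(query):
    """One pass of the hypothesis -> check -> refine loop.

    `query` is any read-only callable returning a metric value.
    Metric names and thresholds below are invented for illustration.
    """
    findings = []
    p99 = query("p99_latency_seconds")      # hypothesis: latency regressed
    if p99 > 1.0:
        findings.append(f"p99 latency elevated: {p99:.2f}s")
        errs = query("error_rate")          # refine: errors, or load?
        if errs > 0.01:
            findings.append(f"error rate elevated: {errs:.1%}")
        else:
            findings.append("errors normal; suspect load or saturation")
    return findings

# Stubbed data source standing in for Prometheus / CloudWatch.
sample = {"p99_latency_seconds": 2.4, "error_rate": 0.002}
print(investigate(sample.get))
# -> ['p99 latency elevated: 2.40s', 'errors normal; suspect load or saturation']
```

The point is not the specific checks but that each answer feeds the next question, with no human relaying results between steps.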

Before providing production diagnostic data, consider what PII and what types of access you are granting Claude. Doing this safely (read-only access) and securely (understanding exactly what ends up in those logs and metrics) is critical.

The Partner Model, Not the Autopilot Model

The boundary is the important part of how I use this. I don't give Claude direct access to Kubernetes. Instead, we work together in one of two patterns.

The first pattern is script generation. Claude writes analysis scripts, I review them, I execute them, and I give Claude the results. This keeps me in the loop on what's running in my environment while letting Claude handle the tedious parts of writing and iterating on those scripts.
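A typical script in this pattern is small and auditable. The sketch below is the kind of analysis script Claude might generate for "which pods are crash-looping?": it parses `kubectl get pods -A -o json` output and flags high restart counts. The sample data and names are invented stand-ins for real cluster output.

```python
import json

# Invented sample standing in for `kubectl get pods -A -o json` output.
sample = json.dumps({
    "items": [
        {"metadata": {"namespace": "payments", "name": "api-7f9c"},
         "status": {"containerStatuses": [{"restartCount": 14}]}},
        {"metadata": {"namespace": "web", "name": "front-2b1a"},
         "status": {"containerStatuses": [{"restartCount": 0}]}},
    ]
})

def flag_restarts(pods_json: str, threshold: int = 5):
    """Return namespace/name for pods with any container above threshold."""
    pods = json.loads(pods_json)["items"]
    return [
        f'{p["metadata"]["namespace"]}/{p["metadata"]["name"]}'
        for p in pods
        if any(c.get("restartCount", 0) > threshold
               for c in p["status"].get("containerStatuses", []))
    ]

print(flag_restarts(sample))  # -> ['payments/api-7f9c']
```

Because the script only reads JSON I've already fetched, reviewing it is quick, and iterating on the threshold or grouping is a one-line change.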

The second pattern is command generation. Claude generates kubectl commands or diagnostic queries, I run them directly, and I provide the results back. This is particularly useful for complex commands with multiple flags or piped operations that I'd otherwise spend time looking up or composing manually.

Both patterns maintain human oversight while dramatically accelerating the pace of investigation. Claude handles the generation and analysis. I handle the execution and judgment calls.

Enforcing read-only access for ad-hoc commands is difficult, since a generated command can do anything your credentials allow. That's why I usually opt for script generation, where everything can be reviewed in full before it runs.

Practical Applications

Production metric analysis: Checking metrics before and after a performance improvement used to mean pulling up dashboards, comparing time ranges, and eyeballing trends. With Claude as a partner, I can describe what I'm looking for and have it build queries that systematically compare the relevant metrics. The key is focusing on the right signals rather than getting lost in dashboard proliferation.
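The systematic comparison can be as simple as a before/after delta over the same window. The latency figures below are made up for illustration.

```python
from statistics import mean

# Invented p95 latency samples (ms) from equal windows before and
# after a deploy; in practice these would come from a metrics query.
before = [212, 205, 231, 198, 224]
after  = [148, 151, 139, 160, 143]

def pct_change(before, after):
    """Percent change in the mean between two sample windows."""
    b, a = mean(before), mean(after)
    return (a - b) / b * 100

print(f"p95 latency changed {pct_change(before, after):+.1f}%")
# -> p95 latency changed -30.7%
```

The value of the partnership is that Claude builds and refines comparisons like this for every metric you care about, instead of you eyeballing two dashboard time ranges.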

Cloud bill investigation: This remains one of the highest-value applications. According to Flexera's 2025 State of the Cloud Report, 84% of organizations say managing cloud spend is their top cloud challenge, with budgets already exceeding limits by 17%. Cloud cost analysis requires correlating spend data with architectural context. Claude can process cost reports, cross-reference them with what it knows about the infrastructure, and surface anomalies that would take hours to find manually.
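A first-pass anomaly scan over such a cost report might look like this. The services, figures, and 50% threshold are all invented for the sketch.

```python
# Invented week-over-week spend by service (USD).
last_week = {"ec2": 4200.0, "s3": 310.0, "nat-gateway": 95.0}
this_week = {"ec2": 4350.0, "s3": 302.0, "nat-gateway": 410.0}

def cost_anomalies(prev, curr, jump=0.5):
    """Flag services whose spend grew by more than `jump` (fractional)."""
    return {
        svc: (prev[svc], cost)
        for svc, cost in curr.items()
        if svc in prev and prev[svc] > 0
        and (cost - prev[svc]) / prev[svc] > jump
    }

print(cost_anomalies(last_week, this_week))
# -> {'nat-gateway': (95.0, 410.0)}
```

The interesting work starts after the flag: correlating that NAT gateway jump with an architectural change is where Claude's knowledge of the infrastructure pays off.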

Kubernetes diagnostics: A Rafay Systems survey of over 2,000 professionals found that 93% of enterprise platform teams face persistent challenges managing Kubernetes complexity and costs. Whether it's debugging pod scheduling issues, analyzing resource utilization patterns, or investigating network policies, Claude can help generate the right commands and interpret the output. The back-and-forth of "here's what I see, what should I check next" becomes a structured diagnostic conversation. Well-designed alerts reduce the ad-hoc investigation needed, though even imperfect alerting beats reactive hunting.
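As one concrete example of that conversation, a script Claude might generate early in a scheduling investigation summarizes why pods are Pending. The pod names and messages below are invented stand-ins for `kubectl get pods -o json` output.

```python
import json

# Invented sample standing in for `kubectl get pods -o json` output.
sample = json.loads('''{"items": [
  {"metadata": {"name": "worker-0"},
   "status": {"phase": "Pending",
              "conditions": [{"type": "PodScheduled", "status": "False",
                              "reason": "Unschedulable",
                              "message": "0/3 nodes: insufficient memory"}]}},
  {"metadata": {"name": "api-1"},
   "status": {"phase": "Running", "conditions": []}}
]}''')

def pending_reasons(pods):
    """Map each Pending pod to the scheduler's stated reason."""
    out = {}
    for p in pods["items"]:
        if p["status"]["phase"] != "Pending":
            continue
        for c in p["status"].get("conditions", []):
            if c["type"] == "PodScheduled" and c["status"] == "False":
                out[p["metadata"]["name"]] = c.get("message", c.get("reason"))
    return out

print(pending_reasons(sample))
# -> {'worker-0': '0/3 nodes: insufficient memory'}
```

The output becomes the next prompt: "insufficient memory on all nodes" naturally leads to checking resource requests, node allocatable capacity, or autoscaler behavior.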

Each of these tasks involves real investigation. There's no single command that gives you the answer. It's iterative, context-dependent work that benefits enormously from having a partner who can hold the full picture in memory while you work through it together.

The Broader Point

What I find interesting is how many of the highest-value AI coding applications aren't about writing application code at all.

Operational partnership is yet another non-code way to use AI effectively. Not building features. Not writing production code. Just doing operational work more effectively with a partner that can hold context, generate scripts, analyze data, and iterate alongside you.

If you're only using AI coding tools to write code, you're missing the larger opportunity. Google's 2024 DORA report found that AI adoption in code generation actually correlated with a 7.2% decrease in delivery stability, largely because AI makes it easy to produce larger, riskier changesets. The operational partnership model sidesteps this problem entirely. When AI handles investigation and generation while you handle execution and judgment, the value comes from better analysis and faster iteration, not from shipping more code.

Getting Started

If you want to try this approach, here's where to begin:

  1. Name your threads. Start naming Claude Code threads by ticket number or operational task. The ability to resume context is the foundation of the partnership model.
  2. Connect read-only data sources. Give Claude access to your monitoring and observability tools. Even one additional data source creates a feedback loop that multiplies its usefulness.
  3. Start with investigation tasks. Pick an operational task that requires analysis rather than code changes. Cloud cost review, metric comparison, or diagnostic investigation are all good starting points.
  4. Keep execution in your hands. Review scripts before running them. Run commands yourself and provide results. The value is in the generation and analysis, not in autonomous execution.

The more closed feedback loops you create, the more effective the partnership becomes. Start small, expand the loops, and you'll quickly find that your operational work moves faster with a partner that never loses context.

