Alert Fatigue is Better Than Radio Silence (And That's a Problem)
Having too many alerts that drive everyone insane is still better than having no alerts at all. I've complained about alert fatigue plenty of times before, but here's the uncomfortable truth: that statement is completely backwards.
Designing Monitoring Tools for the Job to Be Done
Successful monitoring platforms rest on a fundamental principle that many teams overlook: the format of a page should be determined by who you expect to be there and what job they need to accomplish.
This requires purpose-built interfaces, not configuration layers. Different users come to your monitoring platform with completely different needs, and your page design should reflect those differences from the ground up.
The On-Premises Revenue Trap - Why Enterprise SaaS Deployments Kill Engineering Velocity
Enterprise customers love asking for on-prem deployments. The contract values look irresistible: 2-5x your standard SaaS pricing, multi-year commitments, and the validation that comes with enterprise logos. But having managed hybrid and full on-prem deployments across multiple SaaS platforms, I can tell you the operational reality is a trap that strangles engineering teams.
The numbers tell a stark story: research shows that personnel costs represent 50-85% of the total cost of on-prem application ownership, with the vast majority of that time spent on monitoring, maintenance, and troubleshooting rather than innovation.
The Risk Funnel - Why Your Biggest Project Uncertainties Must Come First
Every engineering leader has lived this nightmare: two days from deadline, the team discovers the core architectural assumption doesn't work, the third-party API is missing critical functionality, or the algorithm can't handle production scale. A manageable project suddenly needs another week, a 100% schedule overrun.
This scenario highlights why successful engineering leadership requires systematic approaches across project organization and technical oversight, not just individual heroics.
This isn't bad luck. It's predictable project physics that most teams systematically ignore.
Why Public Communication Just Got Even More Important - The AI Amplification Effect
I've written before about the importance of keeping work discussions in public forums: Slack channels, JIRA tickets, shared docs, anywhere that's searchable and accessible. If it's about work, other people probably need to know about it. I've recommended that teams target 60-80% of their messages in public channels to preserve institutional knowledge and make information searchable for future team members.
With AI tools becoming ubiquitous, this practice has transformed from best practice to competitive necessity.
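To make the 60-80% target concrete, here is a minimal sketch of how a team might check its own public-channel share. The function names and the message counts are hypothetical and chosen only for illustration; real numbers would come from whatever analytics your chat tool exposes.

```python
def public_share(public_messages: int, private_messages: int) -> float:
    """Return the fraction of messages that were sent in public channels."""
    total = public_messages + private_messages
    if total == 0:
        raise ValueError("no messages to analyze")
    return public_messages / total


def in_target_band(share: float, low: float = 0.60, high: float = 0.80) -> bool:
    """Check a share against the 60-80% target band described above."""
    return low <= share <= high


# Hypothetical weekly counts, made up for illustration.
share = public_share(public_messages=1400, private_messages=600)
print(f"{share:.0%} public, in target band: {in_target_band(share)}")
```

A share below the band suggests knowledge is accumulating in DMs and private channels, where neither future teammates nor AI tools can search it.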
The Monitoring Trap - Why Build vs Buy Is the Wrong Question
Engineering leadership's most expensive monitoring decision isn't choosing the wrong tool. It's falling into the monitoring trap, which costs organizations wasted engineering time and preventable downtime. The classic "build vs buy" framing is fundamentally broken. It ignores how most teams end up trapped in an expensive middle ground that delivers neither cost efficiency nor operational effectiveness, creating cascading impacts on engineering velocity and business outcomes.
The Two Types of Engineers And How to Optimize for Both
Through managing teams across multiple clients, I've observed that engineering productivity isn't just about technical skills. It's about recognizing that different engineers thrive under different working conditions. Recent research from McKinsey's 2024 software engineering productivity study found that companies implementing tailored management approaches achieved a 20% improvement in employee experience scores, validating the importance of matching management style to individual work preferences.
Why Your Team's Productivity Drops After Every Change
You promote your best engineer to team lead. Three weeks later, productivity has tanked and people are frustrated. Sound familiar?
Here's what most engineering leaders don't realize: this productivity drop is completely normal and predictable. When you promote your best engineer, you're getting hit twice. You lose your best individual contributor while the team figures out how to work together. Understanding the four types of engineering leadership helps explain why this transition is so challenging.
The Upstream Root Cause Problem - Why Your Production Fires Start in Product Requirements
Most teams focus on faster incident response. The real solution is preventing incidents from happening in the first place.
After 10+ years of being continuously on-call across multiple SaaS platforms, I've debugged production incidents, database failures, authentication service outages, and scaling crises. Each time, the immediate focus is the same: restore service, minimize customer impact, conduct a post-mortem. Most teams follow a structured incident response process, which is absolutely necessary for operational stability.
But here's what I've learned that most incident response frameworks miss: your operational pain is usually a symptom, not the disease.
Zero Inbox for AI - Stop Hoarding Chats, Start Building Better
Most people treat AI tools like a digital hoarding situation: dozens of half-finished conversations cluttering their workspace, making it impossible to find anything useful. The solution isn't better chat organization—it's a fundamental shift in how you approach AI collaboration. I delete almost every AI chat I have, and it's made me dramatically more productive. The key is a simple two-category rule: either I'm asking a specific question (delete after getting the answer) or I'm building something using artifacts as staging areas for development (save the result, delete the chat). This isn't about digital minimalism—it's about transforming AI from a conversation partner into a development tool. When you stop having endless discussions and start building tangible outputs, your AI workspace becomes as clean and purposeful as a well-managed inbox, unlocking measurable performance gains in how you work.
Quality In, Quality Out - The Real Driver of AI Output Quality
Every engineering team is racing to implement AI tools, but most are optimizing the wrong variable. They're tweaking prompts and comparing models while ignoring the fundamental truth: your AI output quality is entirely dependent on your input quality. When you ask a default AI model a question, you're getting the average of the internet. Those 1,000 words generated from your 50-word prompt? They're coming from random web content, not your expertise. The companies winning with AI aren't using better models. They're feeding those models better inputs through curated knowledge bases, documented processes, and structured organizational wisdom. This isn't just theory. Research shows that RAG systems pulling from quality knowledge sources increase accuracy by nearly 40% compared to models operating on training data alone. For engineering leaders, this means the competitive advantage isn't the AI tool itself. It's the quality of information you feed it. Start building your knowledge systems now, because input quality isn't just a performance optimization. It's your strategic moat in an AI-commoditized world.
The Four Types of Engineering Leadership Every Growing Team Needs
As engineering teams grow beyond 10-15 people, a predictable pattern emerges: the same leadership challenges surface repeatedly, regardless of industry or technology stack. Teams that understand this pattern scale smoothly. Teams that don't find themselves repeatedly hitting the same bottlenecks despite adding more engineers. Engineering leadership breaks down into four distinct areas, each requiring different skills and focus: People Management, Project Organization, Developer Experience, and Platform Architecture. Most growing teams try to handle all four areas without dedicated focus, creating predictable problems - unreliable deadlines, frustrated developers fighting their tools, team members leaving for better opportunities, and mounting technical debt. When teams recognize these four leadership areas early and plan for them intentionally, they avoid the common trap of overloading one person with responsibilities that require completely different skill sets. The transformation is remarkable - instead of engineering leaders burning out trying to handle everything, you get focused expertise that multiplies the effectiveness of the entire team.
Two-Phase War Games - Scaling Incident Response Training Across Multiple Teams
Traditional war games fall apart when multiple teams try to learn incident response and team dynamics simultaneously. This two-phase approach separates the challenges: homogeneous sessions build incident response skills within existing teams, while heterogeneous sessions focus on cross-team coordination with a shared foundation.
Building Software When Requirements Live Everywhere (And Why That's Actually Fine)
I had this beautiful dream once: perfectly structured JIRA tickets with clear requirements and proper hierarchies. Then I realized fast-moving teams naturally create content everywhere - Slack threads, Google Docs, meeting transcripts, architecture diagrams.
This isn't broken. It's what happens when teams are productive and moving fast. The problem isn't scattered content. It's that we're drowning in our own productivity.
The solution: Feed all that valuable content into AI project knowledge and let anyone ask natural language questions. No more single points of failure. Teams stay productive, and nobody drowns in their own success.
Stop Re-Explaining Context to AI - How Projects Grow Themselves
Every AI conversation starts the same way: explaining context you've already explained dozens of times before. By the time you finish setting the stage, you've burned half your thinking time on background instead of solving the actual problem.
I discovered something counterintuitive: the best AI projects don't just store information, they grow themselves. Each conversation becomes the foundation for better conversations. Instead of starting from scratch, you're building on increasingly sophisticated context. Your AI tools should learn and evolve with your work, not force you to start over every conversation.
The Three Levels of Time Management
Time management sits alongside prioritization and communication as one of the foundational skills for being an effective team leader. Team leads (alongside everyone else in most organizations) have more work than they can handle, so what work should they do and when? The key to these decisions is understanding the three levels of time management.
How to Read a Burndown Chart
How long do you spend talking about your burndown chart during your sprint retrospectives? Burndown charts tell you so much more than whether you finished all the work in the sprint. From these charts, you can learn:
How well your team is estimating story points
How many injections occur during the sprint
How well your team is breaking down tickets
Bottlenecks in your team’s software development pipeline
We’ll walk through a few examples of burndown charts in this article and discuss what we can learn.
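As a rough illustration of how those signals can be read from the underlying data, the sketch below compares a sprint's remaining story points against a straight ideal line and flags days where remaining work went up. The data and the helper names (`ideal_line`, `injection_days`) are hypothetical, and treating any upward step as an injection is a simplifying assumption, since re-estimation can cause the same shape.

```python
def ideal_line(total_points: float, sprint_days: int) -> list[float]:
    """Straight-line burndown from total_points at day 0 down to 0 on the last day."""
    return [total_points * (sprint_days - day) / sprint_days
            for day in range(sprint_days + 1)]


def injection_days(remaining: list[float]) -> list[int]:
    """Days where remaining work increased, which suggests work was added mid-sprint."""
    return [day for day in range(1, len(remaining))
            if remaining[day] > remaining[day - 1]]


# Hypothetical remaining story points at the end of each day (day 0 = sprint start).
remaining = [40, 36, 36, 41, 33, 30, 22, 22, 15, 9, 4]
ideal = ideal_line(total_points=remaining[0], sprint_days=len(remaining) - 1)

behind = [day for day, (actual, target) in enumerate(zip(remaining, ideal))
          if actual > target]

print("injection days:", injection_days(remaining))
print("days behind the ideal line:", behind)
print("points left at sprint end:", remaining[-1])
```

Flat stretches in `remaining` (no change from one day to the next) are also worth a look, since they may point to oversized tickets or a bottleneck somewhere in the pipeline.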
What Slack Analytics Say About Your Company
Slack Analytics are a powerful tool for understanding the way your company communicates. How and where your team chats is a pillar of your overall team communication structure.
Your product will reflect the positives and negatives of how your company communicates. In this article, you’ll learn how to analyze the single most critical Slack metric and take action to improve how your company communicates.
Good Work, but Not the Right Work
Busy, busy, busy.
In any company (small companies especially), it’s easy for leaders to get busy. Rushing between planning meetings, standups, retros, and one-on-ones, then trying to squeeze in some dev work as well. The work adds up quickly and you can barely catch your breath.
So with all this work, how do you know that you’re making progress? What if you’re just running in place? Let’s take a step back and evaluate the difference between doing the right work and just good work.
Well Qualified vs Uniquely Qualified
When I was a backend team lead, I would sometimes jump in and help during sprints by writing code or diving into operations. Occasionally I would even be the best person for the job because I had domain knowledge for that service or sub-system.
So why do I always prioritize dev work dead last on my list of to-dos?

