The Three Pillars of Scalable Data Processing
Every unit of work in a data processing system should aspire to be small, independently processable, and consistently sized. When these three properties hold, scaling becomes almost trivial. Reality rarely cooperates, which is why understanding these properties matters so much for platform engineering.
The Ideal State: Independent, Small, and Consistent
These three properties represent the pinnacle of what we do as platform engineers. Every message flowing through a system, every unit of work we design, should aspire to be:
Small: Quick to process, minimal memory footprint, and fast to retry if something goes wrong. Yes, smaller units mean more per-message overhead, but the predictability you gain typically outweighs that cost.
Independently processable: No coordination required with other units of work. I can process this message without knowing anything about any other message in the system.
Consistently sized: No surprises. Every unit of work takes roughly the same amount of time and resources, with no landmines lurking in the queue.
When these three properties hold, scaling becomes almost simple. See lag growing on a Kafka topic? Double the consumer count. Since everything is independently consumable, small, and consistently sized, you've just doubled your throughput. The math works because the work itself cooperates with horizontal scaling. LinkedIn's Kafka benchmarks demonstrate this principle: properly partitioned message streams achieved 2.6 million records per second with three consumers (nearly 3x the throughput of a single consumer), showing near-linear scaling. The catch is work that's indirectly coupled, such as when every consumer shares a cache or database.
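To make that concrete, here is a minimal sketch of what such an independent consumer can look like, assuming the confluent_kafka Python client, a local broker, and a hypothetical topic named `events`. Because each message is small and self-contained, "double the consumer count" is literally just launching another copy of this process in the same consumer group; the group rebalancing spreads partitions across however many copies are running.

```python
# A minimal sketch of an independent, stateless consumer, assuming the
# confluent_kafka client, a local broker, and a hypothetical "events" topic.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "event-processors",      # every copy joins the same group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])

def handle(payload: bytes) -> None:
    # Small, independent unit of work: no lookups into other messages,
    # no shared mutable state, so any instance can take any message.
    ...

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        handle(msg.value())
finally:
    consumer.close()
```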
Where Things Break: Serial Dependencies
Imagine you're processing time-series data where each minute's data depends on the previous minute's state. You can't process this minute until the last minute completes. That's a serial dependency.
Now shrink the window. Instead of minute-level data, you're receiving data every second. Your processing time must now complete in under one second, every single time, or you'll never catch up. You're permanently underwater.
The natural instinct when falling behind is to double the cluster size to increase throughput. But with serial dependencies, this accomplishes almost nothing. You still have to process everything in sequence. With two processors, one works while the other waits its turn. You've doubled your infrastructure costs while your actual utilization drops by half. This isn't just intuition; it's Amdahl's Law. Even with 90% of your code parallelizable, the remaining 10% serial portion caps your maximum speedup at 10x, regardless of how many processors you add.
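A quick back-of-the-envelope makes that ceiling concrete; the snippet below just plugs a 90% parallel fraction into Amdahl's formula.

```python
# Amdahl's Law: speedup(N) = 1 / ((1 - p) + p / N), where p is the parallel
# fraction of the work and N is the number of processors.
def speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 16, 1024):
    print(f"{n:>4} processors -> {speedup(0.9, n):.2f}x")
# 2 -> 1.82x, 4 -> 3.08x, 16 -> 6.40x, 1024 -> 9.91x: the 10% serial slice
# caps you just under 10x, no matter how much hardware you add.
```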
This gap between what we want and what we can actually achieve is the core challenge of platform engineering. If you're interested in bridging these operational realities, Working in the Mud explores how teams navigate the space between ideal technical states and pragmatic compromises.
This is the fundamental problem with serial processing: horizontal scaling (the primary tool for handling load) stops working. Adding resources doesn't translate to proportional throughput because the work itself refuses to parallelize.
The Landmine Problem: Inconsistent Sizing
Serial dependencies are bad enough. Combine them with inconsistently sized work, and you've created a system that will eventually choke.
Most messages might process in milliseconds. But scattered throughout the queue are landmines: messages that take 10-100x longer. Maybe they have more complex data structures, more nested relationships, or trigger more expensive downstream operations.
When these landmines hit a serial processing system, throughput goes off a cliff. You're not just briefly delayed; you're blocked for what feels like forever in distributed systems terms. And if you're processing serially, you can't route around the problem. Everything stacks up behind the slow one. Real systems demonstrate this dramatically: one analysis found web requests with average latency under 50ms experienced p99 spikes close to 1 second (a 20x variance between typical and worst-case performance).
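Here is a toy model of that pile-up, with made-up numbers: a serial consumer running exactly at its arrival rate, hit by a single 100x message.

```python
# Toy model, made-up numbers: messages arrive every 5 ms and normally take
# 5 ms to process, so a single serial consumer keeps up exactly -- until one
# 500 ms landmine lands in the stream.
arrival_interval_ms = 5.0
durations_ms = [5.0] * 1000
durations_ms[200] = 500.0                    # the landmine: 100x the typical message

clock = 0.0                                  # when the consumer is next free
lags = []
for i, duration in enumerate(durations_ms):
    arrived = i * arrival_interval_ms
    clock = max(clock, arrived) + duration   # serial: one message at a time
    lags.append(clock - arrived)             # how far behind arrival it finished

print(f"max lag:   {max(lags):.0f} ms")      # ~500 ms right after the landmine
print(f"final lag: {lags[-1]:.0f} ms")       # still ~500 ms: no slack, so no recovery
```

Because the consumer has no spare capacity, the lag introduced by the landmine never drains; every message behind it inherits the delay.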
This is why consistent sizing matters so much. If I know every message takes roughly the same resources, I can reason about capacity planning. I can predict when I'll fall behind and by how much. Inconsistent sizing removes that predictability entirely.
Smart Partitioning: Finding Independence Dimensions
The solution lies in identifying dimensions along which work is naturally independent. This requires understanding your business domain, not just your technical architecture.
For many SaaS systems, customers represent the obvious partition boundary. Are you doing any cross-customer analytics? Any cross-customer data aggregation? If not, you can process each customer's data independently. This is enormously valuable.
Within a single customer, the question becomes more interesting. Can you independently process across devices? Across sensors? Across feature types? Each dimension where you can maintain independence is a dimension where you can scale horizontally. Whether you use batch or real-time processing for each dimension shapes your architecture significantly; Batch and Real-Time Platforms Have Different Jobs explores how these different workload types have competing requirements for independence.
When you identify a valid independence boundary, you can structure your queues and consumers around it. Ten customers means ten independent processing streams. Falling behind on one customer? Scale that stream while other customers remain unaffected.
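One common way to encode that boundary, sketched here with the confluent_kafka client, a local broker, and a hypothetical `customer-events` topic, is to key every message by customer ID so each customer's data lands on one partition. Ordering is preserved within a customer, while different customers' streams scale independently.

```python
# A minimal sketch of customer-keyed partitioning, assuming confluent_kafka
# and a hypothetical "customer-events" topic.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def publish(customer_id: str, event: dict) -> None:
    producer.produce(
        "customer-events",
        key=customer_id.encode(),             # the independence boundary
        value=json.dumps(event).encode(),
    )

publish("cust-42", {"sensor": "temp-1", "reading": 21.5})
publish("cust-99", {"sensor": "temp-7", "reading": 19.0})
producer.flush()
```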
But claiming independence requires honesty about your actual dependencies. If processing one customer's data ever requires knowledge of another customer's state, you've lost that independence guarantee. Better to design explicitly for the dependencies you have than to discover them during an incident.
The Resource Utilization Trap
Here's where this gets expensive. Say you have data that requires serial processing within a customer, but you're processing all customers through a single queue. If your queue structure doesn't partition by customer, adding consumers creates contention. Two consumers might try to process the same customer's data, with one winning while the other blocks. Or you implement locking, and now you're coordinating across consumers instead of simply processing work.
The result: doubled infrastructure costs with far less than doubled throughput. This is the real cost of violating independence assumptions. Scaling doesn't work the way you expect, and you pay full price for resources you can't actually use.
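A toy model shows how little the second consumer buys you; the 90/10 traffic split and the 10 ms of simulated work are invented numbers, but the shape of the result is the point.

```python
# Toy model of the trap: two consumers pulling from one queue that isn't
# partitioned by customer, with a per-customer lock for correctness.
import queue
import threading
import time

work = queue.Queue()
for i in range(100):
    work.put("cust-A" if i % 10 else "cust-B")   # ~90% of traffic is one customer

locks = {"cust-A": threading.Lock(), "cust-B": threading.Lock()}

def consumer() -> None:
    while True:
        try:
            customer = work.get_nowait()
        except queue.Empty:
            return
        with locks[customer]:                    # the hot customer serializes here
            time.sleep(0.01)                     # simulated I/O-bound processing

start = time.time()
threads = [threading.Thread(target=consumer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two consumers: {time.time() - start:.2f}s")
# ~0.9s versus ~1.0s for a single consumer: double the workers, ~10% more throughput.
```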
Designing for the Ideal
So how do you build systems that maintain small, independent, consistently sized work?
- Question serial dependencies. Every time you find yourself requiring sequential processing, ask whether the business logic truly demands it. Often, serial processing is an accident of implementation rather than a genuine requirement.
- Design partition boundaries early. Don't retrofit independence. Build your data model and queue structure around the dimensions where independence naturally exists. This is architectural work, and your early decisions compound across years. Software Architecture Is a Building provides a framework for thinking about these structural decisions.
- Validate independence claims. Before committing to a partitioning strategy, trace through your actual data flows. Where do cross-partition queries happen? Where might they happen in the future?
- Monitor for landmines. Track processing time distributions, not just averages. When your p99 is 100x your p50, that tells you something important about your sizing consistency (see the sketch after this list).
- Separate fast paths from slow paths. If some work is fundamentally more expensive, route it differently. Don't let expensive operations poison queues full of cheap ones. The Async Decoupling Pattern shows how isolating expensive batch work from real-time systems prevents infrastructure strain.
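For the "monitor for landmines" point, the sketch below computes a p99/p50 ratio from a batch of hypothetical processing durations; in a real system the samples would come from your metrics pipeline.

```python
import random
from statistics import mean, quantiles

# 98% well-behaved messages (~5 ms) plus 2% landmines (400-600 ms); all hypothetical.
durations_ms = [random.gauss(5, 1) for _ in range(980)] + \
               [random.uniform(400, 600) for _ in range(20)]

cuts = quantiles(durations_ms, n=100)   # 99 cut points: cuts[49]=p50, cuts[98]=p99
p50, p99 = cuts[49], cuts[98]
print(f"mean={mean(durations_ms):.1f}ms  p50={p50:.1f}ms  "
      f"p99={p99:.1f}ms  p99/p50={p99 / p50:.0f}x")
# The mean (~15 ms) looks tame; the ~100x p99/p50 ratio is what exposes the landmines.
```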
The Payoff
When you achieve the three pillars, capacity planning becomes predictable. Scaling becomes proportional. Incidents become recoverable. You can look at growing lag and know exactly what lever to pull.
That's the promise of small, independent, consistently sized work. Systems that honor these properties let you scale without surprises, and in platform engineering, predictability is worth everything.

