Batch and Real-Time Platforms Have Different Jobs
When designing data platforms, I frequently encounter teams trying to build one unified system that handles both real-time streaming and batch analytics. The instinct makes sense: both workloads operate on the same underlying data, so why not share infrastructure?
Getting this architecture right has real consequences for cost, scalability, and operational complexity.
The challenge is that these workloads have fundamentally different characteristics. Supporting both well on a single platform is expensive and complex. In most cases, you get better results by separating them early and letting each system lean into its strengths.
Independence as an Architectural Lens
When I evaluate a data processing workload, I ask one question: how independent is this work? Independence is the key to horizontal scalability. The more independent the units of work, the more easily the system scales.
Several dimensions of independence matter:
- Payload independence: Can I process payload A without caring about payload B?
- Time independence: Does it matter when I process something, as long as I eventually process it?
- Entity independence: Can I process data for entity X without knowing anything about entity Y?
- Order independence: Can I process items out of order without breaking correctness?
Workloads that score high on independence scale well horizontally. Workloads with dependencies require coordination, which limits throughput and adds complexity.
Real-Time Workloads Favor Independence
Consider a streaming consumer that receives payloads and writes them to a data warehouse like BigQuery. The job is simple: receive payload, store with correct timestamps, acknowledge, move on.
This workload is highly independent across almost every dimension. Each payload is self-contained. A payload for one IP address has nothing to do with a payload for another. Processing can happen on separate workers without coordination. Even if payloads arrive out of order, it often does not matter because you write them with their original timestamps and let the warehouse handle the rest.
This independence is what makes real-time platforms scalable. You can parallelize across workers, partitions, and regions because each unit of work is isolated.
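A minimal sketch of such a consumer, using an in-memory sink and hypothetical payload fields (`ip`, `observed_at`, `data`) rather than a real warehouse client: each payload is handled in isolation, and the original event timestamp is stored regardless of arrival order.

```python
def handle_payload(payload: dict, sink: list) -> None:
    """Process one payload independently: store it with its original
    event timestamp, then move on. No coordination with other payloads."""
    sink.append({
        "ip": payload["ip"],                    # hypothetical field names
        "observed_at": payload["observed_at"],  # original event time, not arrival time
        "data": payload["data"],
    })

# Payloads can arrive in any order, on any worker, and the result is the same.
sink: list = []
payloads = [
    {"ip": "10.0.0.2", "observed_at": "2024-05-01T10:10:00Z", "data": "b"},
    {"ip": "10.0.0.1", "observed_at": "2024-05-01T10:00:00Z", "data": "a"},
]
for p in payloads:
    handle_payload(p, sink)
```

Because `handle_payload` touches nothing outside its own arguments, the loop could just as easily be split across many workers or partitions.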
Where Independence Breaks Down
Not every component enjoys this luxury. Consider maintaining state in a relational database.
Say you track assets with a first_seen timestamp for each IP address. If a payload arrives for an IP at 10:00, and another arrives at 10:10, first_seen should be 10:00. Straightforward.
But what if the payloads arrive out of order? If the 10:10 payload arrives first, you write 10:10 as first_seen. When the 10:00 payload arrives, you have a correctness problem.
You can solve this in the application layer by always flooring to the earlier timestamp. But now you are trading compute for independence. Every write becomes a conditional update instead of a simple insert.
This is the core trade-off: accept dependencies and build for ordered processing, or build for independence and push complexity into the application layer. Neither is wrong, but you have to choose consciously.
Batch Workloads Accept Dependencies
Batch workloads, including ML pipelines, accept dependencies that real-time systems avoid.
An ML model analyzing daily patterns needs all the payloads from that day. It cannot operate on a single payload in isolation. Run the model before all the data has landed, and your results are incomplete. Many models analyze patterns across entities, comparing one IP to others. Time series analysis requires data in order.
Every independence dimension that makes real-time processing scalable becomes a dependency dimension for batch processing.
The saving grace is latency tolerance. Real-time platforms race against incoming data volume. If payloads pile up faster than you can process them, you fall behind.
Batch workloads can afford to be slow. Nobody cares if an anomaly detection model takes 30 minutes to run. This latency tolerance is how batch workloads function despite their dependencies. They can afford the coordination costs because they are not racing the clock.
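One concrete way latency tolerance pays for coordination is a completeness gate: the batch job simply refuses to run until every expected partition has landed. A sketch, assuming hourly partitions and hypothetical helper names:

```python
def day_is_complete(expected_hours: set, landed_hours: set) -> bool:
    """Only run the batch job once every hourly partition for the day
    has landed. A streaming consumer could never afford to wait like this;
    a batch job just retries later."""
    return expected_hours <= landed_hours  # subset check: all expected landed?

expected = {f"{h:02d}" for h in range(24)}
landed = expected - {"23"}  # the last partition has not landed yet
# day_is_complete(expected, landed) is False, so the job waits and retries.
```

The gate itself is trivial; the point is that a batch scheduler can poll it every few minutes without anyone noticing the delay.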
Leaning Into Strengths
When you have two workloads with opposite characteristics, building one platform to handle both gets expensive. You end up with a system that cannot fully leverage the strengths of either approach.
Your real-time processing cannot scale as horizontally because it keeps accommodating batch dependencies. Your batch workloads cannot get the complete views they need because the infrastructure is optimized for independent parallel processing.
The alternative is two platforms connected by a thin interface.
Split your real-time platform from your batch platform as early as possible in the data flow. Let the real-time platform do what it does well: massively parallel, independent processing of incoming data. Let the batch platform do what it does well: analysis of complete datasets with cross-entity correlation.
The interface between them should be minimal. The real-time platform lands data in a store. The batch platform reads from that store when ready. They communicate through data, not shared processing infrastructure.
Assessing Your Own Workloads
When designing or inheriting a data platform, score each workload against the independence dimensions above. Workloads with similar profiles can share infrastructure. Workloads with opposite profiles should be separated.
The mistake I see teams make is assuming that because two workloads operate on the same data, they should share the same platform. But shared data does not mean shared infrastructure. The workload characteristics matter more than the data source.
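As a rough rubric, the four dimensions can be turned into a simple score. This is a hypothetical scoring sketch, not a formal methodology; the example profiles reflect the streaming-ingest and daily-ML workloads discussed above:

```python
from dataclasses import dataclass

@dataclass
class IndependenceProfile:
    """Score a workload on the four independence dimensions."""
    payload: bool  # can each payload be processed in isolation?
    time: bool     # can processing be deferred without breaking anything?
    entity: bool   # can entities be processed without cross-references?
    order: bool    # is out-of-order processing safe?

    def score(self) -> int:
        return sum([self.payload, self.time, self.entity, self.order])

streaming_ingest = IndependenceProfile(payload=True, time=True, entity=True, order=True)
# The daily model tolerates latency (time=True) but depends on complete,
# ordered, cross-entity data for everything else.
daily_ml_model = IndependenceProfile(payload=False, time=True, entity=False, order=False)
# Opposite profiles: a strong signal to separate the platforms.
```

Workloads with similar scores and similar failing dimensions are candidates for shared infrastructure; near-opposite profiles, like the two above, are the signal to split.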
The Trade-Off
Building one platform that handles both batch and real-time well is possible. Frameworks like Apache Flink and Spark Structured Streaming have made unified architectures more viable, and some organizations successfully run both workloads on shared infrastructure. It is also expensive and complex. According to Matillion's 2024 Data Readiness Survey, 89% of organizations report issues with their current platform's ability to scale pipelines to meet processing needs, and 70% rate pipeline management as complex.
The complexity tax shows up in how teams spend their time. When your platform tries to serve both workload types, the maintenance burden compounds as you juggle competing optimization requirements.
Separating them early costs you some duplication. You maintain two systems instead of one. But each system can lean into what it does well, and you avoid the ongoing complexity of trying to serve opposite requirements.
For most teams, the separation is worth it.