Engineering Oct 18, 2025 5 min read

Confluent + Databricks: Real-Time Data Platform Patterns

Streaming-first architectures sound clean on a whiteboard. The real-world trade-offs show up when Kafka and the lakehouse have to coexist.

"Real-time" gets used loosely. Before architecting anything, we push every team to define what real-time actually means for their use case. Sub-second? Sub-minute? Within five minutes? The answer changes everything downstream.

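One way to make that definition concrete is to write the latency budget down as a number and bucket it. The sketch below is illustrative only: the tier labels and the `freshness_tier` function are hypothetical, and the cut-offs simply mirror the three questions above, with inclusive boundaries for simplicity.

```python
from datetime import timedelta

# Hypothetical tiers; the bounds mirror the three questions in the text.
TIERS = [
    (timedelta(seconds=1), "sub-second"),
    (timedelta(minutes=1), "sub-minute"),
    (timedelta(minutes=5), "within-five-minutes"),
]


def freshness_tier(max_staleness: timedelta) -> str:
    """Return the first tier whose upper bound is at least the stated budget."""
    if max_staleness <= timedelta(0):
        raise ValueError("latency budget must be positive")
    for bound, name in TIERS:
        if max_staleness <= bound:
            return name
    # The text names no tier beyond five minutes, so this label is an assumption.
    return "looser-than-five-minutes"
```

Asking a team to commit to a single `timedelta` tends to surface disagreements early, before any pipeline is built around the wrong number.
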
When Confluent fits

Confluent — and Kafka more broadly — is the right choice when you have multiple producers, multiple consumers, and need durable replay. If only one team produces and only one team consumes, you don't need Kafka. A queue or direct-write pattern is simpler.
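
The rule of thumb above can be written down as a small check. This is a sketch, not a sizing tool: `suggest_transport` and its return labels are hypothetical, and the text does not rule on mixed cases (for example many producers and a single consumer), so the sketch does not either.

```python
def suggest_transport(producers: int, consumers: int, needs_replay: bool) -> str:
    """Encode the rule of thumb from the text; the return labels are hypothetical."""
    if producers < 1 or consumers < 1:
        raise ValueError("need at least one producer and one consumer")
    if producers > 1 and consumers > 1 and needs_replay:
        return "kafka"
    if producers == 1 and consumers == 1:
        # One team on each side: a queue or direct write is simpler.
        return "queue-or-direct-write"
    # Anything in between is left to the team's judgment.
    return "judgment-call"
```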

The lakehouse is the analytics tier

Streams flow through Kafka. They land in the lakehouse for analytics. Trying to make Kafka itself the analytical store is a mistake; trying to make the lakehouse the operational message bus is the same mistake from the other direction.

Patterns that hold up

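CDC into Kafka, then to Delta: Source-of-truth changes go through Confluent (with schema enforcement) and are then ingested into Delta tables in the lakehouse. Delta Live Tables can read the topics directly; Auto Loader fits when a sink connector first lands the events as files in cloud storage.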
Spark Structured Streaming for transformations: When transformations need to happen in flight, Structured Streaming over Kafka topics with Delta sinks is the path of least resistance.
Unity Catalog for governance: Whatever lands in the lakehouse from streaming sources still needs the same governance as batch data. Don't carve out a streaming exception.
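
As a rough illustration of the Kafka-to-Delta pattern, here is a minimal PySpark sketch. It assumes an existing SparkSession with the Kafka and Delta connectors available and string or JSON payloads; Avro messages registered in a schema registry would need decoding rather than a plain cast. The function names, topic, and table name are hypothetical, and authentication settings are left to the caller through `extra`.

```python
from typing import Dict, Optional


def kafka_source_options(
    bootstrap_servers: str,
    topic: str,
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build options for Spark's Kafka source; auth settings go in `extra`."""
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only applies on the first run; afterwards the checkpoint decides where to resume.
        "startingOffsets": "earliest",
    }
    options.update(extra or {})
    return options


def start_kafka_to_delta(spark, options, target_table, checkpoint_path):
    """Stream a Kafka topic into a Delta table and return the running query."""
    raw = spark.readStream.format("kafka").options(**options).load()

    # Kafka delivers key and value as binary; cast them and keep offsets for debugging.
    events = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "topic",
        "partition",
        "offset",
        "timestamp",
    )

    return (
        events.writeStream.format("delta")
        .outputMode("append")
        # The checkpoint lets the query resume from its last committed offsets.
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )
```

Passing a three-part name such as `main.streaming.orders_raw` (a made-up example) as `target_table` writes to a Unity Catalog table, so the streamed data falls under the same governance as batch tables.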

Trade-offs to plan for

Streaming pipelines are harder to test, harder to debug, and harder to back-fill than batch. The benefit is fresh data. Make sure the freshness is worth the operational cost — and design for graceful degradation when streaming sources hiccup. They will.
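
Graceful degradation can start with something as simple as measuring how far behind the stream is and choosing a serving mode from that. The sketch below is one possible shape, not a prescribed design: `serving_mode`, the mode labels, and the `grace_factor` default of 3 are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def serving_mode(
    last_event_at: datetime,
    now: datetime,
    freshness_slo: timedelta,
    grace_factor: int = 3,
) -> str:
    """Pick how to serve data based on how far the stream has fallen behind."""
    lag = now - last_event_at
    if lag <= freshness_slo:
        return "fresh"
    if lag <= grace_factor * freshness_slo:
        # Keep serving streaming data, but tell consumers it is late.
        return "stale-but-served"
    # Beyond the grace window, fall back to the last complete batch snapshot.
    return "fallback-to-batch"
```

The point of making the modes explicit is that dashboards and downstream jobs can react to a hiccup deliberately instead of silently showing old numbers.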
