What If We Could Rebuild Kafka from Scratch?

84 points63 comments8 hours ago
galeaspablo

Agreed. The head of line problem is worth solving for certain use cases.

But today, all streaming systems (or workarounds) with per message key acknowledgements incur O(n^2) costs in either computation, bandwidth, or storage per n messages. This applies to Pulsar for example, which is often used for this feature.

Now, now, this degenerate time/space complexity might not show up every day, but when it does, you’re toast, and you have to wait it out.

My colleagues and I have studied this problem in depth for years, and our conclusion is that a fundamental architectural change is needed to support scalable per message key acknowledgements. Furthermore, the architecture will fundamentally require a sorted index, meaning that any such a queuing / streaming system will process n messages in O (n log n).

We’ve wanted to blog about this for a while, but never found the time. I hope this comment helps out if you’re thinking of relying on per message key acknowledgments; you should expect sporadic outages / delays.

nitwit005

> When producing a record to a topic and then using that record for materializing some derived data view on some downstream data store, there’s no way for the producer to know when it will be able to "see" that downstream update. For certain use cases it would be helpful to be able to guarantee that derived data views have been updated when a produce request gets acknowledged, allowing Kafka to act as a log for a true database with strong read-your-own-writes semantics.

Just don't use Kafka.

Write to the downstream datastore directly. Then you know your data is committed and you have a database to query.

show comments
vim-guru

https://nats.io is easier to use than Kafka and already solves several of the points in this post I believe, like removing partitions, supporting key-based streams, and having flexible topic hierarchies.

show comments
Ozzie_osman

I feel like everyone's journey with Kafka ends up being pretty similar. Initially, you think "oh, an append-only log that can scale, brilliant and simple" then you try it out and realize it is far, far, from being simple.

show comments
tyingq

He mentions Automq right in the opener. And if I follow the link, they pitch it in a way that sounds very "too good to be true".

Anyone here have some real world experience with it?

peanut-walrus

Object storage for Kafka? Wouldn't this 10x the latency and cost?

I feel like Kafka is a victim of it's own success, it's excellent for what it was designed, but since the design is simple and elegant, people have been using it for all sorts of things for which it was not designed. And well, of course it's not perfect for these use cases.

show comments
rvz

Every time another startup falls for the Java + Kafka arguments, it keeps the AWS consultants happier.

Fast forward into 2025, there are many performant, efficient and less complex alternatives to Kafka that save you money, instead of burning millions in operational costs "to scale".

Unless you are at a hundred million dollar revenue company, choosing Kafka in 2025 is doesn't make sense anymore.

vermon

Interesting, if partitioning is not a useful concept of Kafka, what are some of the better alternatives for controlling consumer concurrency?

show comments
frklem

"Faced with such a marked defensive negative attitude on the part of a biased culture, men who have knowledge of technical objects and appreciate their significance try to justify their judgment by giving to the technical object the only status that today has any stability apart from that granted to aesthetic objects, the status of something sacred. This, of course, gives rise to an intemperate technicism that is nothing other than idolatry of the machine and, through such idolatry, by way of identification, it leads to a technocratic yearning for unconditional power. The desire for power confirms the machine as a way to supremacy and makes of it the modern philtre (love-potion)." Gilbert Simondon, On the mode of existence of technical objects.

This is exactly what I interpret from these kind of articles: engineering just for the cause of engineering. I am not saying we should not investigate on how to improve our engineered artifacts, or that we should not improve them. But I see a generalized lack of reflection on why we should do it, and I think it is related to a detachment from the domains we create software for. The article suggests uses of the technology that come from so different ways of using it, that it looses coherence as a technical item.

show comments
fintler

Keep an eye out for Northguard. It's the name of LinkedIn's rewrite of Kafka that was announced at a stream processing meetup about a week ago.

supermatt

> "Do away with partitions"

> "Key-level streams (... of events)"

When you are leaning on the storage backend for physical partitioning (as per the cloud example, where they would literally partition based on keys), doesnt this effectively just boil down to renaming partitions to keys, and keys to events?

show comments
olavgg

How many of the Apache Kafka issues are adressed by switching to Apache Pulsar?

I skipped learning Kafka, and jumped right into Pulsar. It works great for our use case. No complaints. But I wonder why so few use it?

0x445442

How about logging the logs so I can shell into the server to search the messages.

elvircrn

Surprised there's no mention of Redpanda here.

show comments
mgaunard

I can't count the number of bad message queues and buses I've seen in my career.

While it would be useful to just blame Kafka for being bad technology it seems many other people get it wrong, too.

lewdwig

Ah the siren call of the ground-up rewrite. I didn’t know how deep the assumption of hard disks underpinning everything is baked into its design.

But don’t public cloud providers already all have cloud-native event sourcing? If that’s what you need, just use that instead of Kafka.

Spivak

Once you start asking to query the log by keys, multi-tenancy trees of topics, synchronous commits-ish, and schemas aren't we just in normal db territory where the kafka log becomes the query log. I think you need to go backwards and be like what is the feature a rdbms/nosql db can't do and go from there. Because the wishlist is looking like CQRS with the front queue being durable but events removed once persisted in the backing db where the clients query events from the db.

The backing db in this wishlist would be something in the vein of Aurora to achieve the storage compute split.

imcritic

Since we are dreaming - add ETL there as well!

hardwaresofton

See also: Warpstream, which was so good it got acquired by Confluent.

Feels like there is another squeeze in that idea if someone “just” took all their docs and replicated the feature set. But maybe that’s what S2 is already aiming at.

Wonder how long warpstream docs, marketing materials and useful blogs will stay up.

ghuntley

I'm going to get downvoted for this, but you can literally rebuild Kafka via AI right now in record time using the steps detailed at https://ghuntley.com/z80.

I'm currently building a full workload scheduler/orchestrator. I'm sick of Kubernetes. The world needs better -> https://x.com/GeoffreyHuntley/status/1915677858867105862

YetAnotherNick

I wish there is a global file system with node local disks, which has rule driven affinity to nodes for data. We have two extremes, one like EFS or S3 express which doesn't have any affinity to the processing system, and other what Kafka etc is doing where they have tightly integrated logic for this which makes systems more complicated.

show comments
Mistletoe

I know it’s not what the article is about but I really wish we could rebuild Franz Kafka and hear what he thought about the tech dystopia we are in.

>I cannot make you understand. I cannot make anyone understand what is happening inside me. I cannot even explain it to myself. -Franz Kafka, The Metamorphosis

show comments