RabbitMQ vs Apache Kafka: An opinionated point of view

What are they?

You can think of both RabbitMQ and Apache Kafka as queues that can handle a high throughput of data. They are commonly used to let different applications communicate by passing messages. This lets you split the state of one large application across a handful of smaller applications, each handling a subset of the original state. They usually come up when an organisation wants to break a monolith into microservices.

Main difference - persistence

The biggest difference between the two is that Apache Kafka can be used more like a database: messages that hold your application's state can be persisted indefinitely, provided the topic's retention is configured for it. In RabbitMQ, on the other hand, once you consume a message and acknowledge that it has been processed, it is simply removed from the queue. The state is lost. In Apache Kafka, you can theoretically replay the state of any application as of any given point in time.
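To make the contrast concrete, here is a minimal sketch in plain Python. It does not talk to a real broker: AckQueue and RetainedLog are hypothetical toy classes invented for this illustration, and the "state" is just a running balance. It only shows the behaviour described above: an acknowledged message disappears from the queue, while a retained log can be read again from any offset to rebuild state.

```python
from collections import deque


class AckQueue:
    """Toy stand-in for a RabbitMQ-style queue: an acknowledged message is gone."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume_and_ack(self):
        # Delivering and acknowledging in one step removes the message for good.
        return self._messages.popleft()

    def __len__(self):
        return len(self._messages)


class RetainedLog:
    """Toy stand-in for a Kafka-style topic partition: reads never delete."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        # Any reader may start from any offset, as often as it likes.
        return self._records[offset:]


def replay_balance(log, up_to_offset):
    """Rebuild state by folding over the log up to and including an offset."""
    return sum(log.read_from(0)[: up_to_offset + 1])


queue = AckQueue()
log = RetainedLog()
for amount in (10, -3, 5):
    queue.publish(amount)
    log.append(amount)

while len(queue):
    queue.consume_and_ack()

print(len(queue))              # the queue has nothing left to replay
print(replay_balance(log, 1))  # state as of the second record
print(replay_balance(log, 2))  # state as of the latest record
```

The log side can answer "what was the balance after the second event?" long after the events were first read, which is the property the queue side gives up in exchange for simplicity.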

Where RabbitMQ wins

There is a subset of use cases where RabbitMQ would be desirable.

Let's imagine a queue where each message represents a piece of work that needs to be done. There is also a pool of workers consuming those messages, one by one, so they know which job to do next. Once a worker is done with a job, it acknowledges the message so that the queue can remove it.
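The worker-pool pattern can be sketched with nothing but Python's standard library. This is an in-process analogy, not RabbitMQ client code: queue.Queue plays the broker, task_done() plays the acknowledgement, and the "job" (squaring a number) is made up for the example. One difference worth keeping in mind is that queue.Queue hands a message over for good on get(), whereas RabbitMQ keeps an unacknowledged message and can redeliver it if the worker dies.

```python
import queue
import threading

jobs = queue.Queue()
results = []
results_lock = threading.Lock()


def worker():
    while True:
        job = jobs.get()            # take one message from the queue
        if job is None:             # sentinel: no more work for this worker
            jobs.task_done()
            return
        with results_lock:
            results.append(job * job)  # the "work": square the number
        jobs.task_done()            # acknowledge: this message is finished


NUM_WORKERS = 3
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)                  # one sentinel per worker

jobs.join()                         # blocks until every message is acknowledged
for t in threads:
    t.join()

print(sorted(results))
```

Each job is handled by exactly one worker, and adding workers only means starting more threads against the same queue.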

In Apache Kafka, you generally acknowledge messages by committing offsets, and a committed offset covers every earlier message in the partition, so acknowledgement effectively happens in batches. What happens if processing fails for one message? In RabbitMQ you could just republish that message to the queue for future processing. In Apache Kafka, since you want to keep a consistent state, publishing the same message twice can be problematic. You don't want the event OrderPurchased happening twice for the same order. This is why you need extra queues to handle these scenarios. See, for example, the solution that the engineering team at Uber came up with, based on a design pattern called Dead Letter Queues.
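The following is a rough sketch of that idea, simulated in plain Python rather than written against a Kafka client. It is not Uber's implementation: process_batch, the record layout and the in-memory dead_letters list are all assumptions made for illustration. It shows a batch being processed with a single commit at the end, a failing record being parked in a dead letter store instead of blocking the batch, and a set of already seen ids keeping OrderPurchased from being applied twice.

```python
def process_batch(records, start_offset, handle, dead_letters, seen_ids):
    """Process one batch and return the offset to commit afterwards."""
    for position, record in enumerate(records):
        offset = start_offset + position
        if record["id"] in seen_ids:
            continue  # duplicate delivery: already applied, skip it
        try:
            handle(record)
            seen_ids.add(record["id"])
        except Exception as error:
            # Park the failure so the rest of the batch can still proceed.
            dead_letters.append(
                {"offset": offset, "record": record, "error": str(error)}
            )
    # One commit covers the whole batch: the next offset to read.
    return start_offset + len(records)


purchased = []


def handle_order_purchased(record):
    # Hypothetical handler: reject obviously invalid orders.
    if record["amount"] <= 0:
        raise ValueError("invalid amount")
    purchased.append(record["id"])


batch = [
    {"id": "order-1", "amount": 20},
    {"id": "order-2", "amount": 0},   # will fail and be dead-lettered
    {"id": "order-1", "amount": 20},  # redelivered duplicate
    {"id": "order-3", "amount": 5},
]
dead_letters = []
seen_ids = set()

committed = process_batch(batch, 0, handle_order_purchased, dead_letters, seen_ids)

print(committed)
print(purchased)
print([entry["offset"] for entry in dead_letters])
```

In a real deployment the dead letters would go to a separate topic and the seen ids to durable storage; both are kept in memory here only to keep the sketch self-contained.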

Horizontal scaling is hard in RabbitMQ

To be added.

Interesting questions

Is it possible for the queue to retain data after a consumer has consumed it?

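Apache Kafka: yes. Consuming a message does not remove it; every consumer tracks its own offset, can read from any offset in that queue, and does so independently of the others.

RabbitMQ: no. Once a consumer has started processing a message, it won't be served to others, and once the message is acknowledged it is removed from the queue.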

Warnings about Apache Kafka

RabbitMQ has been in the industry for much longer, and the talent and tooling available for it are greater than those for Apache Kafka.

Running and maintaining RabbitMQ is generally simpler than doing so with Apache Kafka. Unless you go for a cloud solution like Confluent, administering an Apache Kafka cluster is usually more cumbersome and involves much more in-depth DevOps and SysAdmin knowledge.

Development tooling is also scarcer for Apache Kafka, although this is changing rapidly. You should definitely try out our in-house solution.