When it comes to producing messages to an Apache Kafka cluster, many questions arise. Some of them are:
We will address all these and find the right balance in this article.
Generally speaking, a producer is an entity that writes new records to topics in an Apache Kafka cluster.
Any producer can write to any topic or partition at any time. Unlike consumers, producers don't get assigned a particular set of partitions. You could think of them as stateless.
Producers look simple at first, but they encapsulate an essential piece of logic: they are in charge of assigning messages to partitions so that messages with the same key always land in the same partition. They also ensure that messages are written to a given partition in the order in which they were sent.
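As a rough illustration of the key-to-partition idea, here is a minimal sketch. It assumes a deterministic hash of the key taken modulo the number of partitions; `partition_for` is a hypothetical helper, and CRC32 is used only as a stand-in, so the partition numbers will not match what a real Kafka client computes.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a deterministic hash (CRC32 as a stand-in)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions

# Hypothetical keys; repeated keys must always map to the same partition.
keys = [b"user-1", b"user-2", b"user-1", b"user-3", b"user-1"]
assignments = [(key, partition_for(key, 6)) for key in keys]

for key, partition in assignments:
    print(key.decode(), "->", partition)
```

Every `user-1` record ends up in the same partition, which is what makes per-key ordering possible. Note that a mapping of this kind changes if the number of partitions changes.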
A single producer is probably the optimal choice for most applications.
The Kafka producer batches written messages before sending them to the Kafka cluster. A batch is sent once it reaches batch.size bytes or once linger.ms milliseconds have elapsed, whichever happens first. Clients based on librdkafka additionally limit a batch to batch.num.messages messages.
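The "size or time, whichever happens first" rule can be sketched with a toy batcher. This is a simplified model, not the client's actual implementation: `ToyBatcher` and its fields are hypothetical names, time is passed in explicitly so the behaviour is deterministic, and a real producer keeps separate batches per partition.

```python
from dataclasses import dataclass, field

@dataclass
class ToyBatcher:
    batch_size: int   # max bytes per batch, in the spirit of batch.size
    linger_ms: int    # max wait in milliseconds, in the spirit of linger.ms
    sent: list = field(default_factory=list)      # batches already "sent"
    _pending: list = field(default_factory=list)  # records in the open batch
    _pending_bytes: int = 0
    _opened_at_ms: int = 0

    def append(self, record: bytes, now_ms: int) -> None:
        # Send the open batch first if it has lingered long enough.
        self.poll(now_ms)
        # Send the open batch if the new record would not fit into it.
        if self._pending and self._pending_bytes + len(record) > self.batch_size:
            self._flush()
        if not self._pending:
            self._opened_at_ms = now_ms
        self._pending.append(record)
        self._pending_bytes += len(record)

    def poll(self, now_ms: int) -> None:
        if self._pending and now_ms - self._opened_at_ms >= self.linger_ms:
            self._flush()

    def _flush(self) -> None:
        self.sent.append(list(self._pending))
        self._pending.clear()
        self._pending_bytes = 0

batcher = ToyBatcher(batch_size=10, linger_ms=5)
batcher.append(b"aaaa", now_ms=0)  # opens a batch at t=0
batcher.append(b"bbbb", now_ms=1)  # 8 bytes pending
batcher.append(b"cccc", now_ms=2)  # 12 bytes would not fit: size-triggered send
batcher.poll(now_ms=7)             # 5 ms after t=2: time-triggered send
```

In this run the first batch goes out because it is full and the second because its linger time has expired.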
A single producer can batch messages more efficiently. It tries to fit all the messages destined for a particular Kafka node into one network request, so that request will probably contain messages from different topics and partitions.
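To make the per-node grouping concrete, here is a small sketch under stated assumptions: the `leaders` mapping, the topic names, and `group_by_node` are all hypothetical, and the real client works from cluster metadata rather than a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical metadata: (topic, partition) -> id of the node leading it.
leaders = {
    ("orders", 0): 1,
    ("orders", 1): 2,
    ("payments", 0): 1,
}

def group_by_node(records, leaders):
    """Group (topic, partition, value) records by the node leading their partition."""
    requests = defaultdict(list)
    for topic, partition, value in records:
        requests[leaders[(topic, partition)]].append((topic, partition, value))
    return dict(requests)

records = [
    ("orders", 0, b"o-1"),
    ("payments", 0, b"p-1"),
    ("orders", 1, b"o-2"),
]
requests = group_by_node(records, leaders)
```

Here node 1 receives one request carrying records for both `orders` and `payments`, while node 2 receives only the record for partition 1 of `orders`.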
With only one producer, the cluster needs to keep just one connection open per node to that producer. This is probably the sweet spot for most applications.
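Using multiple producers increases the overhead of maintaining TCP connections, because every producer may hold a connection to every node. For example, 10 producers writing to a 6-node cluster can mean up to 60 open connections, which is something to consider.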
Multiple producers, out of order messages
Adding more producer instances can lead to out-of-order writes if messages with the same key are sent simultaneously from different instances. Instead, divide your logic into threads that share the same Kafka producer.
Kafka producers are thread-safe. If you need better performance, try experimenting with multiple threads sharing one producer. That is the recommended way of scaling up. It will help you fill those batches quicker while keeping your messages in order.
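The thread-sharing pattern might look roughly like the sketch below. `FakeProducer` is a hypothetical, lock-protected stand-in for a real thread-safe producer, so the example runs without a cluster. The sketch also assumes that each key is handled by exactly one thread; that assumption is what keeps the per-key order deterministic here.

```python
import threading
from collections import defaultdict

class FakeProducer:
    """Hypothetical stand-in for a thread-safe producer; logs accepted sends."""
    def __init__(self):
        self._lock = threading.Lock()
        self.log = []  # (key, value) pairs in the order they were accepted

    def send(self, key, value):
        with self._lock:
            self.log.append((key, value))

def worker(producer, key, count):
    # Each thread owns one key, so sends for that key happen in program order.
    for i in range(count):
        producer.send(key, i)

producer = FakeProducer()  # one instance shared by every thread
threads = [
    threading.Thread(target=worker, args=(producer, f"key-{n}", 100))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Rebuild the per-key sequences from the shared log.
per_key = defaultdict(list)
for key, value in producer.log:
    per_key[key].append(value)
```

Records from different keys interleave in the shared log, but each key's values still appear in the order its thread sent them.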