Question

How many producers do I need?

Overview

When it comes to producing messages to an Apache Kafka cluster, many questions arise. Some of them are:

  • Should I create a producer per topic, or should I use a single producer for all my topics?
  • What happens if multiple threads each use their own KafkaProducer instance to write to the same topic?
  • How do I avoid out-of-order writes?

We will address all of these and find the right balance in this article.

    What is a producer?

    Generally speaking, a producer is an entity that writes new records to topics in an Apache Kafka cluster.

    Any producer can write to any topic or partition at any time. Unlike consumers, producers don't get assigned a particular set of partitions. You could think of them as stateless.

    A producer looks simple at first, but it encapsulates an essential piece of logic: it assigns each message to a partition, so that messages with the same key always land in the same partition. Within a partition, messages from a single producer are written in the order in which they were sent.
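The key-to-partition rule described above can be sketched as follows. This is a simplified illustration, not the client's actual code: the Java client's built-in partitioner hashes the serialized key with murmur2, while this sketch swaps in `Arrays.hashCode` to stay dependency-free. The class name, keys, and partition count are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified illustration of key-based partition assignment.
// Assumption: Arrays.hashCode stands in for the hash the real client uses.
class PartitionSketch {

    // Map a record key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6; // hypothetical partition count
        for (String key : new String[] {"order-1", "order-2", "order-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Because the result depends only on the key and the partition count, both occurrences of "order-1" map to the same partition. The mapping changes if the number of partitions changes.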

    One producer for all topics and partitions

    Using a single producer for all topics and partitions is probably the best choice for most applications.

    Batching

    The Kafka producer accumulates messages into batches before sending them to the Kafka cluster. A batch is sent once it reaches batch.size bytes or once linger.ms milliseconds have passed, whichever comes first. Clients based on librdkafka additionally honor batch.num.messages, which limits the number of messages in a batch.

    A single producer can batch messages more efficiently. It tries to fit all the messages destined for a particular broker into one network request, and that request can carry messages for several topics and partitions.
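As a rough sketch, the batching settings named above could be collected like this before creating the producer. The broker address and the chosen values are illustrative assumptions, not recommendations; only the property names come from the Kafka producer configuration.

```java
import java.util.Properties;

class ProducerBatchingConfig {

    // Build producer settings; every value here is an illustrative assumption.
    static Properties batchingProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", "32768"); // send a batch once it reaches this many bytes
        props.put("linger.ms", "10");     // or after waiting this many milliseconds
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be inspected without a running cluster.
        batchingProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With the kafka-clients library on the classpath, these properties would be passed to `new KafkaProducer<>(props)`. Raising linger.ms trades a little latency for fuller batches.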

    Network connections

    With only one producer, the cluster needs to keep just one connection open per broker to that producer. For most applications, that is the sweet spot.

    Every additional producer adds its own TCP connections to the brokers, and that overhead is worth considering.

    Multiple producers, out-of-order messages

    Adding more producer instances can lead to out-of-order writes if messages with the same key are sent from different instances at the same time, because each instance batches and sends independently of the others. Instead, divide your logic into threads that share the same Kafka producer.

    Scaling up

    Kafka producers are thread-safe. If you need more throughput, try experimenting with multiple threads sharing one producer. That is the recommended way of scaling up. It helps fill batches more quickly while keeping your messages in order.
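The threading pattern can be sketched without a running cluster. In the example below, `InMemorySender` is a hypothetical stand-in for a shared, thread-safe KafkaProducer: it only records messages in the order they arrive. The sketch also assumes that each key is handled by exactly one worker thread, which is what keeps the per-key order intact.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedProducerSketch {

    // Hypothetical stand-in for a thread-safe producer: records sends in arrival order.
    static class InMemorySender {
        private final List<String> log = new ArrayList<>();

        synchronized void send(String key, int sequence) {
            log.add(key + ":" + sequence);
        }

        synchronized List<String> snapshot() {
            return new ArrayList<>(log);
        }
    }

    // Run one worker thread per key; all workers share the same sender instance.
    static List<String> run(List<String> keys, int messagesPerKey) {
        InMemorySender sender = new InMemorySender();
        ExecutorService pool = Executors.newFixedThreadPool(keys.size());
        for (String key : keys) {
            pool.submit(() -> {
                for (int i = 0; i < messagesPerKey; i++) {
                    sender.send(key, i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sender.snapshot();
    }

    public static void main(String[] args) {
        // Entries for different keys may interleave, but each key's sequence stays ascending.
        System.out.println(run(List.of("a", "b", "c"), 3));
    }
}
```

With the real client, each worker would instead call `producer.send(new ProducerRecord<>(topic, key, value))` on the one shared KafkaProducer instance.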