How To Monitor Important Performance Metrics in Kafka?

In this post, we will learn which metrics are the most important to monitor in Kafka and how to monitor them. Monitoring is a crucial part of running Kafka: because Kafka has a large and complex architecture, finding the root cause when something goes down can be a head-scratching task for developers. Having a handy list of metrics to check first helps in this regard.

However, since Kafka has a very large number of flags and variables working under the hood, it is impractical to watch them all, which makes a short list of primary metrics even more valuable. It is also good practice to monitor these metrics regularly to make sure the Kafka system stays healthy. Hence, we have compiled the list below of metrics that should be on the radar at all times in a Kafka system, grouped by Producer, Broker and Consumer.

1. Metrics for Producer:

Kafka producers are clients that sit outside the Kafka cluster rather than being part of it. Nonetheless, certain producer metrics need to be monitored, because producers keep publishing data to the broker(s).

  • Rate of Response from Brokers (response-rate) - When and whether a broker responds depends on the producer's acknowledgement setting (acks in the current Java producer, request.required.acks in the old Scala producer). The scenarios are:
    • acks=0 - the producer does not wait for any acknowledgement, and the broker sends no response.
    • acks=1 - the broker responds once the partition leader has written the message to its own log, without waiting for the followers.
    • acks=all (or -1) - the broker responds only after all in-sync replicas have the message.
So, depending on the acknowledgement setting in use, the response rate from the brokers can be low (zero with acks=0), and responses take longer with acks=all.
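To make the three acknowledgement modes concrete, here is a minimal Python sketch of when a producer would receive a successful response. It is a simplified model, not Kafka's implementation: the function producer_gets_ack and its parameters are hypothetical, and it ignores min.insync.replicas, timeouts and retries.

```python
from typing import Optional


def producer_gets_ack(acks: str, leader_written: bool,
                      isr_replicas_written: int, isr_size: int) -> Optional[bool]:
    """Hypothetical, simplified model of produce acknowledgements.

    isr_replicas_written counts the in-sync replicas (leader included)
    that already have the record; isr_size is the current ISR size.
    Returns None when no response is sent at all, True when a success
    response would be sent, and False when the producer is still waiting.
    """
    if acks == "0":
        # Fire-and-forget: the broker sends no response.
        return None
    if acks == "1":
        # The leader's local write is enough; followers are not awaited.
        return leader_written
    if acks in ("all", "-1"):
        # Every member of the current ISR must have the record.
        return leader_written and isr_replicas_written >= isr_size
    raise ValueError(f"unsupported acks value: {acks!r}")


# A 3-replica partition where the leader and one follower have the record.
print(producer_gets_ack("1", True, 2, 3))
print(producer_gets_ack("all", True, 2, 3))
```

In this example the same write is already acknowledged under acks=1 but still pending under acks=all, which is why acknowledged writes take longer with the stricter setting.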


  • Request Rate (request-rate) - The rate at which the producer sends requests to the brokers. It should stay in line with the rate at which the brokers can absorb data, so that the data keeps getting committed.

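  • Batch Size - Grouping a bunch of messages into a batch before sending is more efficient than sending them one by one. The default batch size (batch.size) is 16 KB (16384 bytes). A batch is sent when it is full or when the linger time (linger.ms) has elapsed, whichever comes first. The producer reports the average batch size as batch-size-avg.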
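The batching rule above can be sketched as a small simulation for a single partition. This is a simplified Python model, not the producer's real accumulator logic: simulate_batches is a hypothetical helper, linger_ms=5 is only an illustrative value, and compression, per-record overhead and multiple partitions are ignored.

```python
def simulate_batches(records, batch_size=16384, linger_ms=5):
    """Group (arrival_ms, size_bytes) records for ONE partition into batches.

    A batch is sent as soon as the next record would not fit into it
    (batch full), or linger_ms after its first record arrived,
    whichever comes first. Returns a list of (send_ms, [record sizes]).
    """
    batches = []
    current, current_bytes, opened_at = [], 0, 0
    for arrival_ms, size in records:
        # Linger time expired before this record arrived: the open batch
        # was sent at its deadline.
        if current and arrival_ms >= opened_at + linger_ms:
            batches.append((opened_at + linger_ms, current))
            current, current_bytes = [], 0
        # The record does not fit: the full batch is sent right away.
        if current and current_bytes + size > batch_size:
            batches.append((arrival_ms, current))
            current, current_bytes = [], 0
        if not current:
            opened_at = arrival_ms
        current.append(size)
        current_bytes += size
    if current:
        # The last batch leaves when its linger time runs out.
        batches.append((opened_at + linger_ms, current))
    return batches


# Three 6000-byte records arrive quickly, then a small one much later.
for send_ms, sizes in simulate_batches([(0, 6000), (1, 6000), (2, 6000), (20, 100)]):
    print(send_ms, sizes)
```

Here the first batch leaves at 2 ms because the third record would overflow 16384 bytes; the other two batches leave when their linger time expires.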


2. Metrics for Broker:

Below are some of the important metrics with respect to the Kafka Broker. Some of the metrics are available through JMX.

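  • Number of Active Controllers - Exactly ONE broker per cluster should be the active controller (JMX: kafka.controller:type=KafkaController,name=ActiveControllerCount, which is 1 on the controller and 0 on every other broker). The controller is responsible for electing partition leaders and handling broker and topic state changes. In a ZooKeeper-based cluster, use the ZooKeeper shell to find out which broker is the active controller: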

$ ./bin/ :2181 get /controller
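An alerting rule for this metric boils down to summing ActiveControllerCount across all brokers. A minimal Python sketch, assuming you have already collected each broker's value over JMX (the collection step is not shown, and check_active_controller is a hypothetical name):

```python
def check_active_controller(active_counts):
    """active_counts maps broker id -> that broker's ActiveControllerCount (0 or 1)."""
    total = sum(active_counts.values())
    if total == 1:
        # Exactly one broker reports 1: the healthy state.
        controller = max(active_counts, key=active_counts.get)
        return f"OK: broker {controller} is the active controller"
    if total == 0:
        return "ALERT: no active controller"
    return f"ALERT: {total} brokers report being the active controller"


print(check_active_controller({1: 0, 2: 1, 3: 0}))
```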

  • Max Size of the Request to Broker - The maximum size of any request sent to a broker in the current window (reported by the clients per broker as request-size-max).


  • Average Size of the Request to Broker - The average size of all requests sent to a broker in the window (request-size-avg). Compare it with the maximum request size above to spot occasional oversized requests.


  • Average Count of Requests to the Broker - The average number of requests per second sent to the broker (request-rate).


  • Average Responses from the Broker - The average number of responses per second received from the broker (response-rate).
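The four request metrics above are simple window statistics. A minimal Python sketch over one fixed window, assuming a hypothetical list of sent requests; real clients keep these values over a rolling set of samples (see the metrics.sample.window.ms and metrics.num.samples client settings), so treat this as an illustration of what the numbers mean, not how they are computed internally.

```python
from dataclasses import dataclass


@dataclass
class SentRequest:
    sent_at_s: float   # when the client sent the request to the broker
    size_bytes: int    # serialized request size
    responded: bool    # whether a response came back (never for acks=0)


def request_window_stats(requests, window_start_s, window_s):
    """Compute max/avg request size and request/response rates over one fixed window."""
    in_window = [r for r in requests
                 if window_start_s <= r.sent_at_s < window_start_s + window_s]
    if not in_window:
        return {"request-size-max": 0, "request-size-avg": 0.0,
                "request-rate": 0.0, "response-rate": 0.0}
    sizes = [r.size_bytes for r in in_window]
    return {
        "request-size-max": max(sizes),
        "request-size-avg": sum(sizes) / len(sizes),
        "request-rate": len(in_window) / window_s,
        # Simplification: responses are attributed to the window in which
        # their request was sent.
        "response-rate": sum(r.responded for r in in_window) / window_s,
    }


reqs = [SentRequest(0.5, 1000, True), SentRequest(1.0, 3000, True),
        SentRequest(1.5, 2000, False), SentRequest(12.0, 9000, True)]
print(request_window_stats(reqs, window_start_s=0.0, window_s=10.0))
```

A large gap between request-size-max and request-size-avg means most requests are small but a few are much larger.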


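  • Number of Under-Replicated Partitions - Should always be ZERO (kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions). A non-zero value means some follower replicas are lagging behind their leader.


  • Number of Offline Partitions - Should always be ZERO (kafka.controller:type=KafkaController,name=OfflinePartitionsCount, reported by the active controller). A non-zero value means some partitions have no active leader, so the topics they belong to may be unavailable for reads and writes.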
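Both counts follow directly from each partition's replicas, ISR and leader, the same fields that kafka-topics.sh --describe prints. In a real cluster each broker reports UnderReplicatedPartitions only for the partitions it leads, and only the controller reports OfflinePartitionsCount; this minimal Python sketch instead computes cluster-wide totals from a hypothetical snapshot (PartitionState and replication_health are illustrative names).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionState:
    topic: str
    partition: int
    replicas: List[int]    # brokers assigned to host this partition
    isr: List[int]         # replicas currently in sync with the leader
    leader: Optional[int]  # leader broker id, or None if there is no leader


def replication_health(partitions):
    """Derive cluster-wide under-replicated and offline partition counts."""
    # Offline: no active leader, so nobody can serve reads or writes.
    offline = [p for p in partitions if p.leader is None]
    # Under-replicated: a leader exists, but some assigned replicas are out of sync.
    under_replicated = [p for p in partitions
                        if p.leader is not None and len(p.isr) < len(p.replicas)]
    return {"UnderReplicatedPartitions": len(under_replicated),
            "OfflinePartitionsCount": len(offline)}


# Hypothetical topic "orders" with three partitions on brokers 1-3.
state = [
    PartitionState("orders", 0, [1, 2, 3], [1, 2, 3], leader=1),
    PartitionState("orders", 1, [1, 2, 3], [2, 3], leader=2),
    PartitionState("orders", 2, [1, 2, 3], [], leader=None),
]
print(replication_health(state))
```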


  • Total Broker Partitions - How many partitions a broker is managing (kafka.server:type=ReplicaManager,name=PartitionCount, the number of partitions on this broker). Keep this count as even as possible across all brokers.


  • Under Minimum ISR Partition Count - The number of partitions whose in-sync replica count is less than the configured minimum (min.insync.replicas), reported as kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount. Producers using acks=all cannot write to such partitions.
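Both checks can be sketched in a few lines of Python. The helper names are hypothetical, the tolerance of 1 partition is an arbitrary choice, and a single min.insync.replicas value is assumed even though it can also be set per topic.

```python
from collections import Counter


def partitions_per_broker(assignments):
    """assignments maps (topic, partition) -> list of replica broker ids."""
    counts = Counter()
    for replicas in assignments.values():
        counts.update(replicas)  # every replica counts toward its broker's total
    return counts


def is_balanced(counts, brokers, tolerance=1):
    """True if partition counts across all brokers differ by at most `tolerance`."""
    per_broker = [counts[b] for b in brokers]  # brokers hosting nothing count as 0
    return max(per_broker) - min(per_broker) <= tolerance


def under_min_isr_count(isr_by_partition, min_insync_replicas):
    """Partitions whose ISR is smaller than min.insync.replicas."""
    return sum(1 for isr in isr_by_partition.values() if len(isr) < min_insync_replicas)


assignments = {("orders", 0): [1, 2], ("orders", 1): [2, 3],
               ("orders", 2): [3, 1], ("logs", 0): [1, 2]}
counts = partitions_per_broker(assignments)
print(dict(counts), is_balanced(counts, [1, 2, 3]))
print(under_min_isr_count({("orders", 0): [1, 2], ("orders", 1): [3]}, min_insync_replicas=2))
```

Passing the full broker list to is_balanced matters: a newly added broker that hosts no partitions yet shows up as 0 and makes the cluster unbalanced.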


  • Lag in number of messages per follower replica - This helps to understand if the replica is slow or has stopped replicating from the leader.
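Kafka exposes this per partition as kafka.server:type=FetcherLagMetrics,name=ConsumerLag on the follower, and as the aggregate kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica. The minimal Python sketch below shows the difference between a slow and a stopped follower. It assumes you have the leader's latest offset and each follower's next fetch offset from two consecutive samples; the function names and the two-sample heuristic are illustrative assumptions, not how the broker computes the metric.

```python
def follower_lag(leader_offset, follower_offsets):
    """Messages each follower is behind, given the leader's latest offset.

    follower_offsets maps follower broker id -> the next offset it will fetch.
    """
    return {broker: max(leader_offset - offset, 0)
            for broker, offset in follower_offsets.items()}


def stalled_followers(previous_offsets, current_offsets, lag):
    """Followers that are behind and made no progress since the previous sample."""
    return sorted(broker for broker, offset in current_offsets.items()
                  if broker in previous_offsets
                  and offset <= previous_offsets[broker]
                  and lag.get(broker, 0) > 0)


before = {2: 990, 3: 940}
after = {2: 1000, 3: 940}
lag = follower_lag(1000, after)
print(lag, stalled_followers(before, after, lag))
```

Broker 3 is 60 messages behind and did not move between samples, so it is flagged as stalled rather than merely slow.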


  • Active Connections - The current number of active connections of a producer (connection-count).



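  • In-Sync Replicas - Ideally, the set of in-sync replicas (ISR) of a particular partition stays mostly static. It changes when a broker goes down and later catches up again, or when you expand the Kafka cluster and reassign partitions. Outside such events, the ISR shrink and expand rates (kafka.server:type=ReplicaManager,name=IsrShrinksPerSec and IsrExpandsPerSec) should be zero.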
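IsrShrinksPerSec and IsrExpandsPerSec report these changes as rates. To show what counts as a shrink or an expand, here is a minimal Python sketch that counts the events in a series of recorded ISR snapshots for one partition; the snapshot history and the helper name are hypothetical.

```python
def isr_change_counts(snapshots):
    """Count shrink and expand events in successive ISR snapshots of ONE partition.

    Each snapshot is a set of broker ids, oldest first.
    """
    shrinks = expands = 0
    for before, after in zip(snapshots, snapshots[1:]):
        if before - after:   # some replica dropped out of the ISR
            shrinks += 1
        if after - before:   # some replica (re)joined the ISR
            expands += 1
    return shrinks, expands


# Broker 3 falls out of sync, then catches up again.
history = [{1, 2, 3}, {1, 2}, {1, 2}, {1, 2, 3}]
print(isr_change_counts(history))
```

A healthy partition with no broker restarts or reassignments would produce (0, 0).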


  • Total Time To Service a Request - How long the broker takes to serve a request, whether it is a producer sending data, a consumer fetching new data, or a follower broker fetching new data from the leader (kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}). This value should stay fairly stable most of the time; if it changes rapidly, request serving is slowing down. Because it is the sum of the request queue, local, remote and response (queue plus send) times, it is a good idea to cross-check those component metrics to see where the time goes.
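The cross-check amounts to finding which component dominates the total. A minimal Python sketch using the component metric names from the same kafka.network:type=RequestMetrics group; the millisecond values are hypothetical, and request_time_breakdown is an illustrative name.

```python
def request_time_breakdown(components_ms):
    """Return (total_ms, dominant component name, its share of the total)."""
    total = sum(components_ms.values())
    if total == 0:
        return 0.0, None, 0.0
    dominant = max(components_ms, key=components_ms.get)
    return total, dominant, components_ms[dominant] / total


# Hypothetical mean values for Produce requests, in milliseconds.
produce = {
    "RequestQueueTimeMs": 1.0,
    "LocalTimeMs": 2.0,
    "RemoteTimeMs": 15.0,
    "ResponseQueueTimeMs": 0.5,
    "ResponseSendTimeMs": 1.5,
}
total, dominant, share = request_time_breakdown(produce)
print(f"total={total} ms, dominated by {dominant} ({share:.0%})")
```

A large RemoteTimeMs on Produce requests usually means the leader is waiting for followers to replicate (acks=all), while a large RequestQueueTimeMs points at busy request handler threads.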


3. Metrics for Consumer:


  • Bytes Per Sec - The average number of bytes consumed per second, for a specific topic or across all topics (bytes-consumed-rate).


  • Records Per Sec - The average number of records consumed per second, for a specific topic or across all topics (records-consumed-rate).


  • Fetch Request Count - The number of fetch requests per second sent by the consumer (fetch-rate).
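The consumer reports these rates directly. If your monitoring system only stores running totals, the same numbers can be derived from two samples. A minimal Python sketch; ConsumerSample and consumer_rates are hypothetical names and the sample values are made up.

```python
from typing import NamedTuple


class ConsumerSample(NamedTuple):
    at_s: float          # sample timestamp in seconds
    bytes_total: int     # cumulative bytes consumed
    records_total: int   # cumulative records consumed
    fetches_total: int   # cumulative fetch requests sent


def consumer_rates(prev, curr):
    """Per-second rates between two cumulative samples."""
    elapsed = curr.at_s - prev.at_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return {
        "bytes-consumed-rate": (curr.bytes_total - prev.bytes_total) / elapsed,
        "records-consumed-rate": (curr.records_total - prev.records_total) / elapsed,
        "fetch-rate": (curr.fetches_total - prev.fetches_total) / elapsed,
    }


rates = consumer_rates(ConsumerSample(0.0, 0, 0, 0),
                       ConsumerSample(10.0, 5_000_000, 10_000, 50))
print(rates)
```

Combining the three rates is also informative: here the consumer averages 500 bytes per record and 200 records per fetch.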


4. Performance Measuring Tools:

Apache Kafka ships with some out-of-the-box performance testing tools. They are available as:

  • bin/

When you run the command below, it produces a total number of messages (5000 in this example) and shows the metrics for that run, including the following details:

  • The start time
  • The end time
  • Compression
  • Message size
  • Batch size
  • Total data sent in MB (for the 5000 messages in this case)
  • MB/sec
  • Total number of messages sent (5000 in this example)
  • Total time in seconds for producing the 5000 messages

bin/ --broker-list localhost:9092 --messages 5000 --topic <TOPIC_NAME> --producer.config <PRODUCER_CONFIG_FILE> --print-metrics

  • bin/kafka-consumer-perf-test

When you run the command below, it consumes a total number of messages (5000 in this example) and shows the metrics for that run, including the following details:

  • The start time
  • The end time
  • Compression
  • Message size
  • Batch size
  • Total data consumed in MB (for the 5000 messages in this case)
  • MB/sec
  • Total number of messages consumed (5000 in this example)
  • Total time in seconds for consuming the 5000 messages

bin/ --broker-list localhost:9092 --messages 5000 --topic <TOPIC_NAME> --consumer.config <CONSUMER_CONFIG_FILE> --print-metrics

These are some of the most commonly used metrics. Note, however, that they are not the be-all and end-all: depending on your Kafka set-up, there will be other relevant metrics to monitor as well. This was just meant to give you a quick, handy list. Hope this was useful.

