How To Compress Message Size in Kafka?



In this post, we will see how to compress message size in Kafka. With data being produced at ever-increasing speed and the need to retain it for prolonged analysis, it is imperative that we understand how to be smart about data transportation.

Kafka being one of the most widely used Big Data components, we will explore our options to figure out whether we can downsize the messages sent through Kafka. A high number of Kafka partitions and longer data retention can take a toll on Kafka processing due to the default memory map threshold for the partitions. It can even result in a memory error during processing, startup, or restart. You can read about the Kafka memory map here - Fix Kafka Error – “Memory Allocation Error”

  • If the data we are dealing with is fast-moving but not very urgency-sensitive (unlike financial transaction data, for example), a good option is to group messages into a pack or batch before sending them over to the Kafka brokers. This is called lingering. The producer can collect, collate, and assemble a batch, and then send the data to a Kafka topic. By default, lingering is set to 0. If you set it to, say, 100, the producer will linger, or wait, up to 100 ms before sending the batch (even if the batch size has not been met).


# In Kafka Producer code and configuration, use the below as reference -
ProducerConfig.LINGER_MS_CONFIG=<USE_CUSTOM_VALUE>
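Below is a minimal producer sketch with lingering enabled. The broker address localhost:9092, the topic name "events", and the class name LingeringProducer are assumptions for illustration only.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class LingeringProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait up to 100 ms so that several records can be grouped into a single batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("events", "key-" + i, "value-" + i));
            }
        }
    }
}

With linger.ms at 0 each record tends to go out in its own small request; with 100 ms the same records leave in far fewer, larger requests.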


  • Along with lingering, you can also consider the producer config property called batch.size. By default batch.size is 16384 bytes (16 KB) - it is used by the producer to batch records. This setting enables fewer requests and allows multiple records to be sent to the same partition in one go.
Note - If a record is larger than the batch size, it will not be batched. Also, if the batch size is very large and you send the batch without filling it with records, the memory allocated for the batch is wasted.



# In Kafka Producer code and configuration, use the below as reference -
ProducerConfig.BATCH_SIZE_CONFIG=<USE_CUSTOM_VALUE>
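Continuing the producer sketch above, batch.size can be raised alongside linger.ms. The 64 KB figure below is purely illustrative, not a recommendation.

// Allow up to 64 KB of records per partition batch (illustrative value);
// a single record bigger than this will be sent on its own, unbatched
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);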


  • Another good option is to have the Kafka producer use a compression algorithm. Compression is applied by the producer to batches of records, so using a larger batch.size makes compression more efficient. The producer config has a property called compression.type. By default it is set to none, so the producer doesn't compress the data. But this setting can be set to compression algorithms like gzip, snappy, lz4, etc.


# In Kafka Producer code and configuration, use the below as reference -
ProducerConfig.COMPRESSION_TYPE_CONFIG="snappy"
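In the same producer sketch, compression is switched on with one more property. snappy is only one choice - each codec trades compression ratio against CPU cost differently, so treat the pick as an assumption to validate for your own workload.

// Compress every outgoing batch with snappy; a larger batch.size plus a small linger.ms
// gives the codec more data per batch and therefore a better compression ratio
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");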


  • There is also a concept of end-to-end compression in Kafka, through the Kafka broker config property "compression.type" when it is set to "producer". This end-to-end setup is quite productive and system-efficient, because compression happens once at the producer and the compressed batches are reused by both the broker and the consumer. This relieves the broker of the heavy load of recompressing data. However, it is advisable to cross-check the compatibility of the producer, broker, and consumer in terms of their performance and efficiency regarding the speed at which they operate and process the data/messages.
In the server.properties file -



broker.id=0
port=9092
compression.type=producer
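On the consumer side nothing special is required for end-to-end compression - the client library decompresses fetched batches transparently. The sketch below assumes the same localhost:9092 broker, a topic named "events", and a group id "demo-group", all placeholder values.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class CompressedTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            // Batches arrive compressed over the network and are decompressed here by the client,
            // so no compression-related setting is needed in the consumer config
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}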


  • You could also use a Schema Registry and register the schema of the messages or data records there. It keeps the schema decoupled from the data, so the full schema does not have to travel with every message. Any subsequent changes to the schema need to be made only in the Schema Registry.
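As a rough sketch of that idea, assuming a Confluent Schema Registry running at http://localhost:8081 and values that are Avro records (for example GenericRecord) rather than plain strings - all assumptions for illustration - the producer is pointed at the registry like this. The serializer then ships only a small numeric schema id with each record instead of the full schema, which keeps individual messages smaller.

// Serialize values as Avro and look their schema up in the Schema Registry,
// so each record carries only a compact schema id instead of the whole schema
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://localhost:8081");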
  Hope this helps.
