




How To Fix - Kafka Spark Version Compatibility Issues



In this post, we will look at fixing version compatibility issues between Kafka, Spark Streaming, Scala, Python and Java. Software compatibility is one of the major pain points when setting up a project, and mismatched versions lead to frequent issues later on.

We have compiled some notes on version compatibility, followed by some example working combinations that you can try. If you are setting up a new Big Data project involving Spark, Kafka, Spark Streaming, Scala, Python etc., we suggest you go through all the pointers below and cross-check your set-up against them.

Spark :

  • Spark 2.x requires Java 8; Spark 3.0 also supports Java 11 (I have faced software compatibility problems in the Big Data ecosystem while using higher Java versions).
 

  • Spark 2.3+ has upgraded the internal Kafka client and deprecated the Kafka 0.8 integration for Spark Streaming (spark-streaming-kafka-0-8). It is better to upgrade Spark than to declare an explicit dependency on kafka-clients, as kafka-clients is already included by the spark-sql-kafka dependency.
 

  • Spark 3.0, the latest release at the time of writing, requires Kafka 0.10 or higher.
 

  • Spark 2.4.4 is pre-built with Scala 2.11.
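
The Kafka connector artifact has to match both the Scala binary version and the Spark version in use. As a rough sketch (the helper name `kafka_connector_coordinate` is hypothetical and not part of Spark), the Maven coordinate for the spark-sql-kafka dependency could be assembled like this:

```python
def kafka_connector_coordinate(scala_binary: str, spark_version: str) -> str:
    """Build the Maven coordinate of the spark-sql-kafka-0-10 connector.

    Hypothetical helper for illustration: it only formats a string and does
    not check that the artifact actually exists in a Maven repository.
    """
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_binary}:{spark_version}"


# Spark 2.4.4 is pre-built with Scala 2.11, so the two parts must line up.
print(kafka_connector_coordinate("2.11", "2.4.4"))
```

The resulting string is in the form expected by the `--packages` option of spark-submit.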
 

Spark Streaming :

  • spark-streaming-kafka-0-10 needs Kafka 0.10.0 or higher
 

  • spark-streaming-kafka-0-8 needs Kafka 0.8.2.1 or higher
  Download from - https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8-assembly  

Hadoop :

  • Hadoop requires Java 8.
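
Since both Spark and Hadoop are sensitive to the Java version, it can help to check it before starting a job. The sketch below is only an illustration (the function name `java_major_version` is made up for this post); it extracts the major version from a version string such as the one shown in the first line of `java -version`:

```python
import re


def java_major_version(version: str) -> int:
    """Return the major Java version from a version string.

    Handles the legacy "1.8.0_252" scheme (Java 8 and earlier) as well as
    the newer "11.0.7" scheme (Java 9 and later).
    """
    numbers = [int(n) for n in re.findall(r"\d+", version)]
    if not numbers:
        raise ValueError(f"no version number found in: {version!r}")
    # Legacy scheme: the real major version is the second component.
    if numbers[0] == 1 and len(numbers) > 1:
        return numbers[1]
    return numbers[0]


print(java_major_version("1.8.0_252"))  # 8
print(java_major_version("11.0.7"))     # 11
```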
   

Kafka :

  • If your Kafka build targets Scala 2.xx, then the project MUST use the same Scala 2.xx version as well.
 

  • Kafka + Scala
    • Kafka 2.6.0 is compatible with Scala 2.12 or Scala 2.13
    • Kafka 2.5.0 is compatible with Scala 2.12 or Scala 2.13
    • Kafka 2.4.0 is compatible with Scala 2.11, Scala 2.12 or Scala 2.13
    • Kafka 2.3.0 is compatible with Scala 2.11 or Scala 2.12
    • Kafka 2.0.0 is compatible with Scala 2.11 or Scala 2.12
    • Kafka 1.0.0 is compatible with Scala 2.11 or Scala 2.12
    • Kafka 0.10.0.0 is compatible with Scala 2.10 or Scala 2.11
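
The Kafka and Scala pairs listed above can be kept as a small lookup table and checked in a set-up script. This is a minimal sketch; the names `KAFKA_SCALA` and `is_compatible` are made up for illustration, and the table covers only the versions listed in this post:

```python
# Kafka version -> Scala binary versions it is built for (from the list above).
KAFKA_SCALA = {
    "2.6.0": {"2.12", "2.13"},
    "2.5.0": {"2.12", "2.13"},
    "2.4.0": {"2.11", "2.12", "2.13"},
    "2.3.0": {"2.11", "2.12"},
    "2.0.0": {"2.11", "2.12"},
    "1.0.0": {"2.11", "2.12"},
    "0.10.0.0": {"2.10", "2.11"},
}


def is_compatible(kafka_version: str, scala_binary: str) -> bool:
    """Check a Kafka/Scala pair against the table above."""
    if kafka_version not in KAFKA_SCALA:
        raise KeyError(f"Kafka {kafka_version} is not in the table")
    return scala_binary in KAFKA_SCALA[kafka_version]


print(is_compatible("2.4.0", "2.11"))  # True
print(is_compatible("2.5.0", "2.11"))  # False
```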
 

AWS :

  • If required, you can add the S3A connector from Hadoop 2.10 (the jars go under tools/lib/). The jars are -
    • hadoop-aws
    • jets3t
    • woodstox-core
 

Some Working Combinations :

Some sample dependency versions that are known to work together are given below -

Working Combination 1 :

  • Spark 2.4.1 built for Scala 2.11 (going by the spark-sql-kafka jar listed under Additional jars)
  • Kafka v 2.1.3
  • Python 3
  • Additional jars
    • spark-sql-kafka-0-10_2.11-2.4.1.jar
    • kafka-clients-0.10.1.0
  • The spark-submit command for this combination is given below -

./bin/spark-submit \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1 \
--jars spark-sql-kafka-0-10_2.11-2.4.1.jar,kafka-clients-0.10.1.0.jar,postgresql-42.2.10.jar \
spark.py \
localhost:9092 subscribe <TOPIC>
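
The contents of spark.py are not shown in this post. As a minimal sketch, assuming the script takes the three positional arguments used in the command above (bootstrap servers, subscribe type, topic), it could look something like the following. The helper `kafka_source_options` and the application name are hypothetical.

```python
import sys


def kafka_source_options(bootstrap_servers: str, subscribe_type: str, topics: str) -> dict:
    """Map the three command-line arguments to Kafka source options."""
    if subscribe_type not in ("assign", "subscribe", "subscribePattern"):
        raise ValueError(f"unsupported subscribe type: {subscribe_type}")
    return {"kafka.bootstrap.servers": bootstrap_servers, subscribe_type: topics}


def main(argv: list) -> None:
    # Imported here so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    options = kafka_source_options(argv[1], argv[2], argv[3])
    spark = SparkSession.builder.appName("KafkaSmokeTest").getOrCreate()

    # Configure the Kafka source one option at a time.
    reader = spark.readStream.format("kafka")
    for key, value in options.items():
        reader = reader.option(key, value)

    # Kafka values arrive as binary, so cast them to strings.
    lines = reader.load().selectExpr("CAST(value AS STRING)")

    # Print every micro-batch to the console until the job is stopped.
    query = lines.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```

If the versions do not line up, a job like this typically fails when the Kafka source is loaded, which makes it a quick smoke test for a given combination.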


Working Combination 2 :


// build.sbt: all Spark modules share one version
val sparkVersion21 = "2.1.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion21,
  "org.apache.spark" %% "spark-sql" % sparkVersion21,
  "org.apache.spark" %% "spark-hive" % sparkVersion21,
  "org.apache.spark" %% "spark-yarn" % sparkVersion21,
  "org.apache.kafka" % "kafka-clients" % "2.4.0",
  "org.apache.kafka" %% "kafka" % "2.4.0",
  "org.apache.kafka" % "kafka-streams" % "2.4.0"
)



Working Combination 3:

  • Spark 2.2.1 + Scala 2.11 + Kafka 0.10

Hope this post helps.
