




How To Fix Spark Error - "org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0"



In this post, we will see how to fix the Spark error "org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0". When you run your Spark application or program, you might at times face the exception below -


org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0

This issue is mostly related to the SparkContext object. Sometimes (due to the application code or other reasons) the SparkContext shuts down or keeps restarting (starting and stopping many times), which causes this specific error to be thrown. Let us look at various scenarios and the fixes we can apply to handle them.

  • Scenario 1 - Make sure to initialize the SparkContext in the driver code. Say the SparkContext is defined in a Singleton class. A Singleton instance is limited to a single JVM, but Spark runs in master-worker mode and the worker nodes run in separate JVM instances, so the same Singleton class also gets invoked on the worker nodes. This ends up creating separate SparkContexts on the workers as well. To avoid that, initialize the SparkContext in the driver code, as in the sketch below.
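A minimal Scala sketch of this idea, where the object name and app name are only illustrative: the context is created once in the driver's main method, and getOrCreate also guards against accidentally creating a second context in the same JVM.

import org.apache.spark.{SparkConf, SparkContext}

object DriverApp {
  def main(args: Array[String]): Unit = {
    // Create the SparkContext once, in the driver only
    val sc = SparkContext.getOrCreate(new SparkConf().setAppName("DriverApp"))
    // ... define RDDs and run jobs with sc here; do not construct sc inside shared singletons ...
    sc.stop()
  }
}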
 

  • Scenario 2 - Let's say you need a JavaSparkContext because of an application requirement, and a SparkContext has already been created. Even in that case, the JavaSparkContext can be obtained from the existing SparkContext object, so you do not really need to create a separate context.
Check the example below -


import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("SimpleApp");
sparkConf.set("spark.driver.allowMultipleContexts", "true");
sparkConf.setMaster("local");

SparkContext sc = new SparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(sc);

// Reuse the existing SparkContext instead of building a second one
JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sc);

  • Scenario 3 - You want to create a StreamingContext from scratch even though a SparkContext is already created. This should be avoided; where applicable, the StreamingContext should be created from the existing SparkContext (similar to the JavaSparkContext scenario above), as in the sketch below.
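A minimal sketch, assuming an existing SparkContext named sc and a 10-second batch interval (both illustrative):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Build the StreamingContext on top of the existing SparkContext instead of from scratch
val ssc = new StreamingContext(sc, Seconds(10))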
 

  • Scenario 4 - You can also use the "builder" method available on SparkSession. It instantiates the session, along with the underlying Spark and SQL contexts, and its getOrCreate call reuses an existing session, which reduces the possibility of a context conflict.

import org.apache.spark.sql.SparkSession
// getOrCreate() returns the already-running session if there is one, instead of creating a conflicting context
val spark = SparkSession.builder().appName("app").getOrCreate()
val sc = spark.sparkContext // the underlying SparkContext, shared with the session

 

  • Scenario 5 - If, as part of the application, some code is serialized and executed on a worker node, and that piece of code (by mistake) initializes a SparkContext, the application will fail with the same error. Keep all context work on the driver and ship only data to the executors, for example through a broadcast variable, as sketched below.
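A small sketch of the safe pattern (sc, the sample data and the lookup map are illustrative): the broadcast variable is created once on the driver, and the closure shipped to the executors only reads its value.

// sc is the driver-side SparkContext; nothing below creates a context on the executors
val rdd = sc.parallelize(Seq("spark", "broadcast", "error"))
val lookup = sc.broadcast(Map("spark" -> 1, "broadcast" -> 2)) // created once, on the driver
val counts = rdd.map(word => lookup.value.getOrElse(word, 0)).collect()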
 

  • Scenario 6 - Let's say the Spark application uses checkpointing and the previous run was not successful. The subsequent run might not be able to initialize the broadcast variables from the checkpointed data. In that case, try deleting the checkpoint directories and then start the job again; see the note below.
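For illustration (the path below is a placeholder), the directory to clear is whatever was registered as the checkpoint location, for example via setCheckpointDir:

// The directory previously registered for checkpoints (placeholder path)
sc.setCheckpointDir("hdfs:///tmp/myapp-checkpoint")
// Before restarting the failed job, remove the stale checkpoint data, e.g.
//   hdfs dfs -rm -r /tmp/myapp-checkpoint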
 

  • Scenario 7 - Check the version compatibility of the components: spark-core, spark-streaming, Scala and so on should all be on mutually compatible versions. It is worth re-checking this, since a simple version incompatibility is the cause of many issues in Spark applications; see the sketch below.
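As a build definition sketch (the versions are placeholders, not recommendations), keep every Spark module on the same version and use a Scala binary version that release supports:

// build.sbt
scalaVersion := "2.12.17"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "3.3.2",
  "org.apache.spark" %% "spark-sql"       % "3.3.2",
  "org.apache.spark" %% "spark-streaming" % "3.3.2"
)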
  Try the above steps and see if they help you rectify the exception.


