




How To Fix Spark error - "org.apache.spark.SparkException: Job aborted"



In this post, we will see how to fix the Spark error "org.apache.spark.SparkException: Job aborted". It can have several causes, so check the points below against your Spark project.

  • Make sure the class name and the jar path are correct. Consider the example below. Spark needs the fully qualified name of the main class (the --class argument) and the location of the jar that contains it (the last argument).

./spark-submit \
--class "<CORRECT_CLASSPATH_NAME>" \
--master "spark://xx.yy:7077" \
/AA/BB/target/project-1.0-SNAPSHOT-jar-with-dependencies.jar
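Before submitting, it can help to confirm that the class really is packaged in the jar. The following is a minimal sketch, not part of Spark itself: the helper name jar_contains_class is hypothetical, and it relies only on the fact that a jar is a zip archive in which a class such as com.example.Main is stored as com/example/Main.class.

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the fully qualified class is packaged inside the jar.

    jar_contains_class is a hypothetical helper for illustration only.
    """
    # A jar is a zip archive; com.example.Main lives at com/example/Main.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, calling jar_contains_class with the jar path from the command above and the value passed to --class should return True; if it returns False, either the class name or the jar is wrong.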

  • Version mismatch is one of the most common root causes of this type of error. Check for any mismatch between the Spark connector and the Spark version used in the project. If the Spark version is xx.yy.zz, the connector version should also correspond to xx.yy.zz. Take care of this when you declare the dependencies for the build. If you are using Scala, you can use the SBT tool; otherwise you can use Maven with a pom.xml.
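The version check can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name connector_matches is hypothetical, the coordinate is assumed to follow the "<group>:<artifact>:<version>-s_<scala>" pattern used by the connector example in this post, and only the major.minor parts are compared (tighten this if your connector requires an exact match).

```python
import re


def connector_matches(spark_version: str, scala_version: str, coordinate: str) -> bool:
    """Compare a connector coordinate with the Spark and Scala versions in use.

    Hypothetical helper; assumes a version string such as 2.4.1-s_2.11.
    """
    artifact_version = coordinate.split(":")[-1]
    match = re.fullmatch(r"(\d+)\.(\d+)\.\d+-s_(\d+\.\d+)", artifact_version)
    if match is None:
        raise ValueError(f"unrecognised connector version: {artifact_version}")
    # Compare major.minor of the connector with major.minor of Spark
    connector_major_minor = (match.group(1), match.group(2))
    spark_major_minor = tuple(spark_version.split(".")[:2])
    # The suffix after "s_" is the Scala binary version (major.minor)
    scala_binary = ".".join(scala_version.split(".")[:2])
    return connector_major_minor == spark_major_minor and match.group(3) == scala_binary
```

A False result is a hint to revisit the dependency versions in build.sbt or pom.xml before looking anywhere else.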
   

  • Make sure all the dependencies are in place. Download all the dependency jars and place them in the jars folder of the Spark master. It is good practice to build a fat jar. If you skip creating a fat jar, you have to ensure that the job is submitted with all the required packages specified:

    spark-submit \
    --packages datastax:spark-cassandra-connector:2.4.1-s_2.11 \
    --class "<CORRECT_CLASSPATH_NAME>" \
    --master "spark://xx.yy:7077" \
    /AA/BB/target/project-1.0-SNAPSHOT-jar-with-dependencies.jar
    

 

  • In the case of AWS or any other cloud cluster, make sure you are using the correct IP address (or the public IP or DNS name) when specifying the Spark master.

 --master spark://<CORRECT_IP_of_SPARK_MASTER>:7077   

OR 

 --master spark://<CORRECT_PUBLIC_DNS_IP_of_SPARK_MASTER>:7077  
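A quick way to rule out a wrong address or a blocked port is to try opening a TCP connection to the master from the machine that submits the job. The sketch below assumes the standalone master port 7077 shown above; master_reachable is a hypothetical helper name, and a successful connection only shows that something is listening there, not that it is a healthy Spark master.

```python
import socket


def master_reachable(host: str, port: int = 7077, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper; 7077 is the port used in the examples of this post.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures
        return False
```

If this returns False for the public address but True for the private one (or the other way round), check the security group or firewall rules of the cluster.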



  • Make sure to refresh the metadata if you are using Hive (which internally uses the metastore). Spark should be aware of the latest metastore metadata and block location data for the table being used. If new data is loaded into a table, i.e. the HDFS/S3 data directory for the table is updated, refresh the table so that Spark picks up the changes. Use the command below for this:

> spark.catalog.refreshTable("<TABLE_NAME>")

  • Check the availability of free RAM and whether it matches what the job being executed needs. Run the command below on each of the servers in the cluster and check how much RAM and swap space they have available.

free -h
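If you want to script this check instead of reading the output by eye, the same information is available on Linux in /proc/meminfo. The following is a rough sketch under that assumption; available_gib is a hypothetical helper that takes the file contents as text and returns the MemAvailable value in GiB.

```python
def available_gib(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo content, in GiB.

    Hypothetical helper; assumes the Linux /proc/meminfo format,
    where values are reported in kB (units of 1024 bytes).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found in input")
```

On a worker node you could call it as available_gib(open("/proc/meminfo").read()) and compare the result with the executor memory the job requests.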

  • If you are using any HDFS files in the Spark job, make sure to specify the HDFS URL correctly. Cross-check that the NameNode is up and running.
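A malformed HDFS URL is easy to catch before the job starts. The sketch below is only an illustration: check_hdfs_url is a hypothetical helper, and the host name and path used with it are made-up examples. It does not contact the NameNode; it only inspects the URL text.

```python
from urllib.parse import urlparse


def check_hdfs_url(url: str) -> list:
    """Return a list of problems found in an HDFS URL (empty if none).

    Hypothetical helper; it inspects the string only and opens no connection.
    """
    parsed = urlparse(url)
    problems = []
    if parsed.scheme != "hdfs":
        problems.append(f"scheme is '{parsed.scheme}', expected 'hdfs'")
    if not parsed.hostname:
        # hdfs:///path is legal, but then the NameNode comes from fs.defaultFS
        problems.append("no NameNode host given; fs.defaultFS will be used")
    if parsed.path in ("", "/"):
        problems.append("no file or directory path given")
    return problems
```

For instance, check_hdfs_url("hdfs://namenode:8020/data/events") returns an empty list, while a typo in the scheme or a missing path is reported as text you can log.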
 

 

  • If you are getting a NullPointerException, it is possible that you are running an operation such as an aggregation against empty or null data. Check for that.
 

  • If the job failure is memory related, verify the memory flags and check what value is being set (or left at the default). You might need to tune those. Some of the important flags are given below:
    • spark.executor.memory – Amount of memory to use for each executor that runs tasks.
    • spark.executor.cores – Number of virtual cores to use for each executor.
    • spark.driver.memory – Amount of memory to use for the driver.
    • spark.driver.cores – Number of virtual cores to use for the driver.
    • spark.executor.instances – Number of executors. Set this parameter unless spark.dynamicAllocation.enabled is set to true.
    • spark.default.parallelism – Default number of partitions in resilient distributed datasets (RDDs) returned by transformations like join, reduceByKey, and parallelize when no partition number is set by the user.
    • To use all the resources available in a cluster, set the maximizeResourceAllocation parameter (an Amazon EMR setting) to true
    • spark.executor.cores
    • spark.executor.memory
    • spark.driver.memory - Use the same value as spark.executor.memory
    • spark.driver.cores - Use the same value as spark.executor.cores
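These flags can be passed to spark-submit as --conf key=value pairs. The sketch below shows one way to build that part of the command line from a dictionary; conf_args is a hypothetical helper, and the values in the usage note are placeholders rather than recommendations.

```python
def conf_args(settings: dict) -> list:
    """Turn {"spark.executor.memory": "4g", ...} into spark-submit arguments.

    Hypothetical helper; it only formats strings and does not validate keys.
    """
    args = []
    for key, value in settings.items():
        # Each setting becomes a separate "--conf key=value" pair
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, conf_args({"spark.executor.memory": "4g", "spark.executor.cores": "2"}) yields a list that can be placed between spark-submit and the jar path when launching the job with subprocess.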
Hope these help you to solve the Spark error "org.apache.spark.SparkException: Job aborted".
