




How To Fix - "Container Killed By YARN for Exceeding Memory Limits" in Spark ?



In this post, we will explore how to fix the "Container Killed By YARN for Exceeding Memory Limits" error in Spark on AWS EMR. The job log might show the error message below in the terminal.



Reason: Container killed by YARN for exceeding memory limits.


This error means that -

  • The executor is trying to use a certain amount of memory.
  • And that memory is much more than the memory YARN allows it to use.

Ideally, the total of spark.executor.memory and spark.executor.memoryOverhead should be less than yarn.nodemanager.resource.memory-mb. In other words, the sum of -

  1. Memory (driver or executor), plus
  2. Memory overhead (driver or executor)

should be less than yarn.nodemanager.resource.memory-mb. Refer to this formula whenever you increase or decrease any of these values (a worked example follows this list). Depending on the data the Spark job is handling, you may have to try several of the options below and see which one helps. Make the changes in spark-defaults.conf for each option. Note that -

  • Roughly 90% of the Spark executor memory is heap memory, used for the bulk of the actual workload.
  • Roughly 10% of the Spark executor memory (minimum 384 MB) is overhead memory, used for internal work.
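For example (hypothetical numbers, for illustration only): on a node where yarn.nodemanager.resource.memory-mb is 24576 (24 GB), setting spark.executor.memory to 20g and spark.executor.memoryOverhead to 2g gives 20 GB + 2 GB = 22 GB per executor container, which fits under the 24 GB limit. Pushing spark.executor.memory to 23g would make the total 25 GB, and YARN would again kill the container.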
 

Check:

Before changing anything, perform some basic sanity checks to ensure things are behaving as expected.

  • Check whether the dataset is skewed. Real-world datasets very often are. See this article on how to handle skewed data - How To Fix – Data Skewness in Spark (Salting Method)
  • Make sure the Spark code is written efficiently, e.g. preferring reduceByKey over groupByKey where possible, to avoid unnecessary data shuffles (see the sketch after this list).
  At this point, assuming the dataset and code are fine, proceed with the next steps to handle this error.
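
As a minimal illustration of the shuffle point above, the PySpark sketch below (hypothetical word-count style data, not from the original article) contrasts groupByKey, which shuffles every value, with reduceByKey, which pre-aggregates on the map side and therefore shuffles and holds far less data per executor:


from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shuffle-check").getOrCreate()
sc = spark.sparkContext

pairs = sc.parallelize([("a", 1), ("b", 1), ("a", 1), ("b", 1)])

# groupByKey ships every value across the network before aggregating,
# inflating shuffle size and per-executor memory pressure.
grouped_counts = pairs.groupByKey().mapValues(sum)

# reduceByKey combines values on the map side first, so far less data
# is shuffled and held in memory per partition.
reduced_counts = pairs.reduceByKey(lambda a, b: a + b)

print(reduced_counts.collect())

spark.stop()
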

Option 1:

  • In this approach, we will increase the memory overhead, which is the amount of off-heap memory allocated to each executor.
  • The default is 10% of the executor memory or 384 MB, whichever is higher. For example, with spark.executor.memory=10g the default overhead is max(0.10 × 10 GB, 384 MB) = 1 GB.
  • Increase the memory overhead gradually, but keep in mind the memory formula explained above.
  • If the error is with the driver container or the executor container, increase the memory overhead for that container only.
  • spark-defaults.conf can be modified in two ways -
    • On the main (master) node, in /etc/spark/conf/spark-defaults.conf -


spark.driver.memoryOverhead <ADD_NEW_VALUE>
spark.executor.memoryOverhead <ADD_NEW_VALUE>


    • Or while submitting the Spark Job -


spark-submit \
--class org.apache.spark.yourSparkApp \
--master yarn \
--deploy-mode cluster \
--conf spark.driver.memoryOverhead=<ADD_NEW_VALUE> \
--conf spark.executor.memoryOverhead=<ADD_NEW_VALUE> \
<Jar_Name>.jar


Note: If the fix does not help, revert the changes in spark-defaults.conf to the original values.

Option 2:

  • In this approach, we will reduce the number of cores for the driver or the executor, so that the maximum number of tasks the container can run in parallel decreases.
  • Decrease the cores on the respective container depending on whether the driver or the executor errored out.
  • spark-defaults.conf can be modified in two ways -
    • On the main (master) node, in /etc/spark/conf/spark-defaults.conf -
 



spark.driver.cores <ADD_NEW_VALUE>
spark.executor.cores <ADD_NEW_VALUE>


 

    • Or while submitting the Spark Job -


spark-submit \
--class org.apache.spark.yourSparkApp \
--master yarn \
--deploy-mode cluster \
--executor-cores <ADD_NEW_VALUE> \
--driver-cores <ADD_NEW_VALUE> \
<Jar_Name>.jar


  Note: If the fix does not help, revert the changes in spark-defaults.conf to the original values.

Option 3:

  • In this approach, we will increase the number of partitions.
  • To achieve high parallelism, Spark splits the data into smaller chunks called partitions. Partitions are distributed across the nodes of the Spark cluster. Each node can have more than one executor, each of which can execute tasks.
  • Use the methods below to increase the number of partitions (a short sketch follows this list) -
    • spark.sql.files.maxPartitionBytes - governs the maximum partition size when reading files. The default is 128 MB. Lowering it produces more, smaller partitions.
    • spark.default.parallelism - defaults to the total number of cores across all executor nodes. Increase this value to get more partitions. Applies to raw RDD operations.
    • spark.sql.shuffle.partitions - the number of partitions used for shuffles in Spark SQL (default 200). Try increasing it.
    • Use .repartition() to explicitly increase the number of partitions of a DataFrame or RDD.
  • You can also use a custom partitioner that partitions the dataset uniformly.
  • More partitions means less memory is required per partition.
  Note: If the fix does not help, revert the changes in spark-defaults.conf to the original values.
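
The minimal PySpark sketch below (hypothetical values, not from the original article) shows these partition-related settings and an explicit repartition() call in one place:


from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("partition-tuning")
    # Smaller file-split target size -> more (smaller) input partitions.
    .config("spark.sql.files.maxPartitionBytes", str(64 * 1024 * 1024))
    # More shuffle partitions -> less data (and memory) per shuffle task.
    .config("spark.sql.shuffle.partitions", "400")
    # Default parallelism for raw RDD operations.
    .config("spark.default.parallelism", "400")
    .getOrCreate()
)

df = spark.range(0, 10_000_000)      # stand-in for the real dataset
df = df.repartition(400)             # explicitly increase the partition count
print(df.rdd.getNumPartitions())     # should report 400

spark.stop()
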

Option 4:

  • In this approach, we will increase the driver and/or executor memory. While doing so, keep in mind the formula mentioned at the beginning of this article.
  • If the error occurs in the driver container, increase memory for the driver.
  • If the error occurs in the executor container, increase memory for the executor.
  • It is better to avoid increasing the memory for both.
  • spark-defaults.conf can be modified in two ways -
    • On the main (master) node, in /etc/spark/conf/spark-defaults.conf -


spark.executor.memory <ADD_NEW_VALUE>
spark.driver.memory <ADD_NEW_VALUE>


    • Or while submitting the Spark Job -


spark-submit \
--class org.apache.spark.yourSparkApp \
--master yarn \
--deploy-mode cluster \
--executor-memory <ADD_NEW_VALUE> \
--driver-memory <ADD_NEW_VALUE> \
<Jar_Name>.jar


  Hope this helps to solve the error.    



