




How To Fix - "Container Killed By YARN for Exceeding Memory Limits" in Spark ?



In this post, we will explore how to fix the "Container Killed By YARN for Exceeding Memory Limits" error in Spark on AWS EMR. The job log might show the error message below in the terminal.



Reason: Container killed by YARN for exceeding memory limits.


This error means that -

  • The executor is trying to use a certain amount of memory.
  • And that memory is much more than the memory YARN allows it to use.

Ideally, the total of spark.executor.memory and spark.executor.memoryOverhead should be less than yarn.nodemanager.resource.memory-mb. In other words, the sum of -

  1. Memory (driver or executor), plus
  2. Memory overhead (driver or executor)

should be less than yarn.nodemanager.resource.memory-mb. Refer to this formula whenever you increase or decrease any of these values (a worked example follows this list). Depending on the data the Spark job is handling, you may have to try several of the options below and see which one helps. Make the changes in spark-defaults.conf for each option. Note that -

  • Roughly 90% of the Spark executor memory is heap memory, used for the bulk of the actual workload.
  • Roughly 10% of the Spark executor memory (minimum 384 MB) is overhead memory, used for internal work.
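For example (hypothetical numbers, for illustration only): on a node where yarn.nodemanager.resource.memory-mb is 24576 (24 GB), setting spark.executor.memory to 20g and spark.executor.memoryOverhead to 2g gives 20 GB + 2 GB = 22 GB per executor container, which fits under the 24 GB limit. Pushing spark.executor.memory to 23g would make the total 25 GB, and YARN would again kill the container.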
 

Check:

Before changing anything, perform some basic sanity checks to ensure things are behaving as expected.

  • Check whether the dataset is skewed. Real-world datasets very often are. See this article on how to handle skewed data - How To Fix – Data Skewness in Spark (Salting Method)
  • Make sure the Spark code is written efficiently, e.g. preferring reduceByKey over groupByKey where possible, to avoid unnecessary data shuffles (see the sketch after this list).
  At this point, assuming the dataset and code are fine, proceed with the next steps to handle this error.
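
As a minimal illustration of the shuffle point above, the PySpark sketch below (hypothetical word-count style data, not from the original article) contrasts groupByKey, which shuffles every value, with reduceByKey, which pre-aggregates on the map side and therefore shuffles and holds far less data per executor:


from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shuffle-check").getOrCreate()
sc = spark.sparkContext

pairs = sc.parallelize([("a", 1), ("b", 1), ("a", 1), ("b", 1)])

# groupByKey ships every value across the network before aggregating,
# inflating shuffle size and per-executor memory pressure.
grouped_counts = pairs.groupByKey().mapValues(sum)

# reduceByKey combines values on the map side first, so far less data
# is shuffled and held in memory per partition.
reduced_counts = pairs.reduceByKey(lambda a, b: a + b)

print(reduced_counts.collect())

spark.stop()
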

Option 1:

  • In this approach, we will increase the memory overhead, which is the amount of off-heap memory allocated to each executor.
  • The default is 10% of the executor memory or 384 MB, whichever is higher. For example, with spark.executor.memory=10g the default overhead is max(0.10 × 10 GB, 384 MB) = 1 GB.
  • Increase the memory overhead gradually, but keep in mind the memory formula explained above.
  • If the error is with the driver container or the executor container, increase the memory overhead for that container only.
  • spark-defaults.conf can be modified in two ways -
    • On the main (master) node, in /etc/spark/conf/spark-defaults.conf -


spark.driver.memoryOverhead <ADD_NEW_VALUE>
spark.executor.memoryOverhead <ADD_NEW_VALUE>


    • Or while submitting the Spark Job -


spark-submit \
--class org.apache.spark.yourSparkApp \
--master yarn \
--deploy-mode cluster \
--conf spark.driver.memoryOverhead=<ADD_NEW_VALUE> \
--conf spark.executor.memoryOverhead=<ADD_NEW_VALUE> \
<Jar_Name>.jar


Note: If the fix does not help, revert the changes in spark-defaults.conf to the original values.

Option 2:

  • In this approach, we will reduce the number of cores for the driver or the executor, so that the maximum number of tasks the container can run in parallel decreases.
  • Decrease the cores on the respective container depending on whether the driver or the executor errored out.
  • spark-defaults.conf can be modified in two ways -
    • On the main (master) node, in /etc/spark/conf/spark-defaults.conf -
 



spark.driver.cores <ADD_NEW_VALUE>
spark.executor.cores <ADD_NEW_VALUE>


 

    • Or while submitting the Spark Job -


spark-submit \
--class org.apache.spark.yourSparkApp \
--master yarn \
--deploy-mode cluster \
--executor-cores <ADD_NEW_VALUE> \
--driver-cores <ADD_NEW_VALUE> \
<Jar_Name>.jar


  Note: If the fix does not help, revert the changes in spark-defaults.conf to the original values.

Option 3:

  • In this approach, we will increase the number of partitions.
  • To achieve high parallelism, Spark splits the data into smaller chunks called partitions. Partitions are distributed across the nodes of the Spark cluster. Each node can have more than one executor, each of which can execute tasks.
  • Use the methods below to increase the number of partitions (a short sketch follows this list) -
    • spark.sql.files.maxPartitionBytes - governs the maximum partition size when reading files. The default is 128 MB. Lowering it produces more, smaller partitions.
    • spark.default.parallelism - defaults to the total number of cores across all executor nodes. Increase this value to get more partitions. Applies to raw RDD operations.
    • spark.sql.shuffle.partitions - the number of partitions used for shuffles in Spark SQL (default 200). Try increasing it.
    • Use .repartition() to explicitly increase the number of partitions of a DataFrame or RDD.
  • You can also use a custom partitioner that partitions the dataset uniformly.
  • More partitions means less memory is required per partition.
  Note: If the fix does not help, revert the changes in spark-defaults.conf to the original values.
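
The minimal PySpark sketch below (hypothetical values, not from the original article) shows these partition-related settings and an explicit repartition() call in one place:


from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("partition-tuning")
    # Smaller file-split target size -> more (smaller) input partitions.
    .config("spark.sql.files.maxPartitionBytes", str(64 * 1024 * 1024))
    # More shuffle partitions -> less data (and memory) per shuffle task.
    .config("spark.sql.shuffle.partitions", "400")
    # Default parallelism for raw RDD operations.
    .config("spark.default.parallelism", "400")
    .getOrCreate()
)

df = spark.range(0, 10_000_000)      # stand-in for the real dataset
df = df.repartition(400)             # explicitly increase the partition count
print(df.rdd.getNumPartitions())     # should report 400

spark.stop()
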

Option 4:

  • In this approach, we will increase the driver and/or executor memory. While doing so, keep in mind the formula mentioned at the beginning of this article.
  • If the error occurs in the driver container, increase memory for the driver.
  • If the error occurs in the executor container, increase memory for the executor.
  • It is better to avoid increasing the memory for both.
  • spark-defaults.conf can be modified in two ways -
    • On the main (master) node, in /etc/spark/conf/spark-defaults.conf -


spark.executor.memory <ADD_NEW_VALUE>
spark.driver.memory <ADD_NEW_VALUE>


    • Or while submitting the Spark Job -


spark-submit \
--class org.apache.spark.yourSparkApp \
--master yarn \
--deploy-mode cluster \
--executor-memory <ADD_NEW_VALUE> \
--driver-memory <ADD_NEW_VALUE> \
<Jar_Name>.jar


  Hope this helps to solve the error.    



