A Spark Job Is Pending Due to Insufficient Memory

Issue

Insufficient memory prevents a submitted Spark job from being scheduled. As a result, the job stays in the pending state for a long time, or an out-of-memory (OOM) error occurs while the job is running.

Symptom

The job remains pending for a long time after it is submitted. After the job is run repeatedly, the following error information is displayed:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: 
Aborting TaskSet 3.0 because task 0 (partition 0) cannot run anywhere due to node and executor blacklist. 
Blacklisting behavior can be configured via spark.blacklist.*.

Cause Analysis

The cluster does not have enough memory available to allocate to the job. As a result, the submitted Spark job remains in the pending state for a long time.
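One way to reason about this cause is to compare the memory a single executor requests with the memory a NodeManager can offer. The following is a minimal sketch, assuming Spark's documented default overhead rule (10% of executor memory, with a minimum of 384 MB) and ignoring YARN's rounding to its minimum allocation size. The function names and the sizes used are hypothetical and are not taken from a real cluster.

```python
# Rough sketch only. Assumes the default Spark overhead rule
# (10% of executor memory, at least 384 MB); your cluster may override it.

def executor_container_mb(executor_memory_mb, overhead_factor=0.10, min_overhead_mb=384):
    """Approximate memory YARN must grant for one executor: heap plus overhead."""
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead_mb


def fits_on_node(executor_memory_mb, nodemanager_memory_mb):
    """Return True if one executor container fits in the NodeManager's memory."""
    return executor_container_mb(executor_memory_mb) <= nodemanager_memory_mb


# Hypothetical values: an 8 GB executor on a NodeManager that offers 8 GB.
print(executor_container_mb(8192))   # container size including overhead
print(fits_on_node(8192, 8192))      # the overhead pushes the request past 8 GB
print(fits_on_node(4096, 8192))      # a 4 GB executor fits
```

If the check fails for every node, either the NodeManager memory needs to be raised or the executor memory lowered, which is what the procedure below adjusts.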

Procedure

Log in to the MRS console, click the cluster name on the Active Clusters page, and view the node specifications of the cluster on the Nodes tab page.
Increase the cluster resources allocated to the NodeManager process.

MRS Manager:
1. Log in to MRS Manager and choose Services > Yarn > Service Configuration.
2. Set Type to All, and then search for yarn.nodemanager.resource.memory-mb in the search box to view its value. You are advised to set this parameter to 75% to 90% of the total physical memory of each node.
FusionInsight Manager:
1. Log in to FusionInsight Manager. Choose Cluster > Service > Yarn.
2. Choose Configurations > All Configurations. Search for yarn.nodemanager.resource.memory-mb in the search box and check its value. You are advised to set this parameter to 75% to 90% of the total physical memory of each node.
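The 75% to 90% guideline for yarn.nodemanager.resource.memory-mb can be worked out with simple arithmetic. The following is a minimal sketch; the function name and the 64 GB node size are assumptions used only for illustration.

```python
from typing import Tuple


def recommended_nodemanager_memory_mb(physical_memory_mb: int) -> Tuple[int, int]:
    """Return the (low, high) range, in MB, at 75% and 90% of physical memory."""
    low = physical_memory_mb * 75 // 100
    high = physical_memory_mb * 90 // 100
    return low, high


# Hypothetical node with 64 GB of physical memory.
low, high = recommended_nodemanager_memory_mb(64 * 1024)
print("yarn.nodemanager.resource.memory-mb: %d to %d MB" % (low, high))
```

For this assumed node size, the range is 49152 MB to 58982 MB. Use the node specifications viewed on the Nodes tab page as the input for your own cluster.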
Modify the Spark service configuration.

MRS Manager:
1. Log in to MRS Manager and choose Services > Spark > Service Configuration.
2. Set Type to All, and then search for spark.driver.memory and spark.executor.memory in the search box.
  Set these parameters to a larger or smaller value based on the complexity and memory requirements of the submitted Spark job. (Generally, the values need to be increased.)
FusionInsight Manager:
1. Log in to FusionInsight Manager. Choose Cluster > Service > Spark.
2. Choose Configurations > All Configurations. Search for spark.driver.memory and spark.executor.memory in the search box and increase or decrease the values based on the complexity and memory requirements of the submitted Spark job. (Generally, the values need to be increased.)
- If a SparkJDBC job is used, search for SPARK_EXECUTOR_MEMORY and SPARK_DRIVER_MEMORY and modify their values based on the complexity and memory requirements of the submitted Spark job. (Generally, the values need to be increased.)
- If the number of cores needs to be specified, you can search for spark.driver.cores and spark.executor.cores and modify their values.
Spark depends on memory for computing. If the preceding adjustments still cannot meet the job requirements, scale out the cluster.
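The driver and executor memory and core settings described in the Spark configuration step can also be supplied for a single job through the standard spark-submit options, which is a convenient way to try values before changing the service configuration. The following is a minimal sketch that only assembles and prints such a command. The helper name, the application file my_job.py, and all values are hypothetical, and whether per-job options suit your cluster depends on how jobs are submitted there.

```python
import shlex
from typing import List


def build_spark_submit(app: str,
                       driver_memory: str = "4g",
                       executor_memory: str = "8g",
                       driver_cores: int = 1,
                       executor_cores: int = 2) -> List[str]:
    """Assemble a spark-submit command line for a job running on YARN."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--driver-memory", driver_memory,        # corresponds to spark.driver.memory
        "--executor-memory", executor_memory,    # corresponds to spark.executor.memory
        "--driver-cores", str(driver_cores),     # corresponds to spark.driver.cores
        "--executor-cores", str(executor_cores), # corresponds to spark.executor.cores
        app,
    ]


# Hypothetical application file; replace with the real job.
command = build_spark_submit("my_job.py")
print(" ".join(shlex.quote(arg) for arg in command))
```

The sketch prints the command rather than running it, so the values can be reviewed against the NodeManager memory configured earlier before the job is submitted.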