Help Center/ MapReduce Service/ User Guide (Kuala Lumpur Region)/ MRS Cluster Component Operation Gudie/ Using Yarn/ Common Issues About Yarn/ Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
Updated on 2022-08-12 GMT+08:00

Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?

Question

Why does a new application fail if a NodeManager has been in unhealthy status for 10 minutes?

Answer

When nodeSelectPolicy is set to SEQUENCE and the first NodeManager connected to the ResourceManager is unavailable, the ResourceManager attempts to assign tasks to the same NodeManager in the period specified by yarn.nm.liveness-monitor.expiry-interval-ms.

You can use either of the following methods to avoid the preceding problem:

  • Use another nodeSelectPolicy, for example, RANDOM.
  • Go to the All Configurations page of Yarn by referring to Modifying Cluster Service Configuration Parameters. Search for the following parameters in the search box and modify the following attributes in the yarn-site.xml file:

    yarn.resourcemanager.am-scheduling.node-blacklisting-enabled = true;

    yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold = 0.5.