Updated on 2025-08-22 GMT+08:00

Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?

Question

Why does a new application fail if a NodeManager has been in unhealthy status for 10 minutes?

Answer

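When nodeSelectPolicy is set to SEQUENCE and the first NodeManager that registered with the ResourceManager becomes unavailable, the ResourceManager keeps assigning tasks to that same NodeManager until the period specified by yarn.nm.liveness-monitor.expiry-interval-ms elapses and the NodeManager is considered lost. In Apache Hadoop, this parameter defaults to 600000 ms (10 minutes), which is consistent with the 10-minute window in the question.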
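The following Python sketch is a toy model of this behavior, not the MRS implementation. It assumes that the ResourceManager treats a NodeManager as alive until no heartbeat has arrived for the expiry interval (600000 ms is assumed here, the Apache Hadoop default), that SEQUENCE always picks the first-registered live node, and that RANDOM picks uniformly among live nodes. The names Node, live_nodes, pick_node, and make_cluster are hypothetical.

```python
"""Toy model of why SEQUENCE keeps picking a failed NodeManager.

Hypothetical sketch: Node, live_nodes, pick_node and make_cluster are
illustrative names only, not part of the YARN or MRS APIs.
"""
import random
from dataclasses import dataclass

# Assumed value: Apache Hadoop's default for
# yarn.nm.liveness-monitor.expiry-interval-ms (600000 ms = 10 minutes).
EXPIRY_MS = 600_000


@dataclass
class Node:
    name: str
    last_heartbeat_ms: int  # time of the last heartbeat the RM received


def live_nodes(nodes, now_ms, expiry_ms=EXPIRY_MS):
    """Nodes the RM still treats as alive: not yet past the expiry interval."""
    return [n for n in nodes if now_ms - n.last_heartbeat_ms <= expiry_ms]


def pick_node(policy, nodes, now_ms, rng=random):
    """Pick a node for the next task under a simplified selection policy."""
    candidates = live_nodes(nodes, now_ms)
    if not candidates:
        raise RuntimeError("no live NodeManagers")
    if policy == "SEQUENCE":
        return candidates[0]  # always the first-registered live node
    if policy == "RANDOM":
        return rng.choice(candidates)  # uniform choice among live nodes
    raise ValueError(f"unknown policy: {policy}")


def make_cluster(now_ms):
    """nm1 registered first but stopped heartbeating at t=0; nm2/nm3 are healthy."""
    return [Node("nm1", 0), Node("nm2", now_ms), Node("nm3", now_ms)]


if __name__ == "__main__":
    for minute in (1, 5, 9, 11):
        now = minute * 60_000
        # prints: "1 nm1", "5 nm1", "9 nm1", "11 nm2" (one per line)
        print(minute, pick_node("SEQUENCE", make_cluster(now), now).name)
```

Under these assumptions, SEQUENCE returns the failed nm1 for every request during the first 10 minutes and moves on to nm2 only after nm1 expires. RANDOM can still pick nm1 during that window, but it spreads requests across all live nodes instead of repeating the same failed choice.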

To avoid this problem, use either of the following methods:

  • Use another nodeSelectPolicy, for example, RANDOM.
  • Modify the related parameters in the yarn-site.xml file as follows:
    1. Log in to FusionInsight Manager.

      For details about how to log in to FusionInsight Manager, see Accessing MRS FusionInsight Manager.

    2. Choose Cluster > Services > Yarn > Configurations > All Configurations.
    3. Search for and modify the following parameters.
      Table 1 Parameter configuration

      | Parameter | Description | Example Value |
      |---|---|---|
      | yarn.resourcemanager.am-scheduling.node-blacklisting-enabled | Whether to enable node blacklisting for ApplicationMaster (AM) scheduling in YARN. Default value: true | true |
      | yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold | Maximum proportion of cluster nodes that YARN can blacklist before node blacklisting is automatically disabled. Default value: 0.34. Value range: 0 to 1 | 0.5 |

    4. Save the settings, and then restart the services or instances whose configurations have expired for the new settings to take effect.
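The effect of the two blacklisting parameters can be illustrated with a short Python sketch. It is a simplified model, not the YARN source: it assumes, loosely following Apache Hadoop's AM blacklisting logic, that blacklisting is applied only while the number of blacklisted nodes is strictly less than threshold * total_nodes. MRS behavior at the exact boundary may differ. The functions blacklisting_active and max_blacklistable are hypothetical.

```python
"""Simplified model of the AM node-blacklisting disable threshold.

Assumption: blacklisting stays in effect only while
blacklisted < threshold * total_nodes. Function names are hypothetical.
"""


def blacklisting_active(blacklisted, total_nodes, enabled=True, threshold=0.34):
    """Return True if AM node blacklisting would still be applied."""
    if not enabled or total_nodes <= 0:
        return False  # feature switched off, or no nodes to reason about
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return blacklisted < threshold * total_nodes


def max_blacklistable(total_nodes, threshold):
    """Largest blacklist size for which blacklisting is still applied."""
    return max(
        (n for n in range(total_nodes + 1)
         if blacklisting_active(n, total_nodes, threshold=threshold)),
        default=0,
    )


if __name__ == "__main__":
    # Compare the default threshold (0.34) with the example value (0.5).
    for total in (3, 10, 20):
        # prints: "3 1 1", "10 3 4", "20 6 9" (one per line)
        print(total, max_blacklistable(total, 0.34), max_blacklistable(total, 0.5))
```

Under this model, a 10-node cluster keeps at most 3 failed nodes blacklisted with the default 0.34, and up to 4 with 0.5, so a new ApplicationMaster is less likely to be placed again on a failed NodeManager before it expires. On a 3-node cluster, both values allow only 1.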