Why Does a New Application Fail If a NodeManager Has Been in Unhealthy Status for 10 Minutes?
Question
Why does a new application fail if a NodeManager has been in unhealthy status for 10 minutes?
Answer
When nodeSelectPolicy is set to SEQUENCE and the first NodeManager connected to the ResourceManager is unavailable, the ResourceManager keeps trying to assign tasks to that same NodeManager for the period specified by yarn.nm.liveness-monitor.expiry-interval-ms (10 minutes by default). New applications fail during this period.
You can use either of the following methods to avoid this problem:
- Use another nodeSelectPolicy, for example, RANDOM.
- Modify the parameters in the yarn-site.xml file by performing the following operations:
  - Log in to FusionInsight Manager. For details about how to log in to FusionInsight Manager, see Accessing MRS FusionInsight Manager.
  - Choose Cluster > Services > Yarn > Configurations > All Configurations.
  - Search for and modify the following parameters.
    Table 1 Parameter configuration

    | Parameter | Description | Example Value |
    | --- | --- | --- |
    | yarn.resourcemanager.am-scheduling.node-blacklisting-enabled | Whether to enable the blacklist mechanism of ApplicationMaster in YARN. Default value: true | true |
    | yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold | Maximum proportion of cluster nodes that can be blacklisted by YARN relative to the total number of nodes in the cluster before node blacklisting is automatically disabled. Default value: 0.34. Value range: 0 to 1 | 0.5 |
  - Save the settings. Restart the services or instances whose configuration has expired for the settings to take effect.
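The effect of the disable threshold can be sketched in a few lines of Python. This is only an illustration, not YARN's actual code: the function name `blacklist_is_honored` is hypothetical, and the sketch assumes that the ApplicationMaster blacklist stays in effect only while the number of blacklisted nodes is strictly below the threshold multiplied by the total number of nodes. The exact boundary behavior in your cluster version may differ.

```python
def blacklist_is_honored(blacklisted_nodes: int,
                         total_nodes: int,
                         disable_threshold: float = 0.34) -> bool:
    """Return True while node blacklisting would still be applied.

    Assumption (not taken from the product documentation): blacklisting
    is automatically disabled once the number of blacklisted nodes
    reaches disable_threshold * total_nodes.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0.0 <= disable_threshold <= 1.0:
        raise ValueError("disable_threshold must be in the range 0 to 1")
    return blacklisted_nodes < disable_threshold * total_nodes


if __name__ == "__main__":
    # 10-node cluster with 4 blacklisted nodes.
    print(blacklist_is_honored(4, 10, 0.34))  # default threshold
    print(blacklist_is_honored(4, 10, 0.5))   # example value from Table 1
```

Under this assumption, raising the threshold from 0.34 to 0.5 lets the blacklist stay active with more unhealthy nodes, so a failing NodeManager keeps being skipped instead of being retried.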