Why Does a Task Keep Running Even When All Nodes in the YARN Resource Pool Are Added to the Blacklist?
Question
Why does YARN not clear the blacklist even when all nodes in a resource pool are added to it?
Answer
In YARN, when the number of nodes that the ApplicationMaster (AM) of an application has added to the blacklist reaches a threshold (34% of the total number of cluster nodes by default), the blacklist is automatically cleared. This prevents all available nodes from being blacklisted, so tasks can still obtain node resources.
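The following Python sketch models the threshold rule as this FAQ describes it: the blacklist is cleared once blacklisted nodes make up at least the threshold fraction of all cluster nodes. The function name `blacklist_should_be_cleared` is hypothetical, and the plain ratio comparison is a simplification of the described behavior, not YARN's actual source code.

```python
def blacklist_should_be_cleared(blacklisted: int, total_nodes: int,
                                threshold: float = 0.34) -> bool:
    """Return True once the blacklisted share of the cluster reaches the threshold.

    Hypothetical helper: a simplified model of the rule described in this FAQ,
    assuming a ratio comparison against the total number of cluster nodes.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return blacklisted / total_nodes >= threshold


# 8-node cluster: 2 blacklisted nodes (25%) stay below 34%; 3 nodes (37.5%) do not.
print(blacklist_should_be_cleared(2, 8))  # False
print(blacklist_should_be_cleared(3, 8))  # True
```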
Assume that a cluster has 8 nodes, divided into pool A and pool B by NodeLabel, with 2 nodes in pool B. A user submits a task, App1, to pool B, but App1 fails to run because there is not enough HDFS space. As a result, the AM of App1 adds both nodes in pool B to the blacklist. According to the preceding rule, 2 nodes is less than 34% of 8 nodes (2.72), so YARN does not clear the blacklist. App1 cannot obtain resources and keeps running. Even if a blacklisted node recovers, App1 still cannot obtain resources.
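The arithmetic behind this example, using the default threshold of 34%, is shown below. Because pool B has only 2 nodes, App1 can blacklist at most 2 nodes, which never reaches the cluster-wide limit.

```latex
\frac{2}{8} = 0.25 < 0.34
\quad\Longleftrightarrow\quad
2 < 0.34 \times 8 = 2.72
```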
The preceding rule does not work as intended in resource pool scenarios, because the threshold is calculated against the total number of nodes in the cluster rather than the number of nodes in the pool. To solve this problem, set the yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold parameter to (number of nodes in the pool/total number of nodes in the cluster) x 34%. This parameter is in the Client installation path/Yarn/config/yarn-site.xml file.
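As a sketch, the adjusted value can be computed and rendered as a yarn-site.xml entry as follows. The helper names `pool_adjusted_threshold` and `yarn_site_property` are hypothetical; only the parameter name and the formula come from this FAQ, and the output assumes the standard Hadoop `<property>` element with `<name>` and `<value>` children.

```python
DEFAULT_THRESHOLD = 0.34  # default ratio stated in this FAQ
PARAM = "yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold"


def pool_adjusted_threshold(pool_nodes: int, total_nodes: int,
                            base: float = DEFAULT_THRESHOLD) -> float:
    """Hypothetical helper: (pool nodes / total nodes) x base, per the formula above."""
    if total_nodes <= 0 or not 0 < pool_nodes <= total_nodes:
        raise ValueError("need 0 < pool_nodes <= total_nodes")
    return pool_nodes / total_nodes * base


def yarn_site_property(value: float) -> str:
    """Hypothetical helper: render the threshold as a yarn-site.xml <property> element."""
    return (
        "<property>\n"
        f"  <name>{PARAM}</name>\n"
        f"  <value>{value:g}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Example from this FAQ: pool B has 2 of the cluster's 8 nodes.
    threshold = pool_adjusted_threshold(pool_nodes=2, total_nodes=8)
    print(yarn_site_property(threshold))
```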
The yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold parameter defines the maximum proportion of the cluster's total nodes that can be blacklisted before YARN automatically disables node blacklisting.
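Because the threshold t is a fraction of all cluster nodes, the adjusted value scales the limit back to 34% of the pool. Under the ratio comparison described in this FAQ, for the example above this means blacklisting is disabled once 0.68 nodes are blacklisted, that is, as soon as one node in pool B is blacklisted.

```latex
t = \frac{N_{\text{pool}}}{N_{\text{total}}} \times 0.34
\;\Rightarrow\;
t \times N_{\text{total}} = 0.34 \times N_{\text{pool}}

t = \frac{2}{8} \times 0.34 = 0.085,
\qquad
0.085 \times 8 = 0.68 = 0.34 \times 2
```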