After an Active/Standby Switchover of ResourceManager Occurs, a Task Is Interrupted and Runs for a Long Time
Question
While a MapReduce task is running, an active/standby switchover of ResourceManager occurs. After the switchover is complete, the task continues to execute but takes an excessively long time to finish.
Answer
The ResourceManager HA function is enabled, but the work-preserving RM restart function is not.
If the work-preserving RM restart function is not enabled, the task's containers are killed during the ResourceManager switchover. As a result, the ApplicationMaster times out. For details about the work-preserving RM restart function, see the following pages:
Versions earlier than MRS 3.2.0: http://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html
MRS 3.2.0 or later: https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html
To resolve this issue, perform the following operation:
Set the yarn.resourcemanager.work-preserving-recovery.enabled parameter to true to enable the Work-preserving RM restart function.
yarn.resourcemanager.work-preserving-recovery.enabled=true
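After changing the parameter, it can be useful to confirm that the value actually reached the ResourceManager configuration. The following is a minimal Python sketch, not an official MRS tool. It assumes that the configuration is available as a Hadoop-style XML file (such as yarn-site.xml) and uses an inline sample in place of a real file. The helper names read_property and is_work_preserving_enabled are hypothetical. A missing property is reported as not enabled, although the effective default depends on the Hadoop version in use.

```python
import xml.etree.ElementTree as ET

# Parameter named in this FAQ.
PROPERTY = "yarn.resourcemanager.work-preserving-recovery.enabled"


def read_property(xml_text, name):
    """Return the value of a Hadoop-style <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def is_work_preserving_enabled(xml_text):
    """True only if the property is explicitly set to 'true' (case-insensitive).

    Assumption: an absent property counts as not enabled here; the real
    default depends on the Hadoop version.
    """
    value = read_property(xml_text, PROPERTY)
    return value is not None and value.lower() == "true"


# Inline sample standing in for the contents of a real yarn-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(is_work_preserving_enabled(SAMPLE))
```

To check a real cluster, read the text of the configuration file used by the ResourceManager and pass it to is_work_preserving_enabled in place of SAMPLE.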