Updated on 2023-04-28 GMT+08:00

After an Active/Standby Switchover of ResourceManager Occurs, a Task Is Interrupted and Runs for a Long Time

Question

While a MapReduce task is running, an active/standby switchover of ResourceManager occurs. After the switchover is complete, the task continues to run but takes an excessively long time to finish.

Answer

ResourceManager HA is enabled, but the Work-preserving RM restart function is not.

If the Work-preserving RM restart function is not enabled, the task's containers are killed during the ResourceManager switchover. As a result, the ApplicationMaster times out. For details about the Work-preserving RM restart function, see the following pages:

Versions earlier than MRS 3.2.0: http://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html

MRS 3.2.0 or later: https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html

To resolve this issue, perform the following operation:

Set the yarn.resourcemanager.work-preserving-recovery.enabled parameter to true to enable the Work-preserving RM restart function.

yarn.resourcemanager.work-preserving-recovery.enabled=true
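
To confirm that the setting above has taken effect, the configuration can be checked with a short script. The following is a minimal sketch, not an MRS tool: it assumes the configuration is available as Hadoop-style XML text (for example, the contents of a yarn-site.xml file), and the function names and the SAMPLE string are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Parameter named in this section.
PROPERTY = "yarn.resourcemanager.work-preserving-recovery.enabled"


def read_property(xml_text, name):
    """Return the value of a Hadoop-style <property> entry, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def is_explicitly_enabled(xml_text):
    """True only if the parameter is present and set to true in this XML text."""
    value = read_property(xml_text, PROPERTY)
    return value is not None and value.lower() == "true"


# Hypothetical sample configuration, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(is_explicitly_enabled(SAMPLE))
```

The check reports only whether the parameter is set explicitly in the given text. If the parameter is absent, the effective value depends on the default of the component version in use, so a False result here means "not set to true in this file" rather than "disabled".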