ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold
Alarm Description
The system checks the direct memory usage of ResourceManager every 30 seconds. This alarm is generated when the direct memory usage of a ResourceManager instance exceeds the threshold (by default, 90% of the maximum direct memory).
This alarm is automatically cleared when the direct memory usage is less than the threshold.
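The generation and clearing rule above can be sketched in a few lines of Python. This is an illustration only, not the actual FusionInsight Manager implementation: the function name `next_alarm_state` and its parameters are hypothetical, and the 90% default is taken from the description above. The description does not say what happens when usage equals the threshold exactly, so the sketch assumes the alarm keeps its previous state in that case.

```python
def next_alarm_state(active: bool, used_bytes: int, max_bytes: int,
                     threshold_percent: int = 90) -> bool:
    """Return the alarm state after one periodic check (hypothetical helper).

    The alarm is raised when usage exceeds the threshold and cleared when
    usage is below it. At exactly the threshold, the previous state is kept
    (an assumption, since the description leaves this case open).
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    usage_scaled = used_bytes * 100
    limit_scaled = threshold_percent * max_bytes
    if usage_scaled > limit_scaled:
        return True
    if usage_scaled < limit_scaled:
        return False
    return active


# Example: 950 MB used out of 1000 MB raises the alarm; 800 MB clears it.
state = next_alarm_state(False, 950, 1000)
state = next_alarm_state(state, 800, 1000)
```

In a real deployment the check runs inside the monitoring system every 30 seconds; the sketch only shows how a single sample would change the alarm state.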
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared |
---|---|---|
18013 | Major | Yes |
Alarm Parameters
Parameter | Description |
---|---|
Source | Specifies the cluster for which the alarm was generated. |
ServiceName | Specifies the service for which the alarm was generated. |
RoleName | Specifies the role for which the alarm was generated. |
HostName | Specifies the host for which the alarm was generated. |
Trigger Condition | Specifies the threshold for triggering the alarm. |
Impact on the System
If ResourceManager runs out of available direct memory, a memory overflow occurs and the service breaks down.
Possible Causes
The direct memory of ResourceManager instances is overused or the direct memory is inappropriately allocated.
Handling Procedure
Check the direct memory usage.
- On FusionInsight Manager, choose O&M > Alarm > Alarms > ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold > Location. View the IP address of the instance for which the alarm is generated.
- On FusionInsight Manager, choose Cluster, click the name of the desired cluster, and choose Services > Yarn. On the page that is displayed, click the Instances tab and click the ResourceManager instance for which this alarm is generated. Click the drop-down list in the upper right corner of the chart area, choose Customize > Resource, and select Memory Usage Status of ResourceManager to check the direct memory usage.
Figure 1 Customizing ResourceManager memory usage details
- Check whether the used direct memory of a ResourceManager instance reaches 90% (default threshold) of the maximum direct memory allocated to it.
- On FusionInsight Manager, choose Cluster, click the name of the desired cluster, and choose Services > Yarn > Configurations > All Configurations > ResourceManager > System. Check whether -XX:MaxDirectMemorySize exists in the GC_OPTS parameter.
- If it exists, delete the -XX:MaxDirectMemorySize parameter from GC_OPTS and save the configuration.
MaxDirectMemorySize indicates the maximum off-heap (direct) memory size. If MaxDirectMemorySize is not specified for ResourceManager, the direct memory of ResourceManager is not explicitly limited by this option. By default, -XX:MaxDirectMemorySize is not set in the GC_OPTS parameter.
- Perform the following steps to restart the ResourceManager instances one at a time. Restarting the standby ResourceManager instance does not affect services. During the ResourceManager switchover, new jobs cannot be submitted to Yarn, but jobs that have already been submitted are not affected.
  - On the Yarn service page, click the Instances tab, select the ResourceManager (Standby) instance, choose More, select Restart Instance, and verify the password to restart the instance.
  - After the standby instance is restarted, click the Dashboard tab of Yarn, choose More, select Perform ResourceManager Switchover, and verify the password to perform an active/standby switchover.
  - After the active/standby switchover is complete, click the Instances tab on the Yarn service page, select the ResourceManager (Standby) instance (the former active instance), choose More, select Restart Instance, and verify the password to restart the instance. Wait until the instance is restarted.
- Check whether ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold exists.
  - If yes, rectify the fault by referring to ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold.
  - If no, go to the next step and check whether this alarm is cleared.
- Check whether the alarm is cleared.
  - If yes, no further action is required.
  - If no, collect fault information as described in the following steps.
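The GC_OPTS check and cleanup described in the steps above can be sketched as plain string handling. This is a minimal illustration, assuming GC_OPTS is a whitespace-separated list of JVM options; the helper names `find_max_direct_memory` and `strip_max_direct_memory` and the sample value are hypothetical, and the actual change is still made on the FusionInsight Manager configuration page.

```python
from typing import Optional

FLAG_PREFIX = "-XX:MaxDirectMemorySize="


def find_max_direct_memory(gc_opts: str) -> Optional[str]:
    """Return the value of -XX:MaxDirectMemorySize in gc_opts, or None if absent."""
    for token in gc_opts.split():
        if token.startswith(FLAG_PREFIX):
            return token[len(FLAG_PREFIX):]
    return None


def strip_max_direct_memory(gc_opts: str) -> str:
    """Return gc_opts with every -XX:MaxDirectMemorySize=... token removed."""
    kept = [token for token in gc_opts.split() if not token.startswith(FLAG_PREFIX)]
    return " ".join(kept)


# Hypothetical GC_OPTS value, for illustration only.
sample = "-Xms2G -Xmx2G -XX:MaxDirectMemorySize=512M -XX:+UseG1GC"
print(find_max_direct_memory(sample))
print(strip_max_direct_memory(sample))
```

A check like this can help confirm, before and after the restart, that the option is no longer present in the saved value.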
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Expand the Service drop-down list, and select ResourceManager for the target cluster.
- Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
- Contact O&M personnel and provide the collected logs.
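The log collection window in the steps above is simple date arithmetic. The sketch below computes the Start Date and End Date values from an alarm generation time; the function name `log_collection_window` and the sample timestamp are hypothetical, and the 10-minute margin is the one given in the steps.

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_collection_window(alarm_time: datetime,
                          margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (start, end): margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
start, end = log_collection_window(datetime(2024, 1, 1, 12, 0, 0))
print("Start Date:", start.isoformat(sep=" "))
print("End Date:  ", end.isoformat(sep=" "))
```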
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None