Updated on 2024-09-23 GMT+08:00

ALM-18024 Pending Yarn Memory Usage Exceeds the Threshold

Alarm Description

The alarm module checks the pending memory of Yarn every 60 seconds. The alarm is generated when the pending memory exceeds the threshold. Pending memory is the total memory that submitted Yarn applications have requested but that has not yet been allocated to them.
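The check described above amounts to a per-queue comparison against a threshold. The following is a minimal sketch of that rule, not the alarm module's actual implementation; the function name, the MB unit, and the sample queue names and values are assumptions made for illustration.

```python
# Sampling interval stated in the alarm description.
CHECK_INTERVAL_SECONDS = 60


def queues_over_threshold(pending_mb_by_queue, threshold_mb):
    """Return the sorted names of queues whose pending memory exceeds the threshold.

    pending_mb_by_queue: hypothetical mapping of queue name -> pending memory (MB).
    threshold_mb: hypothetical alarm threshold in the same unit.
    """
    return sorted(
        name
        for name, pending_mb in pending_mb_by_queue.items()
        if pending_mb > threshold_mb  # strictly greater: "exceeds the threshold"
    )


# Hypothetical sample taken at one 60-second check.
sample = {"root.default": 90_000, "root.etl": 12_000}
print(queues_over_threshold(sample, threshold_mb=50_000))  # ['root.default']
```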

Alarm Attributes

Alarm ID    Alarm Severity    Auto Cleared
18024       Major             Yes

Alarm Parameters

Parameter      Description
Source         Specifies the cluster for which the alarm was generated.
QueueName      Specifies the queue for which the alarm was generated.
QueueMetric    Specifies the queue metric for which the alarm was generated.

Impact on the System

  • Applications take a long time to finish.
  • Newly submitted applications cannot start running.

Possible Causes

  • NodeManager node resources are insufficient.
  • The maximum resource capacity of the queue or the maximum AM resource percentage is too small.
  • The monitoring threshold is too small.

Handling Procedure

Check NodeManager resources.

  1. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Yarn > ResourceManager (Active) to access the ResourceManager web UI.
  2. Click Scheduler and check whether the root queue resources are used up in Application Queues.

    • If yes, go to 3.
    • If no, go to 4.

  3. Expand the capacity of the NodeManager instance of the Yarn service. After the capacity expansion, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 6.
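The root-queue check in step 2 can also be scripted against the ResourceManager REST endpoint `/ws/v1/cluster/scheduler`. The sketch below only parses an already-fetched response and assumes the cluster runs the Capacity Scheduler, whose response carries a `usedCapacity` percentage on the root queue; other schedulers return a different layout, so treat the field names as an assumption to verify on your cluster. The function name and the sample response are hypothetical.

```python
def root_queue_used_up(scheduler_response, full_percent=100.0):
    """Return True if the root queue's used capacity has reached full_percent.

    scheduler_response: dict parsed from the JSON returned by
    /ws/v1/cluster/scheduler (Capacity Scheduler layout assumed).
    full_percent: hypothetical cut-off for "used up", as a percentage.
    """
    info = scheduler_response["scheduler"]["schedulerInfo"]
    return float(info["usedCapacity"]) >= full_percent


# Hypothetical, heavily trimmed response for illustration only.
sample = {
    "scheduler": {
        "schedulerInfo": {
            "type": "capacityScheduler",
            "queueName": "root",
            "usedCapacity": 100.0,
        }
    }
}
print(root_queue_used_up(sample))  # True
```

If the function returns True, the procedure continues with step 3; otherwise it continues with step 4.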

Check the maximum queue resource capacity and the maximum AM resource percentage.

  4. Check whether the resources of the queue corresponding to the pending task are used up.

    • If yes, go to 5.
    • If no, go to 6.

  5. On FusionInsight Manager, choose Tenant Resources > Dynamic Resource Plan and add resources as required. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 6.
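The per-queue check above can be sketched in the same way: locate the queue named in the alarm and compare its usage with its maximum capacity. This is an illustration under the assumption that the scheduler response follows the Capacity Scheduler layout, with nested `queues.queue` lists and `absoluteUsedCapacity` and `absoluteMaxCapacity` percentages; the helper names and the sample queues are hypothetical, and the sketch does not evaluate the AM resource percentage.

```python
def find_queue(queue_info, name):
    """Depth-first search for a queue by name in a Capacity Scheduler queue tree."""
    if queue_info.get("queueName") == name:
        return queue_info
    children = (queue_info.get("queues") or {}).get("queue", [])
    for child in children:
        found = find_queue(child, name)
        if found is not None:
            return found
    return None


def queue_used_up(scheduler_response, name):
    """Return True if the named queue has reached its maximum capacity."""
    root = scheduler_response["scheduler"]["schedulerInfo"]
    queue = find_queue(root, name)
    if queue is None:
        raise KeyError(f"queue not found: {name}")
    used = float(queue["absoluteUsedCapacity"])
    maximum = float(queue["absoluteMaxCapacity"])
    return used >= maximum


# Hypothetical, trimmed response with two leaf queues.
sample = {"scheduler": {"schedulerInfo": {
    "queueName": "root",
    "queues": {"queue": [
        {"queueName": "default", "absoluteUsedCapacity": 40.0, "absoluteMaxCapacity": 40.0},
        {"queueName": "etl", "absoluteUsedCapacity": 10.0, "absoluteMaxCapacity": 60.0},
    ]},
}}}
print(queue_used_up(sample, "default"))  # True
print(queue_used_up(sample, "etl"))      # False
```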

Adjust the monitoring thresholds.

  6. On FusionInsight Manager, choose O&M > Alarm > Thresholds > Name of the desired cluster > Yarn > CPU and Memory > Pending Memory, and increase the threshold as required.
  7. Wait 5 minutes, and then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 8.

Collect the fault information.

  8. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  9. Expand the Service drop-down list, and select Yarn for the target cluster.
  10. Click the icon in the upper right corner, and set Start Date to 10 minutes before the alarm generation time and End Date to 10 minutes after it. Then, click Download.
  11. Contact O&M personnel and provide the collected logs.
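The log collection window used in the steps above is simple date arithmetic. The following sketch computes the Start Date and End Date values from an alarm generation time; the function name and the sample timestamp are hypothetical, and the values still have to be entered manually on the Download page.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin_minutes=10):
    """Return (start, end) spanning margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time.
start, end = log_collection_window(datetime(2024, 9, 23, 14, 30))
print(start)  # 2024-09-23 14:20:00
print(end)    # 2024-09-23 14:40:00
```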

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None