Updated on 2024-04-11 GMT+08:00

ALM-18011 Memory of Pending Yarn Tasks Exceeds the Threshold (For MRS 2.x or Earlier)

Description

The system checks the memory of pending Yarn tasks every 30 seconds and compares the memory with the threshold. This alarm is generated when the memory of pending tasks exceeds the threshold.

You can change the threshold by choosing System > Configure Alarm Threshold > Service > Yarn > Queue Root Pending Memory > Queue Root Pending Memory on MRS Manager.

This alarm is cleared when the memory of pending tasks is less than or equal to the threshold.
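The raise and clear conditions described above can be illustrated with a minimal sketch. The class name, field names, and threshold value are hypothetical and chosen only for illustration; this is not how MRS Manager implements the check.

```python
from dataclasses import dataclass


@dataclass
class PendingMemoryAlarm:
    """Illustrative model of the alarm state (hypothetical, not MRS code)."""

    threshold_mb: int      # assumed to be the configured threshold, in MB
    active: bool = False   # whether the alarm is currently raised

    def evaluate(self, pending_mb: int) -> bool:
        # Raised when pending memory exceeds the threshold;
        # cleared when it is less than or equal to the threshold.
        self.active = pending_mb > self.threshold_mb
        return self.active


alarm = PendingMemoryAlarm(threshold_mb=1000)
for sample in (800, 1200, 1000):
    state = "raised" if alarm.evaluate(sample) else "cleared"
    print(f"pending={sample} MB -> alarm {state}")
```

In the real system this evaluation would run once per 30-second check; the loop here simply feeds three sample values.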

Attribute

Alarm ID    Alarm Severity    Automatically Cleared
18011       Major             Yes

Parameters

Parameter            Description
ServiceName          Specifies the service for which the alarm is generated.
RoleName             Specifies the role for which the alarm is generated.
HostName             Specifies the host for which the alarm is generated.
Trigger Condition    Specifies the threshold for triggering the alarm.

Impact on the System

Tasks may accumulate in the queue and may not be processed in a timely manner.

Possible Causes

The computing capacity of the cluster cannot keep up with the task submission rate. As a result, submitted tasks wait for resources and cannot be processed in a timely manner.

Procedure

  1. Check the usage of memory and vCores on the Yarn page.

    Check whether the values of Memory Used|Memory Total and VCores Used|VCores Total on the native Yarn page reach or approach the maximum values.

    • If yes, go to 2.
    • If no, go to 5.

  2. Check the number of submitted tasks.

    Check whether the running tasks are submitted at a normal frequency.

    • If yes, go to 3.
    • If no, go to 5.

  3. Scale out the cluster.

    Perform the scale-out based on site requirements. For details, see .

  4. After the scale-out is completed, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 5.

  5. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact the O&M engineers and send the collected logs.
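
The decision flow in steps 1 and 2 can be sketched as a small helper. This is an illustration under stated assumptions, not part of the product: the function names are hypothetical, the 0.9 ratio used for "approaching the maximum" is an arbitrary example value, and treating either resource being near capacity as "yes" is one interpretation of step 1. The dictionary keys follow the field names of the YARN ResourceManager cluster metrics; verify them against your cluster version.

```python
def near_capacity(used: float, total: float, ratio: float = 0.9) -> bool:
    """Return True when used/total reaches the given ratio (assumed cutoff)."""
    if total <= 0:
        return False
    return used / total >= ratio


def next_step(metrics: dict, submission_rate_normal: bool, ratio: float = 0.9) -> int:
    """Return the procedure step to go to next: 3 (scale out) or 5 (collect logs)."""
    # Step 1: are memory or vCores at or near their maximum values?
    memory_full = near_capacity(metrics["allocatedMB"], metrics["totalMB"], ratio)
    vcores_full = near_capacity(
        metrics["allocatedVirtualCores"], metrics["totalVirtualCores"], ratio
    )
    if not (memory_full or vcores_full):
        return 5
    # Step 2: are tasks being submitted at a normal frequency?
    return 3 if submission_rate_normal else 5


# Example values read manually from the Yarn page (made up for illustration).
sample = {
    "allocatedMB": 95_000,
    "totalMB": 100_000,
    "allocatedVirtualCores": 40,
    "totalVirtualCores": 64,
}
print("Go to step", next_step(sample, submission_rate_normal=True))
```

Whether the submission frequency is "normal" remains a judgment made by the operator, so it is passed in as a flag rather than computed.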

Reference

None