ALM-18023 Number of Pending Yarn Tasks Exceeds the Threshold

Updated at: Sep 02, 2021 GMT+08:00

Description

The alarm module checks the number of pending applications in the Yarn root queue every 60 seconds. The alarm is generated when the number exceeds 60.
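The check described above can be approximated outside Manager with the ResourceManager REST API. The following is a minimal sketch, not the alarm module's actual implementation. It assumes the cluster-wide `appsPending` value from `/ws/v1/cluster/metrics` is an acceptable stand-in for the root queue count, and `rm_url` is a placeholder for your ResourceManager address. A security-enabled cluster would also require authentication, which is not shown.

```python
import json
from urllib.request import urlopen

PENDING_THRESHOLD = 60  # threshold value stated in this alarm description


def pending_exceeds_threshold(metrics: dict, threshold: int = PENDING_THRESHOLD) -> bool:
    """Return True when appsPending in a cluster-metrics payload is above the threshold."""
    pending = metrics["clusterMetrics"]["appsPending"]
    return pending > threshold


def fetch_cluster_metrics(rm_url: str) -> dict:
    """Fetch cluster metrics from a ResourceManager.

    rm_url is a placeholder such as "http://<rm-host>:<port>" (assumption).
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)
```

For example, `pending_exceeds_threshold(fetch_cluster_metrics(rm_url))` could be run every 60 seconds to mirror the check interval.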

Attribute

  • Alarm ID: 18023
  • Alarm Severity: Major
  • Auto Clear: Yes

Parameters

  • Source: Specifies the cluster for which the alarm is generated.
  • QueueName: Identifies the queue for which the alarm is generated.
  • QueueMetric: Identifies the queue indicator for which the alarm is generated.

Impact on the System

  • Applications take a long time to finish.
  • Newly submitted applications remain pending and cannot start running.

Possible Causes

  • NodeManager node resources are insufficient.
  • The maximum resource capacity of the queue and the maximum AM resource percentage are too small.
  • The monitoring threshold is too small.

Procedure

Check NodeManager resources.

  1. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Yarn > ResourceManager (Active) to access the ResourceManager web UI.
  2. Click Scheduler and check whether the root queue resources are used up in Application Queues.

    • If yes, go to 3.
    • If no, go to 4.

  3. Expand the capacity of the NodeManager instance of the Yarn service. After the capacity expansion, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 6.
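As a complement to checking the Scheduler page, cluster-wide utilization can be estimated from the same `/ws/v1/cluster/metrics` payload of the ResourceManager REST API. This is a rough sketch under assumptions: the 0.95 cutoff for "used up" is an arbitrary illustrative value, and cluster-wide figures are used in place of the root queue figures shown in the web UI.

```python
def cluster_usage(metrics: dict) -> dict:
    """Return the allocated fraction of memory and vcores from a cluster-metrics payload."""
    m = metrics["clusterMetrics"]
    memory = m["allocatedMB"] / m["totalMB"] if m["totalMB"] else 0.0
    vcores = (
        m["allocatedVirtualCores"] / m["totalVirtualCores"]
        if m["totalVirtualCores"]
        else 0.0
    )
    return {"memory": memory, "vcores": vcores}


def resources_used_up(metrics: dict, limit: float = 0.95) -> bool:
    """Treat the cluster as used up when either resource reaches the limit.

    The 0.95 default is an assumption for illustration, not a product setting.
    """
    return max(cluster_usage(metrics).values()) >= limit
```

If this returns True, adding NodeManager capacity is the likely next step. Otherwise, the queue-level settings are worth checking.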

Check the maximum queue resource capacity and the maximum AM resource percentage.

  4. Check whether the resources of the queue corresponding to the pending task are used up.

    • If yes, go to 5.
    • If no, go to 6.

  5. On the FusionInsight Manager portal, choose Tenant Resources > Dynamic Resource Plan and add resources as required. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 6.
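To see why a small maximum AM resource percentage leaves applications pending, consider the following simplified model. It is only an illustration: the function name and parameters are hypothetical, and a real scheduler applies additional rules (per-user limits, vcores, rounding to allocation increments) that are ignored here.

```python
def max_concurrent_ams(queue_memory_mb: int, am_percent: float, am_container_mb: int) -> int:
    """Estimate how many ApplicationMasters fit in the AM share of a queue.

    Simplified model (assumption): AM budget = queue memory * AM percentage,
    and each AM consumes a fixed container size.
    """
    am_budget_mb = queue_memory_mb * am_percent
    return int(am_budget_mb // am_container_mb)
```

Under this model, a queue with 102400 MB, an AM percentage of 0.1, and 1536 MB per AM container can run only 6 applications at once, so further submissions wait even if the rest of the queue is idle. Raising either the queue capacity or the AM percentage increases that number.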

Adjust the monitoring thresholds.

  6. On the FusionInsight Manager portal, choose O&M > Alarm > Thresholds > Name of the desired cluster > Yarn > Applications > Pending Applications, and increase the thresholds as required.
  7. Wait for 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 8.

Collect the fault information.

  8. On the FusionInsight Manager portal, choose O&M > Log > Download.
  9. Expand the Service drop-down list, and select Yarn for the target cluster.
  10. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  11. Contact O&M personnel and send the collected logs.
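The log collection window in the steps above is simple date arithmetic. A small sketch follows, with `log_window` as a hypothetical helper name:

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> tuple[datetime, datetime]:
    """Return (start, end) spanning margin_minutes before and after the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin
```

For an alarm generated at 12:00, this gives a Start Date of 11:50 and an End Date of 12:10 on the same day.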

Alarm Clearing

After the fault that triggers the alarm is rectified, the alarm is automatically cleared.

Related Information

None
