Updated on 2024-11-29 GMT+08:00

ALM-43616 GraphBase-related Yarn Jobs Are Abnormal

Alarm Description

The system checks Yarn jobs related to GraphBase every 30 seconds. This alarm is generated when a failed Yarn job is found.

Alarm Attributes

  • Alarm ID: 43616
  • Alarm Severity: Minor
  • Alarm Type: Error handling
  • Service Type: GraphBase
  • Auto Cleared: No

Alarm Parameters

Location Information

  • Source: Specifies the cluster for which the alarm is generated.
  • ServiceName: Specifies the service for which the alarm is generated.
  • RoleName: Specifies the role for which the alarm is generated.
  • HostName: Specifies the host for which the alarm is generated.

Additional Information

  • TaskType: Specifies the job type of an asynchronous Yarn job.
  • TaskId: Specifies the ID of the Yarn task that fails to be executed.

Impact on the System

  • Operations performed in GraphBase may fail.
  • The GraphBase service may be unavailable.
  • After the fault is rectified, you need to execute the task again.

Possible Causes

The required parameters of the Yarn job are configured incorrectly.

Handling Procedure

Check GraphBase-related Yarn jobs.

  1. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Service > Yarn > ResourceManager(Active). On the Yarn web UI, analyze the cause of the Yarn task failure.

    By default, the admin user does not have the permissions to manage other components. If the page cannot be opened or the displayed content is incomplete when you access the native UI of a component due to insufficient permissions, you can manually create a user with the permissions to manage that component.

  2. After identifying and fixing the cause of the failure, submit the Yarn job again and check whether the new job is executed successfully.

    • If yes, click Clear in the Operation column of the alarm to manually clear the alarm.
    • If no, go to 3.

  3. If the new Yarn jobs submitted fail to be executed, download the fault logs and analyze the cause.
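As a complement to browsing the Yarn web UI, failed applications can also be listed through the ResourceManager REST API (`/ws/v1/cluster/apps`). The following is a minimal sketch, not part of the official procedure. It only builds the query URL and summarizes a response that has already been fetched. The ResourceManager address and the `name_hint` filter (the assumption that GraphBase-related jobs contain "graphbase" in their application name) are hypothetical, and authentication (for example, Kerberos on a security-mode cluster) is not shown.

```python
import json
from urllib.parse import urlencode


def failed_apps_url(rm_address, started_after_ms=None):
    """Build the ResourceManager REST URL that lists FAILED applications.

    rm_address is a placeholder such as "https://rm-host:port".
    started_after_ms optionally limits results to applications started
    after the given epoch time in milliseconds.
    """
    params = {"states": "FAILED"}
    if started_after_ms is not None:
        params["startedTimeBegin"] = str(started_after_ms)
    return f"{rm_address.rstrip('/')}/ws/v1/cluster/apps?{urlencode(params)}"


def summarize_failed_apps(response_text, name_hint="graphbase"):
    """Return id, name, and diagnostics of failed apps matching name_hint.

    name_hint is an assumption about how GraphBase-related jobs are named;
    adjust it to match the job names used in your cluster.
    """
    payload = json.loads(response_text)
    # The API returns {"apps": null} when no application matches.
    apps = (payload.get("apps") or {}).get("app") or []
    result = []
    for app in apps:
        failed = app.get("state") == "FAILED" or app.get("finalStatus") == "FAILED"
        if failed and name_hint.lower() in app.get("name", "").lower():
            result.append(
                {
                    "id": app.get("id"),
                    "name": app.get("name"),
                    "diagnostics": app.get("diagnostics", ""),
                }
            )
    return result
```

The `diagnostics` field of each returned entry usually carries the failure message reported by Yarn, which is a reasonable starting point for checking whether a job parameter was misconfigured.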

Collect the fault information.

  1. On the FusionInsight Manager homepage, choose O&M > Log > Download.
  2. Expand the Service drop-down list, and select GraphBase for the target cluster.
  3. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 30 minutes before and 30 minutes after the alarm generation time, respectively. Then, click Download.
  4. Contact O&M engineers and provide the collected logs.
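The log collection window described in the steps above is simple date arithmetic. The sketch below is illustrative only: the function name `log_window` and the example alarm time are assumptions, and it just computes the Start Date and End Date values to enter on the download page.

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=30):
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, as shown in the alarm details.
start, end = log_window(datetime(2024, 11, 29, 0, 10))
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
```

Note that the window may cross midnight, as in this example, so both the date and the time need to be set on the page.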

Alarm Clearance

The system does not clear this alarm automatically. After the fault is rectified, clear the alarm manually.

Related Information

None.