ALM-43616 GraphBase-related Yarn Jobs Are Abnormal
Alarm Description
The system checks Yarn jobs related to GraphBase every 30 seconds. This alarm is generated when a failed Yarn job is found.
Alarm Attributes
Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
---|---|---|---|---|
43616 | Minor | Error handling | GraphBase | No |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster for which the alarm is generated. |
Location Information | ServiceName | Specifies the service for which the alarm is generated. |
Location Information | RoleName | Specifies the role for which the alarm is generated. |
Location Information | HostName | Specifies the host for which the alarm is generated. |
Additional Information | TaskType | Specifies the job type of an asynchronous Yarn job. |
Additional Information | TaskId | Specifies the ID of the Yarn task that fails to be executed. |
Impact on the System
- Operations performed in GraphBase may fail.
- The GraphBase service may be unavailable.
- After the fault is rectified, you need to execute the task again.
Possible Causes
A required parameter for the Yarn jobs is configured incorrectly.
Handling Procedure
Check GraphBase-related Yarn jobs.
1. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Service > Yarn > ResourceManager(Active). On the Yarn web UI, analyze the cause of the Yarn task failure.
   By default, the admin user does not have permission to manage other components. If a component's native UI cannot be opened or displays incomplete content because of insufficient permissions, manually create a user who has permission to manage that component.
2. Find the failure cause, submit the Yarn jobs again, and check whether the new jobs are executed successfully.
   - If yes, click Clear in the Operation column of the alarm to manually clear the alarm.
   - If no, go to 3.
3. If the newly submitted Yarn jobs also fail, download the fault logs and analyze the cause.
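When many applications are listed, the failure analysis in the steps above can also be done on the JSON returned by the Yarn ResourceManager REST API (`/ws/v1/cluster/apps`). The following is a minimal sketch, not the check that GraphBase itself performs. It assumes the response has already been fetched and parsed, and it assumes GraphBase-related jobs can be recognized by a substring of the application name; the `name_hint` value and the sample application names are hypothetical.

```python
def failed_apps(response, name_hint="graphbase"):
    """Return id, name, and diagnostics of failed apps whose name contains name_hint.

    `response` is the parsed JSON body of GET /ws/v1/cluster/apps.
    `name_hint` is a hypothetical filter; adjust it to how your jobs are named.
    """
    # The API wraps results as {"apps": {"app": [...]}}; "apps" is null when nothing matches.
    apps = (response.get("apps") or {}).get("app") or []
    result = []
    for app in apps:
        # A job counts as failed if either its state or its final status is FAILED.
        is_failed = app.get("state") == "FAILED" or app.get("finalStatus") == "FAILED"
        if is_failed and name_hint.lower() in app.get("name", "").lower():
            result.append({
                "id": app.get("id", ""),
                "name": app.get("name", ""),
                "diagnostics": app.get("diagnostics", ""),
            })
    return result


# Hypothetical sample response, for illustration only.
sample = {"apps": {"app": [
    {"id": "application_1_0001", "name": "GraphBase-import", "state": "FAILED",
     "finalStatus": "FAILED", "diagnostics": "invalid parameter"},
    {"id": "application_1_0002", "name": "GraphBase-export", "state": "FINISHED",
     "finalStatus": "SUCCEEDED", "diagnostics": ""},
    {"id": "application_1_0003", "name": "other-job", "state": "FAILED",
     "finalStatus": "FAILED", "diagnostics": "unrelated"},
]}}

for app in failed_apps(sample):
    print(app["id"], app["diagnostics"])
```

Fetching the response is left out on purpose: in a security-mode cluster the ResourceManager endpoint normally requires authentication, so how you obtain the JSON depends on your environment. The `diagnostics` field usually carries the same failure message that the Yarn web UI shows for the application.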
Collect the fault information.
4. On the FusionInsight Manager homepage, choose O&M > Log > Download.
5. Expand the Service drop-down list, and select GraphBase for the target cluster.
6. Click the icon in the upper right corner, and set Start Date to 30 minutes before the alarm generation time and End Date to 30 minutes after it. Then, click Download.
7. Contact O&M engineers and provide the collected logs.
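The log collection window described above is the alarm generation time plus and minus 30 minutes. A minimal sketch of that arithmetic follows; the function name and the example alarm time are hypothetical and only illustrate the calculation.

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=30):
    """Return (start, end) of the log collection window around alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
start, end = log_window(datetime(2024, 1, 15, 10, 5))
print(start)  # 2024-01-15 09:35:00
print(end)    # 2024-01-15 10:35:00
```

Enter the two resulting values as Start Date and End Date on the log download page.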
Alarm Clearance
The system does not clear this alarm automatically. After the fault is rectified, clear the alarm manually.
Related Information
None.