ALM-18000 Yarn Service Unavailable

This alarm is generated when the Yarn service is unavailable. The alarm module checks the Yarn service status every 60 seconds.

The alarm is cleared when the Yarn service recovers.

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
18000	Critical	Error handling	Yarn	Yes

Type	Parameter	Description
Location Information	Source	Specifies the cluster for which the alarm is generated.
	ServiceName	Specifies the service for which the alarm is generated.
	RoleName	Specifies the role for which the alarm is generated.
	HostName	Specifies the host for which the alarm is generated.

The cluster cannot provide the Yarn service. Users cannot run new applications, and applications that have already been submitted cannot run.

Check the ZooKeeper service status.

1. On FusionInsight Manager, check whether the alarm list contains ALM-13000 ZooKeeper Service Unavailable.
   - If yes, go to 2.
   - If no, go to 3.
2. Rectify the fault by following the steps provided in ALM-13000 ZooKeeper Service Unavailable, and check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 3.

Check the HDFS service status.

3. On FusionInsight Manager, check whether the alarm list contains any HDFS alarms.
   - If yes, go to 4.
   - If no, go to 5.
4. Choose O&M > Alarm > Alarms, handle the HDFS alarms by following their alarm help, and then check whether the Yarn alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 5.

Check the ResourceManager status in the Yarn cluster.

5. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Yarn.
6. In Dashboard, check whether there is an active ResourceManager instance in the Yarn cluster.
   - If yes, go to 7.
   - If no, go to 10.
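
When the portal is not at hand, the check for an active ResourceManager can be approximated from the ResourceManager REST API. The following is a minimal sketch, assuming each ResourceManager web endpoint is reachable and returns the standard Hadoop `/ws/v1/cluster/info` payload with an `haState` field. The host names and port in `RM_URLS` are hypothetical, and a security-enabled cluster would additionally need HTTPS and Kerberos (SPNEGO) authentication, which this sketch omits.

```python
import json
from urllib.request import urlopen

# Hypothetical ResourceManager web addresses; replace with the real ones.
RM_URLS = ["http://rm-host-1:8088", "http://rm-host-2:8088"]


def ha_state(payload: str) -> str:
    """Return the haState value from a /ws/v1/cluster/info response body."""
    info = json.loads(payload).get("clusterInfo") or {}
    return info.get("haState", "UNKNOWN")


def has_active_rm(urls=RM_URLS, timeout=5) -> bool:
    """Return True if any queried endpoint reports haState ACTIVE."""
    for base in urls:
        try:
            with urlopen(f"{base}/ws/v1/cluster/info", timeout=timeout) as resp:
                if ha_state(resp.read().decode("utf-8")) == "ACTIVE":
                    return True
        except (OSError, ValueError):
            # Unreachable endpoint or a non-JSON reply: treat as not active.
            continue
    return False
```

A result of `False` from `has_active_rm()` corresponds to the "If no" branch above. It does not replace the Dashboard check, because an unreachable web endpoint is also reported as not active.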

Check the NodeManager node status in the Yarn cluster.

7. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Yarn > Instance.
8. Check Running Status of the NodeManager instances to see whether any nodes are unhealthy.
   - If yes, go to 9.
   - If no, go to 10.
9. Rectify the fault by following the steps provided in ALM-18002 NodeManager Heartbeat Lost or ALM-18003 NodeManager Unhealthy. After the fault is rectified, check whether the Yarn alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 10.
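
The NodeManager check can also be sketched against the ResourceManager REST API. The example below assumes the standard Hadoop `/ws/v1/cluster/nodes` response layout (`nodes.node[]` entries with `nodeHostName`, `state`, and `healthReport`). Treating both `UNHEALTHY` and `LOST` as problem states is a choice made for this sketch, and the sample payload is made up for illustration.

```python
import json

# Node states treated as a problem in this sketch (an assumption).
PROBLEM_STATES = {"UNHEALTHY", "LOST"}


def problem_nodes(payload: str):
    """Return (host, state, healthReport) for nodes in a problem state.

    `payload` is the body of a /ws/v1/cluster/nodes response.
    """
    nodes = (json.loads(payload).get("nodes") or {}).get("node") or []
    return [
        (n.get("nodeHostName", "?"), n.get("state", "?"), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") in PROBLEM_STATES
    ]


# Made-up sample payload; real responses carry many more fields per node.
sample = json.dumps({"nodes": {"node": [
    {"nodeHostName": "node-1", "state": "RUNNING", "healthReport": ""},
    {"nodeHostName": "node-2", "state": "UNHEALTHY", "healthReport": "sample report"},
    {"nodeHostName": "node-3", "state": "LOST", "healthReport": ""},
]}})

for host, state, report in problem_nodes(sample):
    print(host, state, report)
```

Any host listed here would be a candidate for the ALM-18002 or ALM-18003 handling procedure referenced above.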

Collect fault information.

10. On the FusionInsight Manager portal of the active cluster, choose O&M > Log > Download.
11. In Service, select Yarn for the required cluster.
12. Click the icon in the upper right corner, and set Start Date to 10 minutes before the alarm generation time and End Date to 10 minutes after it. Then, click Download.
13. Contact O&M engineers and provide the collected logs.
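
The log collection window described above is simple date arithmetic. As a minimal sketch, the hypothetical helper `log_window` below derives the Start Date and End Date values from an alarm generation time; the timestamp used is made up for illustration.

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) spanning `margin_minutes` on either side of the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Example alarm generation time (made up for illustration).
start, end = log_window(datetime(2024, 1, 15, 9, 30))
print(start, end)  # 2024-01-15 09:20:00 2024-01-15 09:40:00
```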