ALM-18000 Yarn Service Unavailable

The alarm module checks the Yarn service status every 60 seconds. This alarm is generated when the Yarn service is unavailable.

The alarm is cleared when the Yarn service recovers.
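
The generate and clear behaviour described above can be pictured as a small state machine. The following is a minimal sketch, not the alarm module's actual implementation: the function name `alarm_events` and the boolean samples are hypothetical, and it assumes the alarm is raised on the first failed check, whereas the real module may require several consecutive failures.

```python
from typing import Iterable, List, Tuple

CHECK_INTERVAL_SECONDS = 60  # check period stated in the alarm description


def alarm_events(samples: Iterable[bool]) -> List[Tuple[int, str]]:
    """Turn periodic availability samples into alarm events.

    Each sample is True when the Yarn service was available at that check.
    Returns (elapsed_seconds, event) pairs; event is "generated" or "cleared".
    """
    events = []
    alarm_active = False
    for index, available in enumerate(samples):
        elapsed = index * CHECK_INTERVAL_SECONDS
        if not available and not alarm_active:
            # Service became unavailable: raise the alarm once (assumed behaviour).
            alarm_active = True
            events.append((elapsed, "generated"))
        elif available and alarm_active:
            # Service recovered: the alarm is cleared automatically.
            alarm_active = False
            events.append((elapsed, "cleared"))
    return events
```

For the samples `[True, False, False, True]`, the sketch reports the alarm as generated at 60 seconds and cleared at 180 seconds.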

Alarm ID	Alarm Severity	Auto Cleared
18000	Critical	Yes

Parameter	Description
Source	Specifies the cluster for which the alarm is generated.
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.

Check the ZooKeeper service status.

1. On FusionInsight Manager, check whether the alarm list contains ALM-13000 ZooKeeper Service Unavailable.
   - If yes, go to Step 2.
   - If no, go to Step 3.
2. Rectify the fault by following the procedure in ALM-13000 ZooKeeper Service Unavailable. Then check whether this alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 3.

Check the HDFS service status.

3. On FusionInsight Manager, check whether the alarm list contains any HDFS alarms.
   - If yes, go to Step 4.
   - If no, go to Step 5.
4. Choose O&M > Alarm > Alarms, handle the HDFS alarms according to the alarm help, and check whether the Yarn alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 5.
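
The two dependency checks above amount to handling upstream alarms before troubleshooting Yarn itself. The sketch below illustrates that ordering, assuming the alarm list has been exported as records with hypothetical `id` and `service` fields; FusionInsight Manager's real export format may differ.

```python
from typing import Dict, List

ZOOKEEPER_UNAVAILABLE = "ALM-13000"  # alarm ID named in the procedure


def dependency_alarms(active_alarms: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return upstream alarms to handle before troubleshooting Yarn itself.

    ZooKeeper unavailability comes first, then any HDFS alarm, in the same
    order as the procedure. The "id" and "service" keys are assumed field names.
    """
    zookeeper = [a for a in active_alarms if a.get("id") == ZOOKEEPER_UNAVAILABLE]
    hdfs = [a for a in active_alarms if a.get("service") == "HDFS"]
    return zookeeper + hdfs
```

An empty result corresponds to the "If no" branches: neither dependency is reporting a fault, so troubleshooting continues with the ResourceManager check.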

Check the ResourceManager status in the Yarn cluster.

5. On FusionInsight Manager, choose Cluster > Services > Yarn.
6. On the Dashboard page, check whether the Yarn cluster has an active ResourceManager instance.
   - If yes, go to Step 7.
   - If no, go to Step 10.
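
The same check can be approximated outside the Dashboard. The sketch below assumes the JSON returned by each ResourceManager's `/ws/v1/cluster/info` REST endpoint has already been fetched (in a security-mode cluster this typically requires HTTPS and Kerberos authentication, which is not shown). The function name `has_active_resourcemanager` is hypothetical.

```python
import json
from typing import Iterable


def has_active_resourcemanager(info_payloads: Iterable[str]) -> bool:
    """Return True if any ResourceManager reports haState ACTIVE.

    Each payload is the JSON text returned by one ResourceManager's
    /ws/v1/cluster/info endpoint.
    """
    for payload in info_payloads:
        cluster_info = json.loads(payload).get("clusterInfo", {})
        if cluster_info.get("haState") == "ACTIVE":
            return True
    return False
```

A result of `False` corresponds to the "If no" branch: no ResourceManager is active, so the next action is to collect fault information.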

Check the NodeManager node status in the Yarn cluster.

7. On FusionInsight Manager, choose Cluster > Services > Yarn > Instances.
8. Check the status of the NodeManager instances and see whether any nodes are unhealthy.
   - If yes, go to Step 9.
   - If no, go to Step 10.
9. Rectify the fault by following the procedure in ALM-18002 NodeManager Heartbeat Lost or ALM-18003 NodeManager Unhealthy. After the fault is rectified, check whether the Yarn alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 10.
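
As a rough cross-check of the NodeManager status, the ResourceManager REST endpoint `/ws/v1/cluster/nodes` lists every node with its state. The sketch below filters that response for problem nodes. It assumes the JSON text has already been fetched, and it treats the `LOST` and `UNHEALTHY` states as counterparts of ALM-18002 and ALM-18003, which is an interpretation rather than a documented mapping. The function name `problem_nodes` is hypothetical.

```python
import json
from typing import List

# Node states assumed to correspond to heartbeat-lost and unhealthy NodeManagers.
PROBLEM_STATES = {"UNHEALTHY", "LOST"}


def problem_nodes(nodes_payload: str) -> List[str]:
    """List "host: state" entries for NodeManagers that are not healthy.

    nodes_payload is the JSON text returned by /ws/v1/cluster/nodes.
    """
    data = json.loads(nodes_payload)
    # Tolerate a missing or null "nodes" object.
    nodes = (data.get("nodes") or {}).get("node", [])
    return [
        f'{node.get("nodeHostName")}: {node.get("state")}'
        for node in nodes
        if node.get("state") in PROBLEM_STATES
    ]
```

An empty list corresponds to the "If no" branch of the check for unhealthy nodes.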

Collect fault information.

10. On the FusionInsight Manager portal of the active cluster, choose O&M > Log > Download.
11. Under Service, select Yarn for the required cluster.
12. In the upper right corner, set the time range to start 10 minutes before and end 10 minutes after the alarm was generated. Then click Download to collect the logs.
13. Contact O&M personnel and send them the collected logs.
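
The log collection window can be worked out from the alarm generation time. The sketch below is only an illustration of that arithmetic; the helper name `log_window` is hypothetical and not part of FusionInsight Manager.

```python
from datetime import datetime, timedelta
from typing import Tuple

WINDOW = timedelta(minutes=10)  # margin before and after the alarm, per the procedure


def log_window(alarm_time: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) times to enter when downloading logs."""
    return alarm_time - WINDOW, alarm_time + WINDOW
```

For an alarm generated at 00:05 on 1 January 2024, the window runs from 23:55 on 31 December 2023 to 00:15 on 1 January 2024.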