
ALM-14010 NameService Service Is Abnormal

Alarm Description

The system checks the NameService service status every 180 seconds. This alarm is generated when the NameService service is unavailable.

This alarm is cleared when the NameService service recovers.

Alarm Attributes

Alarm ID: 14010
Alarm Severity: Major
Alarm Type: Quality of service
Service Type: HDFS
Auto Cleared: Yes

Alarm Parameters

Type: Location Information

  • Source: Specifies the cluster for which the alarm was generated.
  • ServiceName: Specifies the service for which the alarm was generated.
  • RoleName: Specifies the role for which the alarm was generated.
  • HostName: Specifies the host for which the alarm was generated.
  • NameServiceName: Specifies the NameService for which the alarm was generated.

Impact on the System

HDFS cannot provide services for upper-layer components that depend on the NameService, such as HBase and MapReduce. As a result, users cannot read or write files. A quick client-side check of the impact is sketched below.
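As a minimal sketch of how the fault shows up on a client (assuming an installed HDFS client whose environment has been loaded with source bigdata_env; hacluster is a placeholder for the affected NameService name), a simple read attempt fails or hangs while the NameService is unavailable:

    # Attempt a directory listing against the affected NameService.
    # This fails or hangs while the NameService is unavailable.
    hdfs dfs -ls hdfs://hacluster/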

Possible Causes

  • The KrbServer service is abnormal.
  • The JournalNode is faulty.
  • The DataNode is faulty.
  • The disk capacity is insufficient.
  • The NameNode enters safe mode.

Handling Procedure

Check the KrbServer service status.

  1. On FusionInsight Manager, choose Cluster > Services.
  2. Check whether the KrbServer service exists.

    • If yes, go to 3.
    • If no, go to 6.

  3. Click KrbServer.
  4. Click Instances. On the KrbServer management page, select the faulty instance, and choose More > Restart Instance. Check whether the instance successfully restarts.

    • If yes, go to 5.
    • If no, go to 24.

  5. Choose O&M > Alarm > Alarms and check whether the alarm is cleared. To confirm that Kerberos authentication works again, you can also run the client-side check sketched after this list.

    • If yes, no further action is required.
    • If no, go to 6.
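A minimal client-side way to confirm that KrbServer is serving authentication requests again (a sketch, assuming a security-mode cluster and an installed client; /opt/client is an example installation directory that may differ in your environment):

    # Load the client environment (path is an example).
    cd /opt/client
    source bigdata_env
    # Request a Kerberos ticket; success indicates KrbServer is responding.
    kinit hdfs
    # List the tickets held in the credential cache.
    klist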

Check the JournalNode instance status.

  6. On FusionInsight Manager, choose Cluster > Services.
  7. Choose HDFS > Instances.
  8. Check whether the Running Status of each JournalNode is Normal.

    • If yes, go to 11.
    • If no, go to 9.

  9. Select the faulty JournalNode, and choose More > Restart Instance. Check whether the JournalNode successfully restarts.

    • If yes, go to 10.
    • If no, go to 24.

  10. Choose O&M > Alarm > Alarms and check whether the alarm is cleared. You can also verify the NameNode HA state from a client, as sketched after this list.

    • If yes, no further action is required.
    • If no, go to 11.
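JournalNode faults typically surface as NameNode HA problems, so after the restart it is worth confirming that one NameNode is active and the other standby. A minimal sketch, assuming a loaded client environment (and a completed kinit in security mode); nn1 and nn2 are conventional NameNode service IDs that may differ in your cluster, and -getAllServiceState requires a reasonably recent Hadoop release:

    # Print the HA state (active/standby) of every NameNode.
    hdfs haadmin -getAllServiceState
    # Or query individual NameNodes by service ID (example IDs).
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2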

Check the DataNode instance status.

  11. On FusionInsight Manager, choose Cluster > Services > HDFS.
  12. Click Instances and check whether the Running Status of all DataNodes is Normal.

    • If yes, go to 15.
    • If no, go to 13.

  13. On the DataNode management page, select the faulty instance, and choose More > Restart Instance. Check whether the DataNode successfully restarts.

    • If yes, go to 14.
    • If no, go to 15.

  14. Choose O&M > Alarm > Alarms and check whether the alarm is cleared. A command-line report of DataNode health is sketched after this list.

    • If yes, no further action is required.
    • If no, go to 15.
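As a cross-check from the command line, hdfs dfsadmin -report summarizes the state of every DataNode (a sketch, assuming a loaded client environment and, in security mode, a completed kinit):

    # Summarize cluster capacity and the state of every DataNode.
    hdfs dfsadmin -report
    # The "Live datanodes" and "Dead datanodes" counts reveal faulty nodes.
    hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes'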

Check the disk status.

  15. On FusionInsight Manager, choose Hosts.
  16. In the Disk column, check whether the disk space is insufficient. You can also check usage on the host itself, as sketched after this list.

    • If yes, go to 17.
    • If no, go to 19.

  17. Expand the disk capacity.
  18. Choose O&M > Alarm > Alarms and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 19.
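To confirm the shortage on the host itself, check the usage of the file systems that hold the HDFS data directories (a sketch; /srv/BigData is an example mount point, and the actual data directories depend on your installation):

    # Show usage of all mounted file systems in human-readable units.
    df -h
    # Or check the mount point holding the HDFS data directories (example path).
    df -h /srv/BigData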

Check whether the NameNode is in safe mode.

  19. On FusionInsight Manager, choose Cluster > Services > HDFS.
  20. On the Dashboard page of HDFS, check whether the value of Safe Mode in the Basic Information area (or in the NameService Summary area) is ON.

    ON indicates that safe mode is enabled.

    • If yes, go to 21.
    • If no, go to 24.

  21. Log in to the client as user root. Run the cd command to go to the client installation directory, and then run the source bigdata_env command. If the cluster is in security mode, perform security authentication: run the kinit hdfs command and enter the password as prompted (obtain the password from the MRS cluster administrator). If the cluster is in non-security mode, log in as user omm and run the commands; ensure that user omm has the permission to run the client.
  22. Run the hdfs dfsadmin -safemode leave command. The full command sequence is sketched after this list.
  23. Choose O&M > Alarm > Alarms and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 24.
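Putting 21 and 22 together, the session looks roughly like this (a sketch for a security-mode cluster; /opt/client is an example installation directory):

    # Load the client environment (path is an example).
    cd /opt/client
    source bigdata_env
    # Security mode only: authenticate as the hdfs user.
    kinit hdfs
    # Check the current safe mode state, then leave safe mode.
    hdfs dfsadmin -safemode get
    hdfs dfsadmin -safemode leave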

Collect fault information.

  24. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  25. In the Service area, select the following services of the desired cluster:

    • ZooKeeper
    • HDFS

  26. Click the edit icon in the upper right corner, set Start Date and End Date for log collection to 10 minutes before and after the alarm generation time, respectively, and then click Download.
  27. Contact O&M engineers and provide the collected logs.

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None.