ALM-14010 NameService Service Is Abnormal
Alarm Description
The system checks the NameService service status every 180 seconds. This alarm is generated when the NameService service is unavailable.
This alarm is cleared when the NameService service recovers.
Alarm Attributes
| Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
|---|---|---|---|---|
| 14010 | Major | Quality of service | HDFS | Yes |
Alarm Parameters
| Type | Parameter | Description |
|---|---|---|
| Location Information | Source | Specifies the cluster for which the alarm was generated. |
| | ServiceName | Specifies the service for which the alarm was generated. |
| | RoleName | Specifies the role for which the alarm was generated. |
| | HostName | Specifies the host for which the alarm was generated. |
| | NameServiceName | Specifies the NameService for which the alarm was generated. |
Impact on the System
HDFS cannot provide services based on the faulty NameService to upper-layer components such as HBase and MapReduce. As a result, users cannot read or write files.
Possible Causes
- The KrbServer service is abnormal.
- The JournalNode is faulty.
- The DataNode is faulty.
- The disk capacity is insufficient.
- The NameNode enters safe mode.
Handling Procedure
Check the KrbServer service status.

1. On FusionInsight Manager, choose Cluster > Services.
2. Check whether the KrbServer service exists.
3. Click KrbServer.
4. Click Instances. On the KrbServer management page, select the faulty instance, and choose More > Restart Instance. Check whether the instance successfully restarts.
5. Choose O&M > Alarm > Alarms and check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 6.
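As a quick command-line supplement to the Manager check above, you can verify that Kerberos authentication works from the HDFS client. This is a minimal sketch: the keytab path and principal are placeholders for your cluster's values, and it assumes the client environment has already been sourced.

```shell
#!/bin/sh
# Sketch: check Kerberos authentication from the HDFS client.
# KEYTAB and PRINCIPAL are placeholders -- substitute your own values.
KEYTAB=/opt/client/user.keytab   # placeholder path
PRINCIPAL=hdfs                   # placeholder principal

if kinit -kt "$KEYTAB" "$PRINCIPAL" 2>/dev/null; then
  echo "Kerberos authentication OK"
else
  echo "Kerberos authentication failed; check the KrbServer service"
fi
```

If `kinit` fails here, the KrbServer service (or the client's Kerberos configuration) is a likely cause of the alarm.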
Check the JournalNode instance status.

6. On FusionInsight Manager, choose Cluster > Services.
7. Choose HDFS > Instances.
8. Check whether the Running Status of the JournalNode is Normal.
9. Select the faulty JournalNode, and choose More > Restart Instance. Check whether the JournalNode successfully restarts.
10. Choose O&M > Alarm > Alarms and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 11.
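Because the NameNode needs a quorum (majority) of JournalNodes to stay writable, a quick reachability probe of the JournalNode RPC port can complement the Instances page check. This is a sketch only: 8485 is the Apache HDFS default port for dfs.journalnode.rpc-address (verify the value in your cluster configuration), and the host names are placeholders.

```shell
#!/bin/sh
# Sketch: probe the JournalNode RPC port on each JournalNode host.
# Port 8485 is the Apache HDFS default (dfs.journalnode.rpc-address);
# the host names below are placeholders for your cluster's JournalNodes.
for host in jn-host-1 jn-host-2 jn-host-3; do   # placeholder host names
  if nc -z -w 2 "$host" 8485 2>/dev/null; then
    echo "$host: port 8485 reachable"
  else
    echo "$host: port 8485 NOT reachable"
  fi
done
```

If a majority of JournalNodes are unreachable, restart the faulty instances as described in step 9.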
Check the DataNode instance status.

11. On FusionInsight Manager, choose Cluster > Services > HDFS.
12. Click Instances and check whether the Running Status of all DataNodes is Normal.
13. On the DataNode management page, select the faulty instance, and choose More > Restart Instance. Check whether the DataNode successfully restarts.
14. Choose O&M > Alarm > Alarms and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 15.
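The DataNode check can also be done from the client with `hdfs dfsadmin -report`, whose output includes a "Dead datanodes (N):" section. The sketch below parses a fixed sample line so it is self-contained; on a real client (after running source bigdata_env and, in security mode, kinit hdfs), capture the live report with `report="$(hdfs dfsadmin -report)"` instead.

```shell
#!/bin/sh
# Sketch: extract the dead-DataNode count from `hdfs dfsadmin -report`.
# A sample report fragment is used here for illustration.
report='Dead datanodes (2):'   # illustrative line from a report

dead=$(printf '%s\n' "$report" | sed -n 's/^Dead datanodes (\([0-9][0-9]*\)).*/\1/p')
if [ "${dead:-0}" -gt 0 ]; then
  echo "Found $dead dead DataNode(s); restart them from the Instances page"
else
  echo "No dead DataNodes reported"
fi
```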
Check the disk status.

15. On FusionInsight Manager, choose Hosts.
16. In the Disk column, check whether the disk space is insufficient.
17. If it is insufficient, expand the disk capacity.
18. Choose O&M > Alarm > Alarms and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 19.
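The disk-space check on the Hosts page can be mirrored on a node with `df`. This is a sketch under stated assumptions: the 90% threshold and the sample `df -P` lines (including the mount paths) are illustrative; on a live node, pipe real output into the filter with `df -P | check_disk_usage 90`.

```shell
#!/bin/sh
# Sketch: flag file systems at or above a usage threshold.
check_disk_usage() {
  # reads `df -P`-style lines on stdin; prints "mountpoint usage%" when full
  awk -v t="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6, use "%" }'
}

# Sample df -P output (paths are hypothetical) piped through the filter:
printf '%s\n' \
  'Filesystem 1024-blocks Used Available Capacity Mounted-on' \
  '/dev/sda1 1000000 950000 50000 95% /srv/BigData/hadoop/data1' \
  '/dev/sdb1 1000000 400000 600000 40% /srv/BigData/hadoop/data2' \
  | check_disk_usage 90
```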
Check whether the NameNode is in safe mode.

19. On FusionInsight Manager, choose Cluster > Services > HDFS.
20. In the Basic Information area (or the NameService Summary area) on the HDFS Dashboard page, check whether the value of Safe Mode is ON. ON indicates that safe mode is enabled.
21. Log in to the client as user root. Run the cd command to go to the client installation directory, and then run the source bigdata_env command. If the cluster is in security mode, perform security authentication: run the kinit hdfs command and enter the password as prompted (obtain the password from the MRS cluster administrator). If the cluster is in non-security mode, log in as user omm and run the commands. Ensure that user omm has the client execution permission.
22. Run the hdfs dfsadmin -safemode leave command.
23. Choose O&M > Alarm > Alarms and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 24.
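The safe-mode check and exit can also be scripted on the client. This minimal sketch uses a fixed sample status line so it is self-contained; on a real client (after source bigdata_env and, in security mode, kinit hdfs), capture the live status with `status="$(hdfs dfsadmin -safemode get)"` instead.

```shell
#!/bin/sh
# Sketch: decide whether to leave safe mode based on the output of
# `hdfs dfsadmin -safemode get`.
status='Safe mode is ON'   # sample output used for illustration

case "$status" in
  *ON*) echo "NameNode is in safe mode; run: hdfs dfsadmin -safemode leave" ;;
  *)    echo "NameNode is not in safe mode" ;;
esac
```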
Collect fault information.

24. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
25. In the Service area, select the following nodes of the desired cluster:
    - ZooKeeper
    - HDFS
26. Click the edit icon in the upper right corner, and set Start Date and End Date to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
27. Contact O&M engineers and provide the collected logs.
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None.