ALM-14010 NameService Service Unavailable

Updated at: Mar 25, 2021 GMT+08:00

Description

This alarm is generated when the NameService service is unavailable. The system checks the NameService service status every 180 seconds.

This alarm is cleared when the NameService service recovers.

Attribute

Alarm ID    Alarm Severity    Automatically Cleared
14010       Major             Yes

Parameters

Name               Meaning
Source             Specifies the cluster for which the alarm is generated.
ServiceName        Specifies the service for which the alarm is generated.
RoleName           Specifies the role for which the alarm is generated.
HostName           Specifies the host for which the alarm is generated.
NameServiceName    Specifies the NameService for which the alarm is generated.

Impact on the System

Upper-layer components based on this NameService, such as HBase and MapReduce, cannot provide services. As a result, users cannot read or write files.

Possible Causes

  • The KrbServer service is abnormal.
  • The JournalNode node is faulty.
  • The DataNode node is faulty.
  • The disk space is insufficient.
  • The NameNode enters safe mode.

Procedure

Check KrbServer service status.

  1. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services.
  2. Check whether the KrbServer service exists.

    • If yes, go to 3.
    • If no, go to 6.

  3. Click KrbServer.
  4. Click Instance. On the KrbServer management page, select the faulty instance and choose More > Restart Instance. Check whether the instance restarts successfully.

    • If yes, go to 5.
    • If no, go to 24.

  5. On the O&M > Alarm > Alarms tab, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 6.

Check JournalNode instance status.

  6. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services.
  7. Choose HDFS > Instance.
  8. Check whether the Running Status of the JournalNode is Normal.

    • If yes, go to 11.
    • If no, go to 9.

  9. Select the faulty JournalNode and choose More > Restart Instance. Check whether the JournalNode restarts successfully.

    • If yes, go to 10.
    • If no, go to 24.

  10. On the O&M > Alarm > Alarms tab, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 11.

Check DataNode instance status.

  11. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HDFS.
  12. Click Instance and check whether the Running Status of all DataNodes is Normal.

    • If yes, go to 15.
    • If no, go to 13.

  13. Click Instance. On the DataNode management page, select the faulty DataNode and choose More > Restart Instance. Check whether the DataNode restarts successfully.

    • If yes, go to 14.
    • If no, go to 15.

  14. On the O&M > Alarm > Alarms tab, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 15.
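For a command-line cross-check of the DataNode status shown in Manager, the following is a minimal sketch that counts dead DataNodes from the output of hdfs dfsadmin -report -dead. It assumes the client environment is already loaded (source bigdata_env, plus kinit in security mode) and that the report contains a header line of the form "Dead datanodes (N):", as in stock Hadoop. The function name count_dead_datanodes is hypothetical and is not part of any FusionInsight tool.

```shell
#!/usr/bin/env bash
# Sketch only: count dead DataNodes reported by the NameNode.
# Assumes `hdfs` is on PATH (client environment already sourced).

count_dead_datanodes() {
    local report count
    report="$(hdfs dfsadmin -report -dead)" || return 1
    # Stock Hadoop prints a header such as "Dead datanodes (2):".
    # If the header is absent, treat it as zero dead DataNodes.
    count="$(printf '%s\n' "$report" |
        sed -n 's/^Dead datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1)"
    echo "${count:-0}"
}

# Example usage (commented out so that sourcing this file has no side effects):
# dead="$(count_dead_datanodes)" && echo "Dead DataNodes: $dead"
```

A non-zero count only confirms what the Running Status column shows; restarting the faulty instance is still done through Manager as described above.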

Check disk status.

  15. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Hosts.
  16. In the Disk column, check whether the disk space is insufficient.

    • If yes, go to 17.
    • If no, go to 19.

  17. Expand the disk capacity.
  18. On the O&M > Alarm > Alarms tab, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 19.
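The disk check can also be done directly on a host. The sketch below filters df -P output for mount points at or above a usage threshold. The function name find_full_mounts and the 80% default are illustrative assumptions, not values defined by FusionInsight Manager, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Sketch only: list mount points whose usage is at or above a threshold.
# Reads `df -P` style output on stdin; the 80% default is an arbitrary
# illustrative value, not a threshold defined by FusionInsight Manager.

find_full_mounts() {
    local threshold="${1:-80}"
    awk -v limit="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # "91%" -> "91"
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Example usage on a host:
# df -P | find_full_mounts 80
```

Reading from standard input keeps the function independent of the host it runs on, so saved df output from several nodes can be checked the same way.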

Check whether the NameNode is in safe mode.

  19. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HDFS, and click NameNode(Active) of the abnormal NameService to open the NameNode WebUI.

    By default, the admin user does not have the permissions to manage other components. If the page cannot be opened or the displayed content is incomplete when you access the native UI of a component due to insufficient permissions, you can manually create a user with the permissions to manage that component.

  20. On the NameNode web user interface (WebUI), check whether the following information is displayed: Safe mode is ON.

    "Safe mode is ON" indicates that the NameNode is in safe mode. The text that follows this message is the alarm information.

    • If yes, go to 21.
    • If no, go to 24.

  21. Log in to the FusionInsight client as user root. Run cd to switch to the client installation directory, and run source bigdata_env. If the cluster uses the security mode, perform security authentication: run kinit hdfs and enter the password as prompted.

    Obtain the password from the administrator.

  22. Run hdfs dfsadmin -safemode leave.
  23. On the O&M > Alarm > Alarms tab, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 24.
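The client-side commands in this part of the procedure can be sketched as a small shell function. It uses hdfs dfsadmin -safemode get, a standard HDFS command that the procedure itself does not mention, to confirm the state before leaving safe mode. The function name leave_safemode_if_on and the /opt/client path in the comments are hypothetical; substitute the actual client installation directory.

```shell
#!/usr/bin/env bash
# Sketch only: leave HDFS safe mode if the NameNode reports it as ON.
# Prerequisites (from the procedure): the client environment is loaded and,
# in security mode, `kinit hdfs` has been run. For example:
#   cd /opt/client        # hypothetical client installation directory
#   source bigdata_env
#   kinit hdfs            # security mode only

leave_safemode_if_on() {
    local status
    status="$(hdfs dfsadmin -safemode get)" || return 1
    case "$status" in
        *"Safe mode is ON"*)
            hdfs dfsadmin -safemode leave
            ;;
        *)
            echo "Safe mode is not ON; nothing to do."
            ;;
    esac
}

# Example usage:
# leave_safemode_if_on
```

Checking the state first avoids issuing the leave command when the NameNode is not in safe mode, in which case the cause of the alarm lies elsewhere.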

Collect fault information.

  24. On the FusionInsight Manager portal, choose O&M > Log > Download.
  25. In Service, select the following services in the required cluster:

    • ZooKeeper
    • HDFS

  26. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  27. Contact the O&M personnel and send the collected logs.
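To work out the Start Date and End Date values for log collection, the following sketch computes a window of 10 minutes on either side of an alarm time. It assumes GNU date (for the -d option) and, for simplicity, interprets and prints times in UTC. The function name log_window and the sample timestamp are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: compute the log-collection window around an alarm time.
# Requires GNU date (for -d). Times are interpreted and printed in UTC here
# for simplicity; adjust to the cluster time zone as needed.

log_window() {
    local alarm_time="$1" margin_min="${2:-10}"
    local epoch margin_sec
    epoch="$(date -u -d "$alarm_time" +%s)" || return 1
    margin_sec=$((margin_min * 60))
    echo "Start: $(date -u -d "@$((epoch - margin_sec))" '+%Y-%m-%d %H:%M:%S')"
    echo "End:   $(date -u -d "@$((epoch + margin_sec))" '+%Y-%m-%d %H:%M:%S')"
}

# Example usage:
# log_window "2021-03-25 10:00:00"
```

Converting to epoch seconds before adding or subtracting the margin avoids GNU date reading a relative expression such as "-10" as a time zone offset.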

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None
