ALM-12012 NTP Service Abnormal (For MRS 2.x or Earlier)

Description

This alarm is generated when the NTP service on the current node fails to synchronize time with the NTP service on the active OMS node.

This alarm is cleared when the NTP service on the current node synchronizes time properly with the NTP service on the active OMS node.

Attribute

Alarm ID	Alarm Severity	Auto Clear
12012	Major	Yes

Parameters

Parameter	Description
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.

Impact on the System

The time on the node is inconsistent with that on other nodes in the cluster. Therefore, some MRS applications on the node may not run properly.

Possible Causes

The NTP service on the current node cannot start properly.
The current node fails to synchronize time with the NTP service on the active OMS node.
The key value authenticated by the NTP service on the current node is inconsistent with that on the active OMS node.
The time offset between the node and the NTP service on the active OMS node is large.

Procedure

1. Check the NTP service on the current node.
  a. Log in to the node for which the alarm is generated and run the sudo su - root command to switch to user root. Then run the following command and check whether the output contains the ntpd process:
    ps -ef | grep ntpd | grep -v grep
    - If yes, go to 2.a.
    - If no, go to 1.b.
  b. Run the service ntp start command to start the NTP service.
  c. Wait 10 minutes and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 2.a.
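The process check and conditional start described above can be sketched as a small script. This is a minimal sketch, not part of the product: the function names has_ntpd and ensure_ntpd are hypothetical, and the script assumes it is run as user root on the alarm node.

```shell
# Reads `ps -ef`-style lines on stdin and succeeds if one of them
# belongs to an ntpd process (lines from grep itself are ignored).
has_ntpd() {
  awk '/ntpd/ && !/grep/ { found = 1 } END { exit !found }'
}

# Hypothetical wrapper: start the NTP service only if ntpd is not running.
# `service ntp start` is the command given in the procedure.
ensure_ntpd() {
  if ps -ef | has_ntpd; then
    echo "ntpd is running"
  else
    service ntp start
  fi
}
```

Calling ensure_ntpd on the alarm node performs the check and, if needed, the start. The 10-minute wait and the alarm check still have to be done afterwards.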
2. Check whether the current node can synchronize time properly with the NTP service on the active OMS node.
  a. Check whether the additional information of the alarm indicates that the node cannot synchronize time with the NTP service on the active OMS node.
    - If yes, go to 2.b.
    - If no, go to 3.
  b. Check whether the synchronization with the NTP service on the active OMS node is faulty.
    Log in to the node for which the alarm is generated, run the sudo su - root command to switch to user root, and then run the ntpq -np command.

    If an asterisk (*) precedes the IP address of the NTP service on the active OMS node in the command output, the synchronization is normal. For example:
```
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.10.10.162    .LOCL.           1 u    1   16  377    0.270   -1.562   0.014
```
    If no asterisk (*) precedes the IP address of the NTP service on the active OMS node and the value of refid is .INIT., as in the following output, the synchronization is abnormal:
```
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.10.10.162    .INIT.           1 u    1   16  377    0.270   -1.562   0.014
```
    - If the synchronization is abnormal, go to 2.c.
    - If the synchronization is normal, go to 3.
  c. Rectify the fault, wait 10 minutes, and then check whether the alarm is cleared.
    An NTP synchronization failure is usually related to the system firewall. If the firewall can be disabled, disable it and check whether the fault is rectified. If the firewall cannot be disabled, check the firewall policies and ensure that UDP port 123 is open, following the firewall configuration procedure of each system.
    - If yes, no further action is required.
    - If no, go to 3.
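The asterisk check on the ntpq -np output can be automated along the following lines. This is a minimal sketch under stated assumptions: the function name ntp_peer_state is hypothetical, and the IP address of the NTP service on the active OMS node is passed in by the caller (10.10.10.162 is the sample address used above).

```shell
# Reads `ntpq -np` output on stdin and reports the state of one peer.
# $1: IP address of the NTP service on the active OMS node.
# Prints "synced" if the peer is marked with an asterisk, "abnormal" if it
# is listed without one, and "missing" if it does not appear at all.
ntp_peer_state() {
  awk -v ip="$1" '
    $1 == ("*" ip) { found = 1; print "synced"; exit }
    # Peer listed with no marker, or with a marker other than the asterisk.
    $1 == ip || ($1 ~ /^[^0-9]/ && substr($1, 2) == ip) {
      found = 1; print "abnormal"; exit
    }
    END { if (!found) print "missing" }
  '
}
```

A possible invocation on the alarm node is ntpq -np | ntp_peer_state 10.10.10.162. A result other than synced corresponds to the abnormal case in 2.b.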
3. Check whether the key value authenticated by the NTP service on the current node is consistent with that on the active OMS node.
  Run the cat /etc/ntp.keys command and check whether the authentication code whose key index is 1 matches that of the NTP service on the active OMS node.
  - If yes, go to 4.a.
  - If no, go to 5.
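Reading the key with index 1 by eye is error-prone, so the lookup can be scripted. The following is a minimal sketch that assumes the usual ntp.keys layout of one key per line in the form "index type key". The function name ntp_key_value is hypothetical, and how the value from the active OMS node is obtained is left to the operator.

```shell
# Prints the key string stored under a given index in an ntp.keys-style file.
# $1: path to the keys file, $2: key index (1 in this procedure).
# Assumes each key line has the form: <index> <type> <key>
ntp_key_value() {
  awk -v idx="$2" '$1 == idx { print $3; exit }' "$1"
}
```

Running ntp_key_value /etc/ntp.keys 1 on the alarm node and on the active OMS node and comparing the two results gives the answer for step 3. Because the output is a secret, avoid writing it to logs.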
4. Check whether the time offset between the node and the NTP service on the active OMS node is large.
  a. Check whether the additional information of the alarm indicates a large time offset.
    - If yes, go to 4.b.
    - If no, go to 5.
  b. On the Hosts page, select the host of the node and choose More > Stop All Roles to stop all services on the node.
    - If the time on the alarm node is later than that of the NTP service on the active OMS node, adjust the time of the alarm node. Then choose More > Start All Roles to start the services on the node.
    - If the time on the alarm node is earlier than that of the NTP service on the active OMS node, wait until the time offset has elapsed, and then adjust the time of the alarm node. Then choose More > Start All Roles to start the services on the node.

    If you do not wait, data loss may occur.
  c. Wait 10 minutes and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 5.
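Which branch of step 4.b applies depends on the sign of the offset. The following is a minimal sketch for working it out. The function name clock_direction is hypothetical, and both times are assumed to be supplied as epoch seconds (for example from date +%s on each node).

```shell
# Compares two clocks given as epoch seconds.
# $1: time on the alarm node, $2: time on the active OMS node.
# Prints "later N", "earlier N", or "equal 0", where N is the offset in seconds.
clock_direction() {
  delta=$(( $1 - $2 ))
  if [ "$delta" -gt 0 ]; then
    echo "later $delta"
  elif [ "$delta" -lt 0 ]; then
    echo "earlier $(( -delta ))"
  else
    echo "equal 0"
  fi
}
```

A result of later means the alarm node can be adjusted right after the roles are stopped. A result of earlier N means waiting N seconds before adjusting, as described in step 4.b.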
5. Collect fault information.
  a. On MRS Manager, choose System > Export Log.
  b. Contact the O&M engineers and send the collected logs.