Updated on 2025-08-09 GMT+08:00

ALM-12012 NTP Service Is Abnormal

Alarm Description

Every 60 seconds, the system checks whether the NTP service on a node synchronizes time with the NTP service on the active OMS node. This alarm is generated when the NTP service fails to synchronize time in two consecutive checks.

This alarm is also generated when the time difference between the NTP service on a node and the NTP service on the active OMS node is greater than or equal to 20s in two consecutive checks. This alarm is cleared when the time difference is less than 20s.
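
The time-difference rule above can be read as a small state machine. The following is a minimal sketch only, not product code: the function name alarm_state and the idea of passing it a list of sampled offsets (whole seconds, one per 60-second check) are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative sketch of the time-difference rule (not product code).
# Arguments: absolute time offsets in whole seconds, one per check, oldest first.
# Output: the alarm state after the last check ("raised" or "cleared").
alarm_state() {
    threshold=20   # seconds, taken from the alarm description
    state=cleared
    streak=0       # consecutive checks at or above the threshold
    for offset in "$@"; do
        if [ "$offset" -ge "$threshold" ]; then
            streak=$((streak + 1))
            # Two consecutive checks at or above the threshold raise the alarm.
            if [ "$streak" -ge 2 ]; then
                state=raised
            fi
        else
            # A check below the threshold clears the alarm.
            streak=0
            state=cleared
        fi
    done
    echo "$state"
}
```

For example, alarm_state 25 30 reports raised, while alarm_state 25 5 25 reports cleared because the two large offsets are not consecutive.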

Alarm Attributes

Alarm ID    Alarm Severity    Auto Cleared
12012       Major             Yes

Alarm Parameters

Parameter      Description
Source         Specifies the cluster or system for which the alarm was generated.
ServiceName    Specifies the service for which the alarm was generated.
RoleName       Specifies the role for which the alarm was generated.
HostName       Specifies the host for which the alarm was generated.

Impact on the System

The time on the node is inconsistent with that on other nodes in the cluster, so some FusionInsight applications on the node may not run properly. If the time difference between the node and other Kerberos service instances keeps increasing, Kerberos authentication on the node may fail and cause service exceptions.

Possible Causes

  • The NTP service on the current node cannot start properly.
  • The current node fails to synchronize time with the NTP service on the active OMS node.
  • The authentication key used by the NTP service on the current node is inconsistent with that on the active OMS node.
  • The time offset between the node and the NTP service on the active OMS node is large.

Handling Procedure

Check the NTP service mode of the node.

  1. Log in to the active management node as the root user and check the resource status of the active and standby management nodes.

    For details about how to log in to a cluster node, see Logging In to an MRS Cluster Node.

    Switch to user omm:

    su - omm

    Check the resource status of the active and standby management nodes.

    sh ${BIGDATA_HOME}/om-server/om/sbin/status-oms.sh

    • If "chrony" is displayed in the ResName column of the command output, go to Step 2.
    • If "ntp" is displayed in the ResName column, go to Step 20.

    If both "chrony" and "ntp" are displayed in the ResName column of the command output, the NTP service mode is being switched. Wait 10 minutes and repeat Step 1. If both "chrony" and "ntp" are still displayed, contact O&M personnel.
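
The branching in Step 1 can be sketched as a small filter. This is an illustration under stated assumptions: detect_ntp_mode is a hypothetical helper name, and it assumes that "chrony" or "ntp" appears as a whole whitespace-separated field in the text it reads. The layout of the status-oms.sh output is not shown in this document, so verify that assumption against real output before relying on it.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): classify the NTP service mode from
# text read on stdin, for example the output of status-oms.sh.
# Assumption: "chrony" or "ntp" appears as a whole whitespace-separated field.
detect_ntp_mode() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "chrony") chrony = 1
                if ($i == "ntp")    ntp = 1
            }
        }
        END {
            if (chrony && ntp) print "switching"   # both present: wait 10 minutes and retry
            else if (chrony)   print "chrony"      # continue with Step 2
            else if (ntp)      print "ntp"         # continue with Step 20
            else               print "unknown"
        }'
}
```

A possible invocation as user omm is: sh ${BIGDATA_HOME}/om-server/om/sbin/status-oms.sh | detect_ntp_mode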

Check whether the chrony service on the node is started properly.

  2. On FusionInsight Manager, choose O&M > Alarm > Alarms. On the page that is displayed, click in the row containing the alarm, and view the name of the host for which the alarm is generated in Location.
  3. Check whether the chronyd process is running on the node for which the alarm is generated. Log in to that node as user root and run the following command to check whether the chronyd process information is displayed:

    ps -ef | grep chronyd | grep -v grep

  4. Start the NTP service.
  5. After 10 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 6.
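
The process check and conditional start above can be sketched as follows. The helper names is_running and ensure_service are hypothetical, and the start command is passed in by the caller because this section does not name one. Treat this as a sketch, not a supported tool.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names).

# Succeed if a process whose command name is exactly $1 is running.
is_running() {
    ps -e -o comm= | grep -qx "$1"
}

# Start a service only if its process is not running.
# $1 = process name; remaining arguments = the start command to run.
ensure_service() {
    name=$1
    shift
    if is_running "$name"; then
        echo "$name is already running"
    else
        echo "$name is not running; starting it"
        "$@"
    fi
}
```

For the chrony case, a possible call as user root is ensure_service chronyd systemctl start chronyd. This assumes that chronyd is managed by systemd, which the systemctl restart chronyd command used elsewhere in this procedure suggests.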

Check whether the current node can synchronize time properly with the chrony service on the active OMS node.

  6. Check whether the node can synchronize time with the NTP service on the active OMS node based on additional information of the alarm.

  7. Check whether the synchronization with the chrony service on the active OMS node is faulty.

    Log in to the node for which the alarm is generated as user root and run the following command:

    chronyc sources

    If an asterisk (*) precedes the IP address of the chrony service on the active OMS node in the command output, the synchronization is normal. For example:

    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    ^* 10.10.10.162             10  10   377   626    +16us[  +15us] +/-  308us

    If no asterisk (*) precedes the IP address of the chrony service on the active OMS node and the value of Reach is 0, the synchronization is abnormal. For example:

    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    ^? 10.1.1.1                      0  10     0     -     +0ns[   +0ns] +/-    0ns

  8. The chrony synchronization failure is typically caused by the system firewall. If the firewall can be disabled, disable it. If the firewall cannot be disabled, check the firewall configuration policy and ensure that UDP ports 123 and 323 are not blocked. (For details, see the firewall configuration policy of each system.)
  9. After 10 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 10.

  10. Log in to the active OMS node as user root and run the following command to view the authentication key whose key index is 1M:

    cat ${BIGDATA_HOME}/om-server/OMS/workspace/conf/chrony.keys

  11. Run the following command to check whether the key in /etc/chrony.keys is the same as that queried in Step 10:

    diff ${BIGDATA_HOME}/om-server/OMS/workspace/conf/chrony.keys /etc/chrony.keys

    If the keys are the same, no result is returned after the command is executed. For example:

    host01:~ # cat ${BIGDATA_HOME}/om-server/OMS/workspace/conf/chrony.keys
    1 M sdYbq;o^CzEAWo<U=Tw5
    host01:~ # diff ${BIGDATA_HOME}/om-server/OMS/workspace/conf/chrony.keys /etc/chrony.keys
    host01:~ #

  12. Run the following command and check whether the key in ntpKeyFile is the same as the key of authentication key index 1M queried in Step 10:

    cat ${BIGDATA_HOME}/om-server/om/packaged-distributables/ntpKeyFile

  13. Log in to the faulty node as user root and run the following command to check whether the key value of authentication key index 1M is the same as that queried in Step 12:

    cat /etc/chrony.keys

  14. Switch to user omm, change the key value of authentication key index 1M in ${NODE_AGENT_HOME}/chrony.keys to the key value of ntpKeyFile queried in Step 12, and go to Step 16.

    su - omm
    vi ${NODE_AGENT_HOME}/chrony.keys

  15. Run the following commands as user root or omm to change the NTP key of the active OMS node (change ntp.keys to ntpkeys in Red Hat Enterprise Linux):

    cd ${BIGDATA_HOME}/om-server/OMS/workspace/conf
    # Delete the existing line that contains "1 M".
    sed -i "`cat chrony.keys | grep -n '1 M'|awk -F ':' '{print $1}'`d" chrony.keys
    # Append the key from ntpKeyFile under key index 1 M.
    echo "1 M `cat ${BIGDATA_HOME}/om-server/om/packaged-distributables/ntpKeyFile`" >> chrony.keys

    Check whether the key value of authentication key index 1M in chrony.keys is the same as that of ntpKeyFile.

    • If yes, go to Step 16.
    • If no, change the key of authentication key index 1M in chrony.keys to the key of ntpKeyFile and go to Step 16.

  16. After 5 minutes, restart the chrony service on the active OMS node. After 15 minutes, check whether the alarm is cleared.

    systemctl restart chronyd

    • If yes, no further action is required.
    • If no, go to Step 38.
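
The asterisk check on the chronyc sources output described above can be sketched as a filter. The helper name chrony_sync_state is hypothetical. The sketch assumes the column layout shown in the example output, where the first field holds the mode and state characters and the second field holds the server address.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): read `chronyc sources` output on
# stdin and report whether the server with address $1 is the selected source.
# Assumption: field 1 is the two-character mode/state marker (for example "^*")
# and field 2 is the server address, as in the example output above.
chrony_sync_state() {
    awk -v ip="$1" '
        $2 == ip {
            found = 1
            # The state character is the second character of the first field.
            if (substr($1, 2, 1) == "*") selected = 1
        }
        END {
            if (selected)   print "synchronized"
            else if (found) print "not synchronized"
            else            print "source not listed"
        }'
}
```

A possible invocation on the alarm node is chronyc sources | chrony_sync_state 10.10.10.162, where 10.10.10.162 is the example address used above and must be replaced with the address of the active OMS node.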

Check whether the time deviation between the node and the chrony service on the active OMS node is large.

  17. Check whether the time deviation is large in additional information of the alarm.

  18. On the Hosts tab page, select the host for which the alarm is generated, and choose More > Stop All Instances to stop all the services on the node.

    If the time on the alarm node is later than that of the chrony service on the active OMS node, adjust the time of the alarm node. Then choose More > Start All Instances to start the services on the node.

    If the time on the alarm node is earlier than that of the chrony service on the active OMS node, wait until the time deviation has elapsed, and then adjust the time of the alarm node. Then choose More > Start All Instances to start the services on the node.

    If you do not wait, data loss may occur.

  19. After 10 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 38.
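
The decision between adjusting immediately and waiting first can be sketched as follows. The helper name deviation_action is hypothetical. The sketch takes both times as epoch seconds and assumes that waiting for the deviation means waiting for a period equal to the deviation. How to read the time of the active OMS node is not covered here.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name).
# $1 = time on the alarm node, $2 = time on the active OMS node, both in epoch seconds.
# Assumption: "waiting for the deviation" means waiting a period equal to it.
deviation_action() {
    node=$1
    oms=$2
    diff=$((node - oms))
    if [ "$diff" -gt 0 ]; then
        # Alarm node is later than the active OMS node.
        echo "later by ${diff}s: adjust now"
    elif [ "$diff" -lt 0 ]; then
        # Alarm node is earlier than the active OMS node.
        wait_s=$((0 - diff))
        echo "earlier by ${wait_s}s: wait ${wait_s}s, then adjust"
    else
        echo "no deviation"
    fi
}
```

In both cases, stop all instances on the node before adjusting the time and start them again afterwards, as described in the procedure.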

Check whether the NTP service on the node is started properly.

  20. On FusionInsight Manager, choose O&M > Alarm > Alarms. On the page that is displayed, click in the row containing the alarm, and view the name of the host for which the alarm is generated in Location.
  21. Check whether the ntpd process is running on the node for which the alarm is generated. Log in to that node as user root and run the following command to check whether the ntpd process information is displayed:

    ps -ef | grep ntpd | grep -v grep

  22. Start the NTP service:

    service ntp start

    For Red Hat operating systems, run the service ntpd start command.

  23. After 10 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 24.

Check whether the node can synchronize time properly with the NTP service on the active OMS node.

  24. Check whether the node can synchronize time with the NTP service on the active OMS node based on additional information of the alarm.

  25. Check whether the synchronization with the NTP service on the active OMS node is faulty.

    Log in to the alarm node as user root and run the following command:

    ntpq -np

    If an asterisk (*) precedes the IP address of the NTP service on the active OMS node in the command output, the synchronization is normal. For example:

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *10.10.10.162    .LOCL.           1 u    1   16   377    0.270   -1.562   0.014

    If no asterisk (*) precedes the IP address of the NTP service on the active OMS node and the value of refid is .INIT., the synchronization is abnormal. For example:

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     10.10.10.162    .INIT.           1 u    1   16   377    0.270   -1.562   0.014

  26. The NTP synchronization failure is typically caused by the system firewall. If the firewall can be disabled, run the iptables -F command to disable it. If the firewall cannot be disabled, run the iptables -L command to check the firewall configuration policy and ensure that UDP port 123 is not blocked. (For details, see the firewall configuration policy of each system.)

    iptables -F

  27. After 10 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 28.

  28. Log in to the active OMS node as user root and run the following command to view the authentication key whose key index is 1M:

    cat ${BIGDATA_HOME}/om-server/OMS/workspace/conf/ntpkeys

  29. Run the following command to check whether the key in /etc/ntp/ntpkeys is the same as that queried in Step 28:

    diff ${BIGDATA_HOME}/om-server/OMS/workspace/conf/ntpkeys /etc/ntp/ntpkeys

    If the keys are the same, no result is returned after the command is executed. For example:

    host01:~ # cat ${BIGDATA_HOME}/om-server/OMS/workspace/conf/ntp.keys
    1 M sdYbq;o^CzEAWo<U=Tw5
    host01:~ # diff ${BIGDATA_HOME}/om-server/OMS/workspace/conf/ntp.keys /etc/ntp.keys
    host01:~ #

  30. Run the following command and check whether the key in ntpKeyFile is the same as the key of authentication key index 1M queried in Step 28:

    cat ${BIGDATA_HOME}/om-server/om/packaged-distributables/ntpKeyFile

  31. Log in to the faulty node as user root and run the following command to check whether the key value of authentication key index 1M is the same as that queried in Step 30:

    cat /etc/ntp/ntpkeys

  32. Switch to user omm, change the key value of authentication key index 1M in ${NODE_AGENT_HOME}/ntp.keys (${NODE_AGENT_HOME}/ntpkeys in Red Hat Enterprise Linux) to the key value of ntpKeyFile queried in Step 30, and go to Step 34.

    su - omm

  33. Run the following commands as user root or omm to change the NTP key of the active OMS node (change ntp.keys to ntpkeys in Red Hat Enterprise Linux):

    cd ${BIGDATA_HOME}/om-server/OMS/workspace/conf
    # Delete the existing line that contains "1 M".
    sed -i "`cat ntp.keys | grep -n '1 M'|awk -F ':' '{print $1}'`d" ntp.keys
    # Append the key from ntpKeyFile under key index 1 M.
    echo "1 M `cat ${BIGDATA_HOME}/om-server/om/packaged-distributables/ntpKeyFile`" >> ntp.keys

    Check whether the key value of authentication key index 1M in ntp.keys is the same as that of ntpKeyFile.

    • If yes, go to Step 34.
    • If no, change the key of authentication key index 1M in ntp.keys to the key of ntpKeyFile and go to Step 34.

  34. After 5 minutes, restart the NTP service on the active OMS node. After 15 minutes, check whether the alarm is cleared.

    service ntp restart

    • If yes, no further action is required.
    • If no, go to Step 38.
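
The key comparisons in the steps above can be sketched as two small helpers. The names key_index_1m and keys_match are hypothetical. The sketch assumes the "1 M <key>" line format shown in the example output, that the key contains no spaces, and that ntpKeyFile holds only the key itself, which is how the echo command above uses it.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names), assuming the "1 M <key>" line
# format shown in the example output and a key without spaces.

# Print the key stored under index 1 with type M in keys file $1.
key_index_1m() {
    awk '$1 == "1" && $2 == "M" { print $3; exit }' "$1"
}

# Succeed if the index 1 M key in keys file $1 equals the content of key file $2.
keys_match() {
    [ "$(key_index_1m "$1")" = "$(cat "$2")" ]
}
```

A possible check on the faulty node is keys_match /etc/ntp/ntpkeys followed by the path of a copy of ntpKeyFile. The command succeeds silently when the keys are the same, in the same way that diff returns no result.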

Check whether the time deviation between the node and the NTP service on the active OMS node is large.

  35. Check whether the time deviation is large in additional information of the alarm.

  36. On the Hosts tab page, select the host for which the alarm is generated, and choose More > Stop All Instances to stop all the services on the node.

    If the time on the alarm node is later than that of the NTP service on the active OMS node, adjust the time of the alarm node. Then choose More > Start All Instances to start the services on the node.

    If the time on the alarm node is earlier than that of the NTP service on the active OMS node, wait until the time deviation has elapsed, and then adjust the time of the alarm node. Then choose More > Start All Instances to start the services on the node.

    If you do not wait, data loss may occur.

  37. After 10 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 38.

Collect fault information.

  38. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  39. Expand the Service drop-down list, select NodeAgent and OmmServer for the target cluster, and click OK. Expand the Hosts dialog box and select the alarm node and the active OMS node.
  40. Click in the upper right corner, and set Start Date and End Date for log collection to 30 minutes before and 30 minutes after the alarm generation time, respectively. Then, click Download.
  41. Contact O&M personnel and provide the collected logs.
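
The log collection window can be computed with a short helper. The name log_window is hypothetical. The sketch assumes that GNU date is available and that the alarm generation time is given as a UTC timestamp such as 2025-08-09 10:00:00. Convert the result to the time zone shown on FusionInsight Manager if it differs.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): print the log collection window,
# 30 minutes before and 30 minutes after the alarm generation time.
# Assumptions: GNU date is available and $1 is a UTC timestamp it can parse.
log_window() {
    alarm_epoch=$(date -u -d "$1" +%s) || return 1
    start=$(date -u -d "@$((alarm_epoch - 1800))" '+%Y-%m-%d %H:%M:%S')
    end=$(date -u -d "@$((alarm_epoch + 1800))" '+%Y-%m-%d %H:%M:%S')
    echo "Start Date: $start"
    echo "End Date:   $end"
}
```

For example, log_window '2025-08-09 10:00:00' prints a window from 09:30:00 to 10:30:00 on the same day.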

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None