Updated on 2024-09-23 GMT+08:00

ALM-25004 Abnormal LdapServer Data Synchronization

Description

The system checks the LdapServer data every 30 seconds. This alarm is generated when the data on the active and standby LdapServers of Manager is found to be inconsistent in 12 consecutive checks. This alarm is cleared when the data on the active and standby LdapServers is consistent.

The system checks the LdapServer data every 30 seconds. This alarm is generated when the LdapServer data in the cluster is found to be inconsistent with that on Manager in 12 consecutive checks. This alarm is cleared when the data is consistent.
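As an illustration of the rule above, the following sketch models a counter that raises the alarm after 12 consecutive inconsistent checks and clears it on the first consistent check. It is a simplified model under those assumptions, not the Manager implementation; the function names and the ok/bad result labels are hypothetical.

```shell
# Hypothetical model of the alarm rule; not the actual Manager code.
CHECK_INTERVAL_SECONDS=30   # check period stated in the description
ALARM_THRESHOLD=12          # consecutive inconsistent checks

# Print the approximate time, in seconds, that the data must stay
# inconsistent before the alarm is generated (12 checks x 30 seconds).
approx_seconds_to_alarm() {
    echo $((CHECK_INTERVAL_SECONDS * ALARM_THRESHOLD))
}

# Take check results as arguments ("ok" or "bad", oldest first) and print
# "alarm" if the alarm is active after the last result, otherwise "clear".
alarm_state() {
    streak=0
    state=clear
    for result in "$@"; do
        if [ "$result" = "bad" ]; then
            streak=$((streak + 1))
            if [ "$streak" -ge "$ALARM_THRESHOLD" ]; then
                state=alarm
            fi
        else
            # A consistent check resets the streak and clears the alarm.
            streak=0
            state=clear
        fi
    done
    echo "$state"
}
```

Under this model, the data must remain inconsistent for roughly six minutes before the alarm appears, so a brief synchronization delay does not trigger it.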

Attribute

  • Alarm ID: 25004
  • Alarm Severity: Critical
  • Auto Clear: Yes

Parameters

  • Source: Specifies the cluster for which the alarm is generated.
  • ServiceName: Specifies the service for which the alarm is generated.
  • RoleName: Specifies the role for which the alarm is generated.
  • HostName: Specifies the host for which the alarm is generated.

Impact on the System

LdapServer data inconsistency occurs because the LdapServer data in Manager is damaged or the LdapServer data in the cluster is damaged. The LdapServer process with damaged data cannot provide services externally, and the authentication functions of Manager and the cluster are affected.

Possible Causes

  • The network of the node where the LdapServer process is located is faulty.
  • The LdapServer process is abnormal.
  • The OS restart damages data on LdapServer.
  • The amount of OLdap data exceeds the threshold (10 MB by default).

Procedure

Check whether the network where the LdapServer nodes reside is faulty.

  1. On the FusionInsight Manager portal, choose O&M > Alarm > Alarms. Record the IP address of HostName in the alarm locating information as IP1 (if multiple alarms exist, record the IP addresses as IP1, IP2, and IP3 respectively).
  2. Contact O&M personnel and log in to the node corresponding to IP1. Run the ping command to check whether the management-plane IP address of the active OMS node can be pinged.

    • If yes, go to 4.
    • If no, go to 3.

  3. Contact the network administrator to recover the network and check whether Abnormal LdapServer Data Synchronization is cleared.

    • If yes, no further action is required.
    • If no, go to 4.
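The connectivity check above can be sketched as a small shell helper. This is a minimal sketch that assumes the standard Linux ping utility (-c for the number of requests, -W for the reply timeout in seconds); the function names and the PING_CMD override are hypothetical and exist only to make the sketch easy to try out.

```shell
# PING_CMD can be overridden for testing; by default the standard Linux
# ping utility is assumed.
PING_CMD=${PING_CMD:-ping}

# Return 0 if ping reports success for the host given as the first argument.
# -c 3: send three echo requests; -W 2: wait up to 2 seconds for each reply.
can_reach() {
    "$PING_CMD" -c 3 -W 2 "$1" >/dev/null 2>&1
}

# Print which step of the procedure to continue with.
next_step_after_ping() {
    if can_reach "$1"; then
        echo "reachable: go to 4"
    else
        echo "unreachable: go to 3"
    fi
}
```

Run next_step_after_ping on the node corresponding to IP1, passing the management-plane IP address of the active OMS node as the argument.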

Check whether the LdapServer processes are normal.

  4. On the Alarm page of FusionInsight Manager, check whether the OLdap Resource Abnormal alarm exists.

    • If yes, go to 5.
    • If no, go to 7.

  5. Clear the alarm by following the steps provided in "ALM-12004 OLdap Resource Abnormal".
  6. Check whether Abnormal LdapServer Data Synchronization is cleared in the alarm list.

    • If yes, no further action is required.
    • If no, go to 7.

  7. On the Alarm page of FusionInsight Manager, check whether Process Fault is generated for the LdapServer service.

    • If yes, go to 8.
    • If no, go to 10.

  8. Handle the alarm according to "ALM-12007 Process Fault".
  9. Check whether Abnormal LdapServer Data Synchronization is cleared.

    • If yes, no further action is required.
    • If no, go to 10.

Check whether the LdapServer data is damaged.

  10. On FusionInsight Manager, choose O&M > Alarm > Alarms. Record the IP address of HostName in the alarm locating information as IP1 (if multiple alarms exist, record the IP addresses as IP1, IP2, and IP3). Choose Cluster > Name of the desired cluster > Services > LdapServer > Configurations and record the port number of LdapServer as PORT. (If the IP address in the alarm locating information is that of the standby management node, choose System > OMS > oldap > Modify Configuration and record the listening port number of LdapServer instead.)
  11. Log in to the node corresponding to IP1 as user omm.
  12. Run the following command and check whether the query output contains errors.

    ldapsearch -H ldaps://IP1:PORT -LLL -x -D cn=root,dc=hadoop,dc=com -W -b ou=Peoples,dc=hadoop,dc=com

    After running the command, enter the LDAP administrator password. Contact the system administrator to obtain the password.

    • If yes, go to 13.
    • If no, go to 15.

  13. Recover the LdapServer and OMS nodes using data backed up before the alarm was generated.

    Use the OMS data and LdapServer data backed up at the same point in time to recover the data. Otherwise, the service and operation may fail. To recover data when services run properly, you are advised to manually back up the latest management data and then recover the data. Otherwise, Manager data produced between the backup point in time and the recovery point in time will be lost.

  14. Check whether the Abnormal LdapServer Data Synchronization alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 15.
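The ldapsearch query above can be wrapped so that its exit status is also reported. This is a minimal sketch: the function names and the LDAPSEARCH_CMD override are hypothetical, and a zero exit status only indicates that the query itself succeeded, so the printed entries should still be inspected for errors.

```shell
# LDAPSEARCH_CMD can be overridden for testing; by default the OpenLDAP
# ldapsearch client is assumed to be on the PATH.
LDAPSEARCH_CMD=${LDAPSEARCH_CMD:-ldapsearch}

# Build the LDAPS URL from the recorded IP address ($1) and port ($2).
ldap_url() {
    echo "ldaps://$1:$2"
}

# Run the same query as in the procedure and print the next step.
# Because of -W, ldapsearch prompts for the LDAP administrator password.
check_ldap_query() {
    url=$(ldap_url "$1" "$2")
    if "$LDAPSEARCH_CMD" -H "$url" -LLL -x \
        -D cn=root,dc=hadoop,dc=com -W \
        -b ou=Peoples,dc=hadoop,dc=com; then
        echo "no error: go to 15"
    else
        echo "error (exit code $?): go to 13"
    fi
}
```

Call check_ldap_query with IP1 and PORT as the two arguments; the query output is printed first, followed by one line naming the next step.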

Check whether the OLdap data volume exceeds the threshold (10 MB by default). (This step applies only to versions earlier than MRS 3.3.0. For MRS 3.3.0 and later versions, go to 18.)

  15. Log in to the active OMS node as user omm.
  16. Run the following command to check whether the directory contains .mdb files.

    ll /srv/BigData/ldapData/oldap/data/

    • If yes, check and record the size of the .mdb file and go to 17.
    • If no, go to 18.

  17. Run the following command to view the OLdap configuration and record the value of Map size (the default value is 10485760 bytes, that is, 10 MB).

    mdb_stat -e /srv/BigData/ldapData/oldap/data/

    Check whether the size of the .mdb file recorded in 16 reaches the value of Map size.

    • If yes, contact the O&M personnel.
    • If no, go to 18.
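The size comparison above can be scripted. This is a minimal sketch that assumes mdb_stat -e prints a line of the form "Map size: 10485760"; the function names are hypothetical, and data.mdb in the usage comment is an assumed file name (the usual LMDB data file), so substitute the .mdb file actually found in the directory.

```shell
# Extract the Map size value (in bytes) from `mdb_stat -e` output on stdin.
parse_map_size() {
    awk -F': *' '/Map size/ {print $2; exit}'
}

# Print the size of the file given as the first argument, in bytes.
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Return 0 if the file size ($1) has reached the map size ($2).
mdb_is_full() {
    [ "$1" -ge "$2" ]
}

# Usage on the active OMS node as user omm (data.mdb is an assumed name):
#   dir=/srv/BigData/ldapData/oldap/data/
#   map_size=$(mdb_stat -e "$dir" | parse_map_size)
#   size=$(file_size_bytes "${dir}data.mdb")
#   if mdb_is_full "$size" "$map_size"; then echo "contact O&M personnel"; fi
```

The parsing and comparison are kept in separate functions so that each can be checked against a saved copy of the mdb_stat output without touching the live data directory.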

Collect fault information.

  18. On the FusionInsight Manager portal, choose O&M > Log > Download.
  19. For Service, select LdapServer in the required cluster and OmsLdapServer.
  20. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
  21. Contact the O&M personnel and send the collected logs.
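The log collection window described above can be computed from the alarm generation time. This is a minimal sketch that assumes GNU date (the -d "@SECONDS" form) and prints the times in UTC; the function name is hypothetical, and the values entered in Manager must be converted to the time zone that Manager displays.

```shell
# Print the start and end of a window spanning one hour before and one hour
# after the alarm time, which is given in seconds since the epoch ($1).
# Output format: "START,END" in UTC.
log_window() {
    alarm_epoch=$1
    start=$(date -u -d "@$((alarm_epoch - 3600))" '+%Y-%m-%d %H:%M:%S')
    end=$(date -u -d "@$((alarm_epoch + 3600))" '+%Y-%m-%d %H:%M:%S')
    echo "$start,$end"
}
```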

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None