ALM-27001 DBService Service Unavailable

Updated at: Mar 25, 2021 GMT+08:00

Description

The alarm module checks the DBService service status every 30 seconds. This alarm is generated when the system detects that the DBService service is unavailable.

This alarm is cleared when the DBService service recovers.

Attribute

    Alarm ID    Alarm Severity    Automatically Cleared
    27001       Critical          Yes

Parameters

    Name           Meaning
    Source         Specifies the cluster for which the alarm is generated.
    ServiceName    Specifies the service for which the alarm is generated.
    RoleName       Specifies the role for which the alarm is generated.
    HostName       Specifies the host for which the alarm is generated.

Impact on the System

The database service is unavailable and cannot provide data import and query functions for upper-layer services, which causes exceptions in some services.

Possible Causes

  • The floating IP address does not exist.
  • There is no active DBServer instance.
  • The active and standby DBServer processes are abnormal.

Procedure

Check whether the floating IP address exists in the cluster environment.

  1. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services > DBService > Instance.
  2. Check whether the active instance exists.

    • If yes, go to 3.
    • If no, go to 9.

  3. Select the active DBServer instance and record the IP address.
  4. Log in to the host that corresponds to the preceding IP address as user root, and run the ifconfig command to check whether the DBService floating IP address exists on the node.

    • If yes, go to 5.
    • If no, go to 9.

  5. Run the ping floatip command, replacing floatip with the DBService floating IP address, to check whether the address can be pinged successfully.

    • If yes, go to 6.
    • If no, go to 9.

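  6. Log in to the host that corresponds to the DBService floating IP address as user root, and run the following command to delete the floating IP address. Replace interface with the name of the network interface that holds the floating IP address.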

    ifconfig interface down

  7. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services > DBService > More > Restart Service to restart DBService, and check whether DBService is restarted successfully.

    • If yes, go to 8.
    • If no, go to 9.

  8. Wait for about 2 minutes and check whether the alarm is cleared in the alarm list.

    • If yes, no further action is required.
    • If no, go to 13.
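The check in step 4 can be scripted. The following is a minimal sketch and is not shipped with DBService: has_floating_ip is a hypothetical helper that reads interface listing text on standard input and succeeds only if the given address appears as a whole address. The address 192.0.2.10 in the usage comment is a placeholder.

```shell
# Hypothetical helper (an assumption, not part of DBService).
# Succeeds if the IPv4 address passed as $1 appears in the text on stdin.
has_floating_ip() {
    # -F: match the address literally, so dots are not regex wildcards.
    # -w: whole-word match, so 10.5.89.1 does not match 10.5.89.12.
    grep -qwF -- "$1"
}

# Usage on the active DBServer host (placeholder address):
#   ifconfig | has_floating_ip 192.0.2.10 && echo "floating IP present"
```

The helper only inspects text, so it can be run against saved ifconfig output as well as live output.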

Check the status of the active DBServer instance.

  9. Select the DBServer instance whose role status is abnormal and record the IP address.
  10. On the Alarm page, check whether the Process Fault alarm is reported for the DBServer instance on the host that corresponds to the IP address.

    • If yes, go to 11.
    • If no, go to 13.

  11. Handle the alarm according to "ALM-12007 Process Fault".
  12. Wait for about 5 minutes and check whether the alarm is cleared in the alarm list.

    • If yes, no further action is required.
    • If no, go to 19.

Check the status of the active and standby DBServers.

  13. Log in to the host that corresponds to the preceding IP address as user root, and run the su - omm command to switch to user omm.
  14. Run the cd ${DBSERVER_HOME} command to go to the installation directory of DBService.
  15. Run the sh sbin/status-dbserver.sh command to view the status of the active and standby HA processes of DBService. Determine whether the status can be viewed successfully.

    HAMode
    double

    NodeName                  HostName               HAVersion                StartTime                HAActive             HAAllResOK           HARunPhase
    10_5_89_12                host01                 V100R001C01              2019-06-13 21:33:09      active               normal               Actived
    10_5_89_66                host03                 V100R001C01              2019-06-13 21:33:09      standby              normal               Deactived

    NodeName                  ResName                ResStatus                ResHAStatus              ResType
    10_5_89_12                floatip                Normal                   Normal                   Single_active
    10_5_89_12                gaussDB                Active_normal            Normal                   Active_standby
    10_5_89_66                floatip                Stopped                  Normal                   Single_active
    10_5_89_66                gaussDB                Standby_normal           Normal                   Active_standby

    • If yes, go to 16.
    • If no, go to 19.

  16. Check whether the active and standby HA processes are in an abnormal state.

    • If yes, go to 17.
    • If no, go to 19.

  17. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > DBService > More > Restart Service to restart DBService, and check whether the system displays a message indicating that the restart is successful.

    • If yes, go to 18.
    • If no, go to 19.

  18. Wait for about 2 minutes and check whether the alarm is cleared in the alarm list.

    • If yes, no further action is required.
    • If no, go to 19.
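Reading the status output by eye is error-prone on a busy cluster. The following is a minimal sketch of how the output could be screened, assuming the column layout of the sample output above. find_abnormal_ha_rows is a hypothetical helper, and its definition of "abnormal" is an assumption: a node whose HAAllResOK is not normal, a resource whose ResHAStatus is not Normal, or no node with HAActive set to active.

```shell
# Hypothetical helper (an assumption, not part of DBService).
# Reads status-dbserver.sh output on stdin, prints suspicious rows,
# and returns 1 if any were found, 0 otherwise.
find_abnormal_ha_rows() {
    awk '
        # The two header rows tell us which table the following rows belong to.
        $1 == "NodeName" && $2 == "HostName" { section = "node"; next }
        $1 == "NodeName" && $2 == "ResName"  { section = "res";  next }
        NF == 0 { next }
        # Node table: StartTime spans two fields, so HAActive is field 6
        # and HAAllResOK is field 7.
        section == "node" && NF >= 8 && $6 == "active" { active++ }
        section == "node" && NF >= 8 && tolower($7) != "normal" {
            print "node: " $1 " HAAllResOK=" $7; bad = 1
        }
        # Resource table: ResHAStatus is field 4.
        section == "res" && NF >= 5 && tolower($4) != "normal" {
            print "resource: " $1 " " $2 " ResHAStatus=" $4; bad = 1
        }
        END {
            if (!active) { print "no active node found"; bad = 1 }
            exit bad ? 1 : 0
        }
    '
}

# Usage as user omm, from ${DBSERVER_HOME}:
#   sh sbin/status-dbserver.sh | find_abnormal_ha_rows || echo "HA state needs attention"
```

The ResStatus column is deliberately not checked, because Stopped is the expected value for floatip on the standby node in the sample output.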

Collect fault information.

  19. On FusionInsight Manager, choose O&M > Log > Download.
  20. From Service, select DBService in the required cluster and NodeAgent.
  21. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
  22. Contact the O&M personnel and send the collected logs.
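The Start Date and End Date values can be computed rather than worked out by hand. The following is a minimal sketch, assuming GNU date and an alarm generation time written as YYYY-MM-DD HH:MM:SS. log_window is a hypothetical helper, not a FusionInsight Manager command.

```shell
# Hypothetical helper (an assumption, not part of FusionInsight Manager).
# Prints two lines: the time 1 hour before and the time 1 hour after
# the alarm generation time passed as $1. Requires GNU date for -d.
log_window() {
    # Convert to epoch seconds first so the offset is plain arithmetic.
    alarm_epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((alarm_epoch - 3600))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((alarm_epoch + 3600))" '+%Y-%m-%d %H:%M:%S'
}

# Usage (the timestamp is an example value):
#   log_window "2019-06-13 21:33:09"
```

The first line of output is the value for Start Date and the second is the value for End Date.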

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None
