Updated on 2024-10-25 GMT+08:00

Viewing Alarms of an MRS Cluster

You can view and clear alarms on MRS. Typically, the system automatically clears an alarm after the fault is rectified. If the fault has been rectified but the alarm is not automatically cleared, you can clear it manually. MRS displays the latest 100,000 alarms, including uncleared, manually cleared, and automatically cleared alarms. When the number of cleared alarms exceeds 100,000 and approaches 110,000, the system automatically dumps the earliest 10,000 cleared alarms to the following path:

  • For versions earlier than MRS 3.x, the path is ${BIGDATA_HOME}/OMSV100R001C00x8664/workspace/data on the active management node.
  • For MRS 3.x or later, the path is ${BIGDATA_HOME}/om-server/OMS/workspace/data on the active management node.
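The version-dependent dump location can be expressed as a small lookup. The sketch below is a minimal illustration, assuming the MRS major version is known as an integer (for example, 2 for MRS 2.x and 3 for MRS 3.x). The function name `alarm_dump_dir` and the sample value `/opt/Bigdata` for `${BIGDATA_HOME}` are hypothetical.

```python
import os
from pathlib import PurePosixPath
from typing import Optional


def alarm_dump_dir(mrs_major_version: int, bigdata_home: Optional[str] = None) -> PurePosixPath:
    """Return the alarm dump directory on the active management node.

    Hypothetical helper: mrs_major_version is an assumed input such as 2 or 3.
    If bigdata_home is not given, the BIGDATA_HOME environment variable is used.
    """
    home = bigdata_home if bigdata_home is not None else os.environ["BIGDATA_HOME"]
    if mrs_major_version < 3:
        # Versions earlier than MRS 3.x
        return PurePosixPath(home, "OMSV100R001C00x8664", "workspace", "data")
    # MRS 3.x or later
    return PurePosixPath(home, "om-server", "OMS", "workspace", "data")


# "/opt/Bigdata" is a hypothetical value for ${BIGDATA_HOME}.
print(alarm_dump_dir(3, "/opt/Bigdata"))  # /opt/Bigdata/om-server/OMS/workspace/data
```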

The dump directory is automatically created the first time alarms are dumped.
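The dump rule for cleared alarms can be modeled as a simple queue. This is a hypothetical sketch, not the MRS implementation: it assumes that the dump is triggered exactly when the number of retained cleared alarms reaches 110,000, and it uses an in-memory list (`dumped`) as a stand-in for the dump directory. The class name `ClearedAlarmStore` is illustrative.

```python
from collections import deque
from dataclasses import dataclass, field

DUMP_TRIGGER = 110_000  # assumption: the dump fires when cleared alarms reach this count
DUMP_BATCH = 10_000     # number of earliest cleared alarms moved per dump


@dataclass
class ClearedAlarmStore:
    """Hypothetical model of cleared-alarm retention (not the MRS implementation)."""
    cleared: deque = field(default_factory=deque)  # cleared alarms, oldest first
    dumped: list = field(default_factory=list)     # stand-in for the dump directory

    def add_cleared(self, alarm_id: int) -> None:
        self.cleared.append(alarm_id)
        if len(self.cleared) >= DUMP_TRIGGER:
            # Move the earliest DUMP_BATCH cleared alarms out of the retained set.
            batch = [self.cleared.popleft() for _ in range(DUMP_BATCH)]
            self.dumped.extend(batch)


store = ClearedAlarmStore()
for i in range(110_000):
    store.add_cleared(i)
print(len(store.cleared), len(store.dumped))  # 100000 10000
```

After the dump, the retained set is back to 100,000 cleared alarms, and the oldest 10,000 (IDs 0 to 9,999 in this model) are in the dump location.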

Video Tutorial

This tutorial introduces how to view cluster alarms and events and configure an alarm threshold.

The UI may vary depending on the version. This tutorial is for reference only.

Viewing and Clearing Alarms on the Management Console

  1. Log in to the MRS console.
  2. On the Active Clusters page, click the name of a running cluster to go to the cluster details page.
  3. Click Alarms and view the alarm information in the alarm list.

    • The alarm list page displays the latest 10 alarms by default.
    • You can filter alarms by severity. The results include both cleared and uncleared alarms.
    • To export alarms, click Export All. In the displayed Export dialog box, set Save As and click OK.
    Table 1 Alarm descriptions

    • Alarm ID: ID of the alarm.
    • Alarm Name: name of the alarm.
    • Severity: severity of the alarm. The same four severity levels apply to versions earlier than MRS 3.x and to MRS 3.x or later:
      • Critical: errors that affect cluster running, such as unavailable cluster services, node faults, data inconsistency between the active and standby GaussDB databases, and abnormal LdapServer data synchronization. Check the cluster status based on the alarms and rectify the faults promptly.
      • Major: errors that affect some cluster functions, such as process faults, periodic backup task failures, and abnormal key file permissions. Check the objects for which the alarms are generated and clear the alarms promptly.
      • Minor: errors that do not affect major functions of the current cluster, such as a certificate file or license file that is about to expire, or a failure to dump audit logs.
      • Warning: the lowest severity, used for information display or prompts. Alarms of this severity indicate events such as stopping or deleting a service, stopping or deleting an instance, deleting a node, restarting a service or an instance, performing an active/standby switchover of MRS Manager, scaling in a host, or restoring an instance. They are also generated when an instance is faulty, a job is executed successfully, or a job fails to be executed.
    • Generated: time when the alarm was generated.
    • Location: details about the alarm.
    • Operation: if the alarm can be cleared manually, click Clear Alarm. To view details about the alarm, click View Help (available in MRS 3.x or later).

  4. Click Advanced Search. In the displayed alarm search area, set search criteria and click Search to view information about specified alarms. Click Reset to clear the search criteria.

    In Time Range, specify a start time and an end time to search for alarms generated within that period.

    Handle each alarm by referring to Alarm Reference. If an alarm is caused by another cloud service that MRS depends on, contact the maintenance personnel of that cloud service.

  5. To clear an alarm manually, click Clear Alarm. In the displayed dialog box, click OK.

    After handling multiple alarms, you can select them and click Clear Alarm to clear them in batches. A maximum of 300 alarms can be cleared in each batch.
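Because each Clear Alarm operation accepts at most 300 alarms, clearing a larger selection requires several batches. The helper below is a hypothetical sketch that only splits a list of selected alarms into batches of at most 300. It does not call any MRS API, and the function name `clear_batches` and the sample identifiers are assumptions.

```python
from typing import List, Sequence

MAX_ALARMS_PER_CLEAR = 300  # per-batch limit stated in this section


def clear_batches(alarm_ids: Sequence[str], batch_size: int = MAX_ALARMS_PER_CLEAR) -> List[List[str]]:
    """Split the selected alarms into batches of at most batch_size (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(alarm_ids[i:i + batch_size]) for i in range(0, len(alarm_ids), batch_size)]


selected = [f"alarm-{n}" for n in range(750)]  # hypothetical selected alarms
sizes = [len(batch) for batch in clear_batches(selected)]
print(sizes)  # [300, 300, 150]
```

With 750 selected alarms, three Clear Alarm operations would be needed.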

Viewing and Clearing Alarms on FusionInsight Manager (MRS 3.x or Later)

  1. Log in to FusionInsight Manager.
  2. Choose O&M > Alarm > Alarms.
  3. View the alarm information reported by each cluster on FusionInsight Manager, including the alarm name, ID, severity, and generation time. By default, the latest 10 alarms are displayed on each page.
  4. Click the icon to the left of an alarm to view its detailed parameters. Table 2 describes the parameters.

    Table 2 Alarm parameters

    • Alarm ID: alarm ID.
    • Alarm Name: alarm name.
    • Alarm Severity: alarm severity. The options are Critical, Major, Minor, and Suggestion.
    • Generated: time when the alarm was generated.
    • Cleared: time when the alarm was cleared. If the alarm has not been cleared, -- is displayed.
    • Source: cluster name.
    • Object: service, process, or module that triggers the alarm.
    • Auto Clear: whether the alarm can be automatically cleared after the fault is rectified.
    • Alarm Status: current status of the alarm. The options are Auto, Manual, and Uncleared.
    • Alarm Cause: possible cause of the alarm.
    • Serial Number: number of alarms generated by the system.
    • Additional Information: error information.
    • Location: detailed information for locating the alarm, including:
      • Source: cluster for which the alarm is generated.
      • ServiceName: service for which the alarm is generated.
      • RoleName: role for which the alarm is generated.
      • HostName: host for which the alarm is generated.

  5. Manage alarms.

    • Click Export All to export all alarm details.
    • After handling multiple alarms, you can select them and click Clear Alarm to clear them in batches. A maximum of 300 alarms can be cleared in each batch.
    • You can filter alarms by object or severity.
    • You can click Advanced Search to search for alarms by alarm ID, name, type, start time, or end time. Click Search to filter alarms that meet the search criteria. Click Advanced Search again to view the number of search criteria that you have configured.
    • You can click Clear, Mask, or View Help to perform corresponding operations on an alarm.
    • If there are a large number of alarms, you can click View by Category to group uncleared alarms by alarm ID. Then click the number of uncleared alarms in a category to view the alarm details.
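View by Category groups uncleared alarms by alarm ID and shows how many are in each group. The sketch below reproduces that counting logic over a few hypothetical records. The `Alarm` class, its field names, and the sample IDs are assumptions that mirror a subset of the Table 2 parameters; they are not an MRS data format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alarm:
    # Hypothetical subset of the Table 2 parameters; field names are illustrative.
    alarm_id: str
    alarm_name: str
    severity: str  # Critical, Major, Minor, or Suggestion
    status: str    # Auto, Manual, or Uncleared


def uncleared_by_category(alarms: List[Alarm]) -> Dict[str, int]:
    """Count uncleared alarms per alarm ID, sorted by alarm ID."""
    counts = Counter(a.alarm_id for a in alarms if a.status == "Uncleared")
    return dict(sorted(counts.items()))


alarms = [
    Alarm("example-002", "Example alarm B", "Major", "Uncleared"),
    Alarm("example-001", "Example alarm A", "Minor", "Uncleared"),
    Alarm("example-002", "Example alarm B", "Major", "Uncleared"),
    Alarm("example-001", "Example alarm A", "Minor", "Auto"),  # cleared, so not counted
]
print(uncleared_by_category(alarms))  # {'example-001': 1, 'example-002': 2}
```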

Viewing and Clearing Alarms on MRS Manager (MRS 2.x or Earlier)

  1. On MRS Manager, click Alarms to view the alarm information in the alarm list.

    • The alarm list page displays the latest 10 alarms by default.
    • You can filter alarms by severity in Severity. The results include both cleared and uncleared alarms.

  2. Click Advanced Search. In the displayed alarm search area, set search criteria and click Search to view information about specified alarms. Click Reset to clear the search criteria.

    Use Start Time and End Time to specify a time range and search for alarms generated within it.

    Handle each alarm by referring to Alarm Reference. If an alarm is caused by another cloud service that MRS depends on, contact the maintenance personnel of that cloud service.

  3. After the fault is rectified, click Clear Alarm to clear the alarm manually.

    After handling multiple alarms, you can select them and click Clear Alarm to clear them in batches. A maximum of 300 alarms can be cleared in each batch.
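The Severity filter and the Start Time/End Time search criteria described above can be sketched as a filter over alarm records. This is an illustrative model only: the `AlarmRecord` class and `search_alarms` function are hypothetical, and treating both time bounds as inclusive is an assumption. As described for the Severity filter, cleared and uncleared alarms are both returned.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AlarmRecord:
    # Hypothetical record; field names are illustrative.
    name: str
    severity: str  # Critical, Major, Minor, or Warning
    generated: datetime
    cleared: bool


def search_alarms(records: Iterable[AlarmRecord], start: datetime, end: datetime,
                  severity: Optional[str] = None) -> List[AlarmRecord]:
    """Return alarms generated within [start, end], optionally limited to one severity.

    Cleared and uncleared alarms are both kept. Inclusive bounds are an assumption.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    return [
        r for r in records
        if start <= r.generated <= end and (severity is None or r.severity == severity)
    ]


records = [
    AlarmRecord("Example alarm A", "Major", datetime(2024, 10, 1, 8, 0), cleared=True),
    AlarmRecord("Example alarm B", "Major", datetime(2024, 10, 2, 9, 30), cleared=False),
    AlarmRecord("Example alarm C", "Minor", datetime(2024, 10, 2, 10, 0), cleared=False),
    AlarmRecord("Example alarm D", "Major", datetime(2024, 10, 5, 0, 0), cleared=False),
]
hits = search_alarms(records, datetime(2024, 10, 1), datetime(2024, 10, 3), severity="Major")
print([r.name for r in hits])  # ['Example alarm A', 'Example alarm B']
```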