Updated on 2025-08-09 GMT+08:00

Viewing Alarms of an MRS Cluster

Alarms and events are important mechanisms for ensuring the stability, reliability, and performance of MRS clusters.

An alarm is a system-generated notification that signals an abnormal condition or fault requiring attention. Alarms are generated by analyzing system events and are handled either manually by users or automatically by the system. You can view alarms reported by components on the management console or on MRS Manager. You can also define alarm thresholds for threshold-based monitoring metrics in the cluster.

The system clears an auto-clearable alarm as soon as its predefined clearing conditions are met. If a fault has been rectified but its alarm cannot be cleared automatically, you can clear the alarm manually.
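The clearing rules above can be modeled as a small state machine. The following is a minimal Python sketch, not MRS code: the `Alarm` class, its method names, and the rule that manual clearing is refused for auto-clearable alarms are assumptions for illustration. Only the status values Auto, Manual, and Uncleared come from the Alarm Status parameter on FusionInsight Manager.

```python
from dataclasses import dataclass

# Status values from the Alarm Status parameter on FusionInsight Manager.
UNCLEARED = "Uncleared"
AUTO = "Auto"
MANUAL = "Manual"


@dataclass
class Alarm:
    alarm_id: str        # hypothetical ID, for illustration only
    auto_clear: bool     # whether the alarm supports automatic clearing
    status: str = UNCLEARED

    def on_clear_condition_met(self) -> None:
        """System side: clear an auto-clearable alarm once its clearing conditions are met."""
        if self.auto_clear and self.status == UNCLEARED:
            self.status = AUTO

    def clear_manually(self, fault_rectified: bool) -> bool:
        """User side: clear an alarm that cannot be cleared automatically.

        Assumption: manual clearing is refused unless the fault is rectified
        and the alarm is not auto-clearable. Returns True if the alarm was cleared.
        """
        if self.status != UNCLEARED or self.auto_clear or not fault_rectified:
            return False
        self.status = MANUAL
        return True
```

For example, `Alarm("A1", auto_clear=False).clear_manually(fault_rectified=True)` returns `True` and sets the status to Manual, while calling `on_clear_condition_met()` on the same alarm has no effect.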

On MRS Manager, you can view up to the latest 100,000 alarms, including uncleared, manually cleared, and automatically cleared alarms. When the number of cleared alarms exceeds 100,000 and is about to reach 110,000, the system automatically dumps the earliest 10,000 cleared alarms to the dump directory.
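The dump rule can be pictured as a simple retention policy. The following Python model is illustrative only and does not reflect how MRS Manager implements dumping: `ClearedAlarmStore` is hypothetical, the dump is assumed to fire when the cleared-alarm count reaches 110,000, and a plain list stands in for the files written to the dump directory.

```python
from collections import deque

DUMP_TRIGGER = 110_000  # assumption: the dump fires when cleared alarms reach this count
DUMP_BATCH = 10_000     # earliest cleared alarms moved to the dump directory per dump


class ClearedAlarmStore:
    """Simplified model of the cleared-alarm retention policy (not MRS code)."""

    def __init__(self, trigger: int = DUMP_TRIGGER, batch: int = DUMP_BATCH):
        self.trigger = trigger
        self.batch = batch
        self.cleared = deque()  # oldest cleared alarm on the left
        self.dumped = []        # stands in for files written to the dump directory

    def add_cleared(self, alarm_id: str) -> None:
        self.cleared.append(alarm_id)
        if len(self.cleared) >= self.trigger:
            # Move the earliest `batch` cleared alarms out of the online list.
            self.dumped.append([self.cleared.popleft() for _ in range(self.batch)])
```

In this model, each dump brings the number of online cleared alarms back down to 100,000 (110,000 minus 10,000). Scaled-down values such as `ClearedAlarmStore(trigger=110, batch=10)` make the behavior easy to try out.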

The alarm dump directory is as follows. The system automatically creates the directory the first time alarms are dumped.

  • Clusters of versions earlier than MRS 3.x: ${BIGDATA_HOME}/OMSV100R001C00x8664/workspace/data on the active management node
  • MRS 3.x clusters: ${BIGDATA_HOME}/om-server/OMS/workspace/data on the active management node
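If you locate the dump directory from a script, you might branch on the cluster version as in the following sketch. The function name `alarm_dump_dir` and the expected version-string format (for example, "3.1.0") are hypothetical; only the two directory layouts come from this page. Versions later than 3.x are rejected because this page does not describe them.

```python
import posixpath


def alarm_dump_dir(bigdata_home: str, mrs_version: str) -> str:
    """Return the alarm dump directory on the active management node.

    bigdata_home: value of ${BIGDATA_HOME} on that node.
    mrs_version: assumed to start with the major version, for example "3.1.0".
    """
    major = int(mrs_version.strip().split(".")[0])
    if major < 3:
        # Clusters of versions earlier than MRS 3.x
        return posixpath.join(bigdata_home, "OMSV100R001C00x8664", "workspace", "data")
    if major == 3:
        # MRS 3.x clusters
        return posixpath.join(bigdata_home, "om-server", "OMS", "workspace", "data")
    raise ValueError(f"MRS {mrs_version} is not covered by this sketch")
```

For example, with a hypothetical `bigdata_home` of "/bd", `alarm_dump_dir("/bd", "3.1.0")` returns "/bd/om-server/OMS/workspace/data".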


Viewing Alarms on the MRS Console

  1. Log in to the MRS console.
  2. On the Active Clusters page, click the name of a running cluster to go to the cluster details page.
  3. Click Alarms and view the alarm information in the alarm list.

    • By default, the alarm list displays the latest 10 alarms.
    • You can filter alarms by severity. The results include both cleared and uncleared alarms.
    • To export alarms, click Export All. In the Export dialog box that is displayed, set Save As and click OK.
    Table 1 Alarm descriptions

    • Alarm ID: ID of the alarm.
    • Alarm Name: Name of the alarm.
    • Severity: Severity of the alarm. The severity levels are the same for clusters of versions earlier than MRS 3.x and for clusters of MRS 3.x or later:
      • Critical: The alarm reports an error that affects cluster operation, such as an unavailable cluster service, a node fault, data inconsistency between the active and standby GaussDB databases, or abnormal LdapServer data synchronization. Check the cluster status based on the alarm and rectify the fault promptly.
      • Major: The alarm reports an error that affects some cluster functions, such as a process fault, a periodic backup task failure, or abnormal key file permissions. Check the object for which the alarm is generated and clear the alarm promptly.
      • Minor: The alarm reports an error that does not affect the major functions of the cluster, such as a certificate file or license file that is about to expire, or a failure to dump audit logs.
      • Warning: The lowest severity, used for information display or prompts. It indicates that an event has occurred, for example, when you stop, delete, or restart a service; stop, delete, or restart an instance; delete a node; perform an active/standby switchover of MRS Manager; scale in a host; or restore an instance. Alarms of this severity are also generated when an instance is faulty, a job is executed successfully, or a job fails to be executed.
    • Generated: Time when the alarm was generated.
    • Location: Details about the alarm.
    • Operation: If the alarm can be manually cleared, click Clear Alarm. To view details about the alarm, click View Help (available in MRS 3.x or later).

  4. Click Advanced Search. In the alarm search area that is displayed, set search criteria and click Search to view the matching alarms. Click Reset to clear the search criteria.

    Use Time Range to specify a start time and an end time and search for alarms generated within that period.

    Handle each alarm by referring to the Alarm Reference. If an alarm is caused by another cloud service that MRS depends on, contact the maintenance personnel of that cloud service.

  5. To clear an alarm, click Clear Alarm. In the dialog box that is displayed, click OK.

    After handling multiple alarms, you can select them and click Clear Alarm to clear them in a batch. A maximum of 300 alarms can be cleared in each batch.
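If you post-process alarm records yourself, for example after exporting them, the console behaviors in the steps above can be mirrored locally. This Python sketch does not call any MRS API: the `ConsoleAlarm` record and the helper functions are hypothetical, and treating both ends of Time Range as inclusive is an assumption. Only the 300-alarm batch limit and the fact that severity filtering returns both cleared and uncleared alarms come from this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List

MAX_CLEAR_BATCH = 300  # documented limit: at most 300 alarms per Clear Alarm batch


@dataclass
class ConsoleAlarm:
    alarm_id: str
    name: str
    severity: str        # Critical, Major, Minor, or Warning
    generated: datetime
    cleared: bool


def filter_by_severity(alarms: Iterable[ConsoleAlarm], severity: str) -> List[ConsoleAlarm]:
    # Like the console filter, the result includes both cleared and uncleared alarms.
    return [a for a in alarms if a.severity == severity]


def in_time_range(alarms: Iterable[ConsoleAlarm], start: datetime, end: datetime) -> List[ConsoleAlarm]:
    # Assumption: both ends of the Time Range are inclusive.
    return [a for a in alarms if start <= a.generated <= end]


def clear_batches(alarm_ids: List[str], batch_size: int = MAX_CLEAR_BATCH) -> List[List[str]]:
    """Split handled alarm IDs into batches no larger than the per-batch limit."""
    if not 1 <= batch_size <= MAX_CLEAR_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_CLEAR_BATCH}")
    return [alarm_ids[i:i + batch_size] for i in range(0, len(alarm_ids), batch_size)]
```

Splitting 650 handled alarms this way yields three batches of 300, 300, and 50 alarms, each of which fits within a single Clear Alarm operation.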

Viewing Alarms on FusionInsight Manager

  1. Log in to FusionInsight Manager of the MRS cluster.

    For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.

  2. Choose O&M > Alarm > Alarms.
  3. View the alarm information reported by each cluster, including the alarm name, ID, severity, and generation time. By default, the latest 10 alarms are displayed on each page.
  4. Click the icon on the left of an alarm to view its detailed parameters. Table 2 describes the parameters.

    Table 2 Alarm parameters

    • Alarm ID: ID of the alarm.
    • Alarm Name: Name of the alarm.
    • Alarm Severity: Severity of the alarm. The options are Critical, Major, Minor, and Suggestion.
    • Generated: Time when the alarm was generated.
    • Cleared: Time when the alarm was cleared. If the alarm has not been cleared, -- is displayed.
    • Source: Cluster name.
    • Object: Service, process, or module that triggered the alarm.
    • Auto Clear: Whether the alarm can be automatically cleared after the fault is rectified.
    • Alarm Status: Current status of the alarm. The options are Auto, Manual, and Uncleared.
    • Alarm Cause: Possible cause of the alarm.
    • Serial Number: Number of alarms generated by the system.
    • Additional Information: Error information. In MRS 3.3.0 or later, if thresholds are set for monitoring metrics to generate alarms, the metric values are also displayed here.
    • Location: Detailed information for locating the alarm, including:
      • Source: cluster for which the alarm is generated.
      • ServiceName: service for which the alarm is generated.
      • RoleName: role for which the alarm is generated.
      • HostName: host for which the alarm is generated.

  5. Manage alarms.

    • Click Export All to export all alarm details.
    • After handling multiple alarms, you can select them and click Clear Alarm to clear them in a batch. A maximum of 300 alarms can be cleared in each batch.
    • You can filter alarms by object or severity.
    • You can click Advanced Search to search for alarms by alarm ID, name, type, start time, or end time. Click Search to filter alarms that meet the search criteria. Click Advanced Search again to view the number of search criteria that you have configured.
    • You can click Clear, Mask, or View Help to perform corresponding operations on an alarm.
    • If there are many alarms, click View by Category to sort uncleared alarms by alarm ID. After the alarms are categorized, click the number of uncleared alarms to view their details.
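The View by Category option and the Cleared column can be approximated on local alarm records as follows. This is a hypothetical Python sketch: the `ManagerAlarm` record, the timestamp format, and sorting alarm IDs as strings (which matches numeric order only when all IDs have the same number of digits) are assumptions. The status values and the -- placeholder come from Table 2.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class ManagerAlarm:
    alarm_id: str                       # hypothetical IDs in the examples
    severity: str                       # Critical, Major, Minor, or Suggestion
    status: str                         # Auto, Manual, or Uncleared
    cleared_at: Optional[datetime] = None

    def cleared_display(self) -> str:
        # The Cleared column shows "--" while the alarm has not been cleared.
        # The timestamp format below is an arbitrary choice for this sketch.
        if self.cleared_at is None:
            return "--"
        return self.cleared_at.strftime("%Y-%m-%d %H:%M:%S")


def uncleared_by_category(alarms: Iterable[ManagerAlarm]) -> Dict[str, int]:
    """Count uncleared alarms per alarm ID, ordered by alarm ID."""
    counts = Counter(a.alarm_id for a in alarms if a.status == "Uncleared")
    return dict(sorted(counts.items()))
```

The resulting counts play the role of the per-category numbers you click on the page to drill down into the uncleared alarms of one alarm ID.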
