Updated on 2024-02-02 GMT+08:00

Setting RabbitMQ Alarm Rules

This section lists recommended alarm rules for key RabbitMQ metrics and describes how to configure them. Use the following alarm policies as a reference when you configure alarm rules for your own services.

Table 1 Alarm rules for RabbitMQ instances

| Metric | Alarm Policy | Description | Solution |
|---|---|---|---|
| Memory High Watermark | Alarm threshold: Raw data ≥ 1; Number of consecutive periods: 1; Alarm severity: Critical | A value of 1 indicates that the memory high watermark has been reached and message publishing is blocked. | Accelerate message retrieval. Use publisher confirms and monitor the publishing rate and duration on the publishing end; when the duration increases significantly, apply flow control. |
| Disk High Watermark | Alarm threshold: Raw data ≥ 1; Number of consecutive periods: 1; Alarm severity: Critical | A value of 1 indicates that the disk high watermark has been reached and message publishing is blocked. | Reduce the number of messages accumulated in lazy queues. Reduce the number of messages accumulated in durable queues. Delete queues. |
| Memory Usage | Alarm threshold: Raw data > Expected usage (30% is recommended); Number of consecutive periods: 3–5; Alarm severity: Major | To prevent the memory high watermark from blocking publishing, configure an alarm for this metric on each node. | Accelerate message retrieval. Use publisher confirms and monitor the publishing rate and duration on the publishing end; when the duration increases significantly, apply flow control. |
| CPU Usage | Alarm threshold: Raw data > Expected usage (70% is recommended); Number of consecutive periods: 3–5; Alarm severity: Major | High CPU usage may slow down publishing. Configure an alarm for this metric on each node. | Reduce the number of mirrored queues. For a cluster instance, add nodes and rebalance queues across all nodes. |
| Available Messages | Alarm threshold: Raw data > Expected number of available messages; Number of consecutive periods: 1; Alarm severity: Major | A large number of available messages indicates that messages are accumulating. | See the solution to preventing message accumulation. |
| Unacked Messages | Alarm threshold: Raw data > Expected number of unacknowledged messages; Number of consecutive periods: 1; Alarm severity: Major | A large number of unacknowledged messages may indicate that messages are accumulating. | Check whether the consumer is abnormal. Check whether the consumer logic is time-consuming. |
| Connections | Alarm threshold: Raw data > Expected number of connections; Number of consecutive periods: 1; Alarm severity: Major | A sharp increase in the number of connections may indicate a traffic increase. | The services may be abnormal. Check whether other alarms exist. |
| Channels | Alarm threshold: Raw data > Expected number of channels; Number of consecutive periods: 1; Alarm severity: Major | A sharp increase in the number of channels may indicate a traffic increase. | The services may be abnormal. Check whether other alarms exist. |
| Erlang Processes | Alarm threshold: Raw data > Expected number of processes; Number of consecutive periods: 1; Alarm severity: Major | A sharp increase in the number of processes may indicate a traffic increase. | The services may be abnormal. Check whether other alarms exist. |

  • Set the alarm threshold based on the service expectations. For example, if the expected usage is 35%, set the alarm threshold to 35%.
  • The number of consecutive periods and alarm severity can be adjusted based on the service logic.
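
The publisher-side advice in Table 1 (use publisher confirms, watch the publishing duration, and apply flow control when it rises significantly) can be sketched roughly as follows. This is a minimal illustration, not a DMS or RabbitMQ API: `ConfirmLatencyThrottle`, `publish_with_flow_control`, and the parameters `window`, `slowdown_factor`, and `max_delay` are hypothetical names, and `publish_and_wait_for_confirm` stands for whatever call in your client library publishes one message and blocks until the broker confirms it.

```python
import time
from collections import deque
from statistics import mean


class ConfirmLatencyThrottle:
    """Slow a publisher down when confirm latency rises well above its baseline."""

    def __init__(self, window=20, slowdown_factor=3.0, max_delay=1.0):
        self.samples = deque(maxlen=window)    # recent confirm durations, in seconds
        self.baseline = None                   # mean of the first full window
        self.slowdown_factor = slowdown_factor
        self.max_delay = max_delay

    def record(self, duration):
        """Store one publish-to-confirm duration; fix the baseline once the window first fills."""
        self.samples.append(duration)
        if self.baseline is None and len(self.samples) == self.samples.maxlen:
            self.baseline = mean(self.samples)

    def delay(self):
        """Return the pause before the next publish (0.0 while latency looks normal)."""
        if self.baseline is None or self.baseline <= 0:
            return 0.0
        current = mean(self.samples)
        if current / self.baseline < self.slowdown_factor:
            return 0.0
        # Back off by the current average confirm latency, capped at max_delay.
        return min(self.max_delay, current)


def publish_with_flow_control(messages, publish_and_wait_for_confirm, throttle,
                              clock=time.monotonic, sleep=time.sleep):
    """Publish each message, time its confirm, and pause when the throttle asks for it."""
    for message in messages:
        pause = throttle.delay()
        if pause > 0:
            sleep(pause)
        started = clock()
        publish_and_wait_for_confirm(message)  # hypothetical callable supplied by the caller
        throttle.record(clock() - started)
```

The `clock` and `sleep` parameters exist only so that the loop can be exercised without real waiting. What counts as a significant increase in duration depends on the service, so the factor of 3 used here is an assumption to tune, not a recommendation from this section.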

Procedure

  1. Log in to the management console.
  2. In the upper left corner, select the region where your RabbitMQ instance is located.

  3. From the service list, choose Middleware > Distributed Message Service for RabbitMQ to open the DMS for RabbitMQ console.
  4. View the instance metrics using either of the following methods:

    • In the row containing the desired instance, click View Metric. On the Cloud Eye console, view the metrics of the instance, nodes, and queues. Metric data is reported to Cloud Eye every minute.
    • Click the desired RabbitMQ instance to view its details. In the navigation pane, choose Monitoring view. On the displayed page, view the metrics of the instance, nodes, and queues. Metric data is reported to Cloud Eye every minute.

  5. Hover the mouse pointer over a metric and click the icon that appears to create an alarm rule for the metric.
  6. Specify the alarm rule details.

    For more information about creating alarm rules, see Creating an Alarm Rule.

    1. Enter the alarm name and description.
    2. Specify the alarm policy and alarm severity.

      For example, you can configure the rule to trigger an alarm when the raw value of Connections exceeds the preset value for three consecutive periods, and to resend the notification once a day for as long as the exception remains unhandled.

    3. Configure Alarm Notification. If you enable it, set the validity period, notification object, and trigger condition.
    4. Click Create.
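
The example policy in step 6 (an alarm after three consecutive periods above the preset value, then one repeated notification per interval while the exception persists) can be modeled with a short sketch. This is a simplified illustration of the policy semantics only, not how Cloud Eye is implemented; `AlarmRule`, `notification_periods`, and their fields are hypothetical names. The repeat interval is expressed in periods: assuming one-minute periods, a once-a-day reminder would correspond to 1440 periods, shortened to 4 in the example so that the output stays readable.

```python
from dataclasses import dataclass


@dataclass
class AlarmRule:
    threshold: float          # preset value the raw metric is compared against
    consecutive_periods: int  # breaching periods required before the alarm triggers
    renotify_every: int       # periods between repeated notifications while unhandled


def notification_periods(samples, rule):
    """Return the indexes of the periods in which a notification would be sent."""
    notified = []
    streak = 0            # current run of consecutive breaching periods
    last_notified = None  # index of the most recent notification for this alarm
    for index, value in enumerate(samples):
        if value <= rule.threshold:
            streak = 0
            last_notified = None  # metric recovered, so the alarm clears
            continue
        streak += 1
        if streak < rule.consecutive_periods:
            continue
        if last_notified is None or index - last_notified >= rule.renotify_every:
            notified.append(index)
            last_notified = index
    return notified


rule = AlarmRule(threshold=100, consecutive_periods=3, renotify_every=4)
connections = [90, 120, 130, 140, 150, 160, 170, 180, 90, 110, 120, 130]
print(notification_periods(connections, rule))  # [3, 7, 11]
```

In this run the first notification is sent in period 3 (the third consecutive breach), a reminder follows in period 7, and a new alarm is raised in period 11 after the metric recovered in period 8 and then breached the threshold three more times.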