Configuring RabbitMQ Alarms

This section lists recommended alarm policies for key RabbitMQ metrics and describes how to configure alarm rules. Use these policies as a starting point when you configure alarm rules for your own instances.

**Table 1** RabbitMQ instance metrics and alarm policies (RabbitMQ 3.x.x)
| Metric | Alarm Policy | Description | Solution |
| --- | --- | --- | --- |
| Memory High Watermark | Alarm threshold: Raw data ≥ 1; Number of consecutive periods: 1; Alarm severity: Critical | A value of 1 indicates that the memory high watermark has been reached and message publishing is blocked. | Accelerate message retrieval. Use publisher confirms and monitor the publishing rate and duration on the publisher side. When the duration increases significantly, apply flow control. |
| Disk High Watermark | Alarm threshold: Raw data ≥ 1; Number of consecutive periods: 1; Alarm severity: Critical | A value of 1 indicates that the disk high watermark has been reached and message publishing is blocked. | Reduce the number of messages accumulated in lazy queues. Reduce the number of messages accumulated in durable queues. Delete queues. |
| Memory Usage | Alarm threshold: Raw data > Expected usage (30% is recommended); Number of consecutive periods: 3–5; Alarm severity: Major | To prevent the memory high watermark from blocking publishing, configure an alarm for this metric on each node. | Accelerate message retrieval. Use publisher confirms and monitor the publishing rate and duration on the publisher side. When the duration increases significantly, apply flow control. |
| CPU Usage | Alarm threshold: Raw data > Expected usage (70% is recommended); Number of consecutive periods: 3–5; Alarm severity: Major | High CPU usage may slow down publishing. Configure an alarm for this metric on each node. | Reduce the number of mirrored queues. For a cluster instance, add nodes and rebalance queues across all nodes. |
| Available Messages | Alarm threshold: Raw data > Expected number of available messages; Number of consecutive periods: 1; Alarm severity: Major | A large number of available messages indicates that messages are accumulating. | See Solutions to Message Accumulation. |
| Unacked Messages | Alarm threshold: Raw data > Expected number of unacknowledged messages; Number of consecutive periods: 1; Alarm severity: Major | A large number of unacknowledged messages may indicate that messages are accumulating. | Check whether the consumer is abnormal. Check whether the consumer logic is time-consuming. |
| Connections | Alarm threshold: Raw data > Expected number of connections; Number of consecutive periods: 1; Alarm severity: Major | A sharp increase in the number of connections may signal a traffic increase. | The services may be abnormal. Check whether other alarms exist. |
| Channels | Alarm threshold: Raw data > Expected number of channels; Number of consecutive periods: 1; Alarm severity: Major | A sharp increase in the number of channels may signal a traffic increase. | The services may be abnormal. Check whether other alarms exist. |
| Erlang Processes | Alarm threshold: Raw data > Expected number of processes; Number of consecutive periods: 1; Alarm severity: Major | A sharp increase in the number of processes may signal a traffic increase. | The services may be abnormal. Check whether other alarms exist. |
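
The flow-control advice for the memory metrics above (monitor the publish duration and slow down when it grows significantly) can be sketched on the publisher side as follows. This is a minimal illustration, not a prescribed implementation: the class name `PublishThrottle` and its parameters (`slowdown_factor`, `min_samples`, `window`, `backoff_seconds`) are hypothetical, and `send` stands for any callable that publishes one message and blocks until the broker confirms it.

```python
import time
from collections import deque
from statistics import median
from typing import Callable, Deque


class PublishThrottle:
    """Hypothetical helper: time each confirmed publish and back off when it slows down."""

    def __init__(self, slowdown_factor: float = 3.0, min_samples: int = 5,
                 window: int = 50, backoff_seconds: float = 0.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.slowdown_factor = slowdown_factor    # how much slower than usual counts as "significant"
        self.min_samples = min_samples            # samples needed before a baseline is trusted
        self.backoff_seconds = backoff_seconds    # pause applied when a slowdown is detected
        self._durations: Deque[float] = deque(maxlen=window)
        self._clock = clock
        self._sleep = sleep

    def publish(self, send: Callable[[], None]) -> bool:
        """Run send(), record its duration, and pause if it was much slower than the baseline.

        Returns True if a back-off pause was applied.
        """
        start = self._clock()
        send()  # assumed to block until the broker has confirmed the message
        elapsed = self._clock() - start

        # The baseline is the median of earlier durations, computed before adding this one.
        has_baseline = len(self._durations) >= self.min_samples
        baseline = median(self._durations) if has_baseline else None
        self._durations.append(elapsed)

        if baseline and elapsed > self.slowdown_factor * baseline:
            self._sleep(self.backoff_seconds)
            return True
        return False
```

With a client whose publish call blocks until the confirm arrives, `send` could simply wrap that call. The factor of 3 and the 0.5-second pause are arbitrary starting values and would need tuning against real traffic.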

**Table 2** RabbitMQ instance metrics and alarm policies (RabbitMQ AMQP-0-9-1)
| Metric | Alarm Policy | Description | Solution |
| --- | --- | --- | --- |
| Available Messages | Alarm threshold: Raw data > Expected number of available messages; Number of consecutive periods: 1; Alarm severity: Major | A large number of available messages indicates that messages are accumulating. | See Solutions to Message Accumulation. |
| Connections | Alarm threshold: Raw data > Expected number of connections; Number of consecutive periods: 1; Alarm severity: Major | A sharp increase in the number of connections may signal a traffic increase. | The services may be abnormal. Check whether other alarms exist. |
| Channels | Alarm threshold: Raw data > Expected number of channels; Number of consecutive periods: 1; Alarm severity: Major | A sharp increase in the number of channels may signal a traffic increase. | The services may be abnormal. Check whether other alarms exist. |
| Instance Disk Usage | Alarm threshold: Raw data > 85%; Number of consecutive periods: 1; Alarm severity: Critical | High instance disk usage may indicate message accumulation. | See Solutions to Message Accumulation. |

Set the alarm threshold based on the service expectations. For example, if the expected usage is 35%, set the alarm threshold to 35%.
The number of consecutive periods and alarm severity can be adjusted based on the service logic.
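
To make the interaction between the threshold and the number of consecutive periods concrete, the sketch below evaluates a policy against a series of raw samples, one per monitoring period. It only illustrates the rule as described in the tables and is not the monitoring service's implementation; the function name `alarm_triggered` and its parameters are hypothetical.

```python
from typing import Iterable


def alarm_triggered(samples: Iterable[float], threshold: float,
                    consecutive_periods: int, inclusive: bool = False) -> bool:
    """Return True if the samples breach the threshold for enough periods in a row.

    inclusive=False models "Raw data > threshold";
    inclusive=True models "Raw data >= threshold" (as used for the watermark metrics).
    """
    run = 0  # length of the current streak of breaching samples
    for value in samples:
        breached = value >= threshold if inclusive else value > threshold
        run = run + 1 if breached else 0
        if run >= consecutive_periods:
            return True
    return False


# Memory usage (%) with an expected usage of 35% and three consecutive periods.
print(alarm_triggered([28, 36, 37, 38], threshold=35, consecutive_periods=3))
# A single dip below the threshold resets the streak.
print(alarm_triggered([36, 37, 20, 38], threshold=35, consecutive_periods=3))
```

A larger number of consecutive periods filters out short spikes at the cost of a later alarm, which is why the tables suggest 3–5 periods for usage metrics and 1 period for the watermark metrics.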

Configuring RabbitMQ Alarms

1. Log in to the console.
2. In the upper left corner, use the region selector to select the region where your RabbitMQ instance is located.
3. Open the service list and choose Middleware > Distributed Message Service for RabbitMQ to open the console of DMS for RabbitMQ.
4. View the instance metrics using either of the following methods:
   - In the row containing the desired instance, click View Metric. On the Cloud Eye console, view the metrics of the instance, nodes, and queues. Metric data is reported to Cloud Eye every minute.
   - Click the desired RabbitMQ instance to view its details. In the navigation pane, choose Monitoring And Alarm > Monitoring. On the displayed page, view the metrics of the instance, nodes, and queues. Metric data is updated every minute.
5. Hover the mouse pointer over a metric and click the icon that appears to create an alarm rule for the metric.
6. Specify the alarm rule details. For more information about creating alarm rules, see Creating an Alarm Rule.
   1. Enter the alarm name and description.
   2. Specify the alarm policy and alarm severity. For example, you can trigger an alarm when the raw value of Connections exceeds the preset value for three consecutive periods, and have the notification resent once a day until the exception is handled.
   3. Configure alarm notification. If you enable Alarm Notification, set the validity period, notification object, and trigger condition.
   4. Click Create.
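
The example policy in the procedure (three consecutive periods over the preset value, with a notification once every day while the exception is not handled) can be read as the following small simulation. It is a sketch under stated assumptions: the function name `notification_times`, its parameters, and the rule that a single sample at or below the threshold clears the alarm are all hypothetical and may not match how Cloud Eye actually schedules notifications.

```python
from typing import List, Sequence


def notification_times(samples: Sequence[float], threshold: float,
                       consecutive_periods: int = 3,
                       period_seconds: int = 60,
                       repeat_seconds: int = 86400) -> List[int]:
    """Return the offsets (in seconds from the first sample) at which a notification is sent.

    Assumptions: one sample per period; the alarm fires once the value has exceeded
    the threshold for `consecutive_periods` samples in a row; while it stays active,
    the notification repeats every `repeat_seconds`; a non-breaching sample clears it.
    """
    sent: List[int] = []
    run = 0            # current streak of breaching samples
    last_sent = None   # offset of the last notification for the active alarm
    for index, value in enumerate(samples):
        if value > threshold:
            run += 1
        else:
            run = 0
            last_sent = None  # alarm cleared, so the next trigger notifies immediately
        now = index * period_seconds
        if run >= consecutive_periods and (last_sent is None or now - last_sent >= repeat_seconds):
            sent.append(now)
            last_sent = now
    return sent


# Connections sampled once a minute against a preset value of 50.
# A short repeat interval (120 s) is used here only to keep the example small.
print(notification_times([10, 60, 60, 60, 60, 60], threshold=50, repeat_seconds=120))
```

With the default daily interval, an alarm that stays active produces one notification when it fires and then one per day, which keeps a long-running incident visible without flooding the notification object.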