Configuring RocketMQ Alarms

This section describes recommended alarm policies for selected RocketMQ metrics and how to configure them. In production, you are advised to configure alarm rules for these metrics based on the following policies.

The Approach Upper Limit column in the following table indicates whether the alarm threshold is close to the upper limit of the performance that the current resources can support. If it is and usage continues to rise, services may become abnormal.

**Table 1** RocketMQ instance metrics to configure alarm rules for
| Metric Name | Normal Range | Alarm Policy | Approach Upper Limit | Metric Description and Alarm Handling Suggestions |
|---|---|---|---|---|
| Accumulated Messages | >= 0 | Alarm threshold: Raw data > 90% of the upper limit (the upper limit is user-defined); Number of consecutive periods: 1; Alarm severity: Major | Yes | Metric description: total number of accumulated messages in all consumer groups of the instance. Alarm handling: Delete idle consumer groups, if any. You can also accelerate message retrieval, for example, by increasing the number of consumers. |
| Disk Capacity Usage | 0–100 | Alarm threshold: Raw data > 85; Number of consecutive periods: 3; Alarm severity: Critical | Yes | Metric description: disk usage of the RocketMQ VM. Unit: %. Alarm handling: If an alarm is generated for this metric, the current instance specifications are insufficient to support services. Expand the storage space by referring to Modifying Specifications. |
| CPU Usage | 0–100 | Alarm threshold: Raw data > 80; Number of consecutive periods: 3; Alarm severity: Major | Yes | Metric description: CPU usage of a standby RocketMQ VM. Alarm handling: Check whether the metric has been approaching or exceeding the alarm threshold for a long time. If yes, increase the number of brokers by referring to Modifying Specifications. |
| Average Disk Read Time (only for RocketMQ 4.8.0) | >= 0 | Alarm threshold: Raw data > 20; Number of consecutive periods: 3; Alarm severity: Major | Yes | Metric description: read latency of the disk on the slave RocketMQ node. When disk performance reaches its upper limit, disk read and write latency increases, and so does RocketMQ production and consumption latency. Alarm handling: Check whether the metric has been approaching or exceeding the alarm threshold for a long time. If yes, increase the number of brokers by referring to Modifying Specifications. |
| Average Disk Write Time (only for RocketMQ 4.8.0) | >= 0 | Alarm threshold: Raw data > 20; Number of consecutive periods: 3; Alarm severity: Major | Yes | Metric description: write latency of the disk on the slave RocketMQ node. When disk performance reaches its upper limit, disk read and write latency increases, and so does RocketMQ production and consumption latency. Alarm handling: Check whether the metric has been approaching or exceeding the alarm threshold for a long time. If yes, increase the number of brokers by referring to Modifying Specifications. |
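
The policies in Table 1 can also be kept as plain data, for example to review them alongside other configuration. The following Python sketch is illustrative only: the `AlarmPolicy` class, the helper names, and the example upper limit of 100,000 messages are assumptions made for this example and are not part of any RocketMQ or Cloud Eye API. It shows how the Accumulated Messages threshold is derived from a user-defined upper limit, while the other thresholds are fixed values taken from the table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmPolicy:
    """One row of Table 1 (hypothetical helper type, not a Cloud Eye object)."""
    metric: str
    threshold: float   # an alarm is expected when raw data > threshold
    periods: int       # number of consecutive periods
    severity: str


def accumulated_messages_threshold(upper_limit: int) -> float:
    """Return 90% of the user-defined upper limit for accumulated messages."""
    return upper_limit * 90 / 100


def build_policies(accumulated_upper_limit: int) -> list[AlarmPolicy]:
    """Build the recommended policies; only the first threshold is derived."""
    return [
        AlarmPolicy("Accumulated Messages",
                    accumulated_messages_threshold(accumulated_upper_limit),
                    1, "Major"),
        AlarmPolicy("Disk Capacity Usage", 85, 3, "Critical"),
        AlarmPolicy("CPU Usage", 80, 3, "Major"),
        AlarmPolicy("Average Disk Read Time", 20, 3, "Major"),
        AlarmPolicy("Average Disk Write Time", 20, 3, "Major"),
    ]


# Assumed upper limit of 100,000 accumulated messages, chosen for illustration.
for policy in build_policies(accumulated_upper_limit=100_000):
    print(f"{policy.metric}: raw data > {policy.threshold:g} "
          f"for {policy.periods} period(s), severity {policy.severity}")
```

With the assumed upper limit of 100,000, the Accumulated Messages threshold works out to 90,000 messages. Choose the upper limit based on what your own consumers can realistically work through.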

Configuring RocketMQ Alarms

1. Log in to the RocketMQ console.
2. In the row containing the desired instance, click View Metric.

   You are redirected to the metric monitoring page on the Cloud Eye console.
3. Hover the mouse pointer over a metric and click the icon that appears to create an alarm rule for the metric.

   The Create Alarm Rule page is displayed.
4. Specify the alarm rule details.

   For details, see Creating an Alarm Rule.
   a. Enter the alarm name and description.
   b. Specify the alarm policy and alarm severity.
      As shown in the following figure, an alarm is generated if the raw data of disk capacity usage exceeds 85% for three consecutive periods. If the alarm is not handled in time, an alarm notification is sent.

      Figure 1 Setting the alarm policy and alarm severity
   c. Set the alarm notification configurations. If you enable Alarm Notification, set the validity period, notification object, and trigger condition.
   d. Click Create.
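
To make the "consecutive periods" condition concrete, the following Python sketch models it in the simplest possible way: an alarm fires once the metric has exceeded the threshold in the required number of back-to-back periods, and any sample at or below the threshold resets the count. This is an assumption about the semantics made for illustration, not a description of how Cloud Eye evaluates rules internally. The function name and the sample values are hypothetical.

```python
from typing import Optional, Sequence


def first_alarm_index(samples: Sequence[float],
                      threshold: float,
                      periods: int) -> Optional[int]:
    """Return the index of the sample at which the alarm would first fire.

    A sample counts only if it is strictly greater than the threshold.
    Returns None if the condition is never met.
    """
    streak = 0
    for index, value in enumerate(samples):
        # Extend the streak on a breach; otherwise start counting again.
        streak = streak + 1 if value > threshold else 0
        if streak >= periods:
            return index
    return None


# Hypothetical disk capacity usage readings (%), one per monitoring period.
disk_usage = [82, 86, 88, 84, 86, 87, 90]
print(first_alarm_index(disk_usage, threshold=85, periods=3))
```

In this sample series, the first two breaches (86, 88) are interrupted by a reading of 84, so the alarm condition is met only at the last reading, after 86, 87, and 90 exceed 85 in three consecutive periods. Requiring several consecutive periods in this way helps avoid alarms on short spikes.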
