Configuring Alarm Rules for Critical Metrics
This section describes the recommended alarm rules for key metrics and how to configure them. In actual scenarios, configure alarm rules for metrics based on the following alarm policies.
Alarm Policies for DCS Redis Instances
Metric | Value Range | Alarm Policy | Approach Upper Limit | Handling Suggestion |
---|---|---|---|---|
CPU Usage | 0–100% | Alarm threshold: > 70%; Number of consecutive periods: 2; Alarm severity: Major | No | Consider capacity expansion based on the service analysis. The CPU capacity of a single-node or master/standby instance cannot be expanded. If you need more capacity, use a cluster instance instead. This metric is available only for Proxy Cluster, single-node, and master/standby instances. For Redis Cluster instances, this metric is available only at the Redis Server level. You can view it on the Redis Server tab of the instance's Performance Monitoring page. |
Average CPU Usage | 0–100% | Alarm threshold: > 70%; Number of consecutive periods: 2; Alarm severity: Major | No | Consider capacity expansion based on the service analysis. The CPU capacity of a single-node or master/standby instance cannot be expanded. If you need more capacity, use a cluster instance instead. This metric is available only for single-node and master/standby instances. For cluster instances, this metric is available only at the Redis Server level. You can view it on the Redis Server tab of the instance's Performance Monitoring page. |
Memory Usage | 0–100% | Alarm threshold: > 70%; Number of consecutive periods: 2; Alarm severity: Major | No | Expand the capacity of the instance. |
Connected Clients | 0–10,000 | Alarm threshold: > 8000; Number of consecutive periods: 2; Alarm severity: Major | No | Optimize the connection pool in the service code to prevent the number of connections from exceeding the maximum limit. For single-node and master/standby instances, the maximum number of connections allowed is 10,000. You can adjust the threshold based on service requirements. Configure this alarm policy at the instance level for single-node and master/standby instances, and at the Redis Server and Proxy levels for cluster instances. |
New Connections (Count/min) | ≥ 0 | Alarm threshold: > 10,000; Number of consecutive periods: 2; Alarm severity: Minor | - | Check whether short-lived connections ("connect") are used and whether client connections are abnormal. To ensure performance, use persistent connections (for example, "pconnect" in PHP clients). Configure this alarm policy at the instance level for single-node and master/standby instances, and at the Redis Server and Proxy levels for cluster instances. |
Input Flow | ≥ 0 | Alarm threshold: > 80% of the assured bandwidth; Number of consecutive periods: 2; Alarm severity: Major | Yes | Consider capacity expansion based on the service analysis and bandwidth limit. Configure this alarm only for single-node and master/standby DCS Redis 3.0 instances, and set the threshold to 80% of the instance's assured bandwidth. |
Output Flow | ≥ 0 | Alarm threshold: > 80% of the assured bandwidth; Number of consecutive periods: 2; Alarm severity: Major | Yes | Consider capacity expansion based on the service analysis and bandwidth limit. Configure this alarm only for single-node and master/standby DCS Redis 3.0 instances, and set the threshold to 80% of the instance's assured bandwidth. |
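Each alarm policy combines a threshold, a number of consecutive periods, and a severity. The following Python sketch illustrates how the "Number of consecutive periods" setting filters out one-off spikes, using the CPU Usage policy above as an example. The AlarmPolicy class and alarm_triggered function are hypothetical illustrations, not part of DCS or Cloud Eye. Cloud Eye evaluates alarm rules on its own side; this sketch assumes it simply checks whether the most recent N periods all exceed the threshold.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlarmPolicy:
    """Hypothetical representation of one row of an alarm policy table."""
    metric: str
    threshold: float          # the alarm condition is value > threshold
    consecutive_periods: int  # periods in a row that must breach the threshold
    severity: str


def alarm_triggered(samples: Sequence[float], policy: AlarmPolicy) -> bool:
    """Return True if the last `consecutive_periods` samples all exceed the threshold."""
    n = policy.consecutive_periods
    if len(samples) < n:
        return False  # not enough periods observed yet
    return all(value > policy.threshold for value in samples[-n:])


CPU_USAGE = AlarmPolicy("CPU Usage", threshold=70.0, consecutive_periods=2, severity="Major")

# A single spike above 70% does not trigger the alarm...
print(alarm_triggered([40.0, 85.0, 60.0], CPU_USAGE))  # False
# ...but two consecutive periods above 70% do.
print(alarm_triggered([40.0, 75.0, 82.0], CPU_USAGE))  # True
```

Requiring two consecutive periods trades a slightly later alarm for fewer false alarms caused by brief spikes.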
Alarm Policies for DCS Memcached Instances
Metric | Value Range | Alarm Policy | Approach Upper Limit | Handling Suggestion |
---|---|---|---|---|
CPU Usage | 0–100% | Alarm threshold: > 70%; Number of consecutive periods: 2; Alarm severity: Major | No | Check the service for traffic surges. The CPU capacity of a single-node or master/standby instance cannot be expanded. Analyze the service and consider splitting it, or combining multiple instances into a cluster on the client side. |
Memory Usage | 0–100% | Alarm threshold: > 65%; Number of consecutive periods: 2; Alarm severity: Minor | No | Consider expanding the instance capacity. |
Connected Clients | 0–10,000 | Alarm threshold: > 8000; Number of consecutive periods: 2; Alarm severity: Major | No | Optimize the connection pool in the service code to prevent the number of connections from exceeding the maximum limit. |
New Connections | ≥ 0 | Alarm threshold: > 10,000; Number of consecutive periods: 2; Alarm severity: Minor | - | Check whether short-lived connections ("connect") are used and whether client connections are abnormal. To ensure performance, use persistent connections (for example, "pconnect" in PHP clients). |
Input Flow | ≥ 0 | Alarm threshold: > 80% of the assured bandwidth; Number of consecutive periods: 2; Alarm severity: Major | Yes | Consider capacity expansion based on the service analysis and bandwidth limit. For details about the bandwidth of different instance specifications, see Memcached Instance Specifications. |
Output Flow | ≥ 0 | Alarm threshold: > 80% of the assured bandwidth; Number of consecutive periods: 2; Alarm severity: Major | Yes | Consider capacity expansion based on the service analysis and bandwidth limit. For details about the bandwidth of different instance specifications, see Memcached Instance Specifications. |
Authentication Failures | ≥ 0 | Alarm threshold: > 0; Number of consecutive periods: 1; Alarm severity: Critical | - | Check whether the password is entered correctly. |
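The New Connections suggestions recommend persistent connections over opening a connection for every request. The following Python sketch shows why, using a hypothetical in-memory FakeCacheServer that only counts connections (no real Redis or Memcached client is involved). A client that connects per request creates one new connection per operation, while a persistent client creates one connection and reuses it. Real client libraries provide the same behavior through connection pools or persistent-connection APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeCacheServer:
    """Hypothetical stand-in for a cache node; it only counts connections."""
    new_connections: int = 0

    def open_connection(self) -> FakeConnection:
        # Each call is what the New Connections metric would count.
        self.new_connections += 1
        return FakeConnection(self)


@dataclass
class FakeConnection:
    server: FakeCacheServer
    closed: bool = False

    def get(self, key: str) -> None:
        if self.closed:
            raise RuntimeError("connection is closed")
        # A real client would send a GET command here.

    def close(self) -> None:
        self.closed = True


def short_connection_client(server: FakeCacheServer, keys: list[str]) -> None:
    """Open and close a connection for every request ("connect" pattern)."""
    for key in keys:
        conn = server.open_connection()
        conn.get(key)
        conn.close()


class PersistentClient:
    """Open one connection lazily and reuse it ("pconnect" pattern)."""

    def __init__(self, server: FakeCacheServer) -> None:
        self._server = server
        self._conn: FakeConnection | None = None

    def get(self, key: str) -> None:
        # Reconnect only if there is no usable connection yet.
        if self._conn is None or self._conn.closed:
            self._conn = self._server.open_connection()
        self._conn.get(key)


keys = [f"key:{i}" for i in range(1000)]

per_request = FakeCacheServer()
short_connection_client(per_request, keys)
print(per_request.new_connections)  # 1000

persistent = FakeCacheServer()
client = PersistentClient(persistent)
for key in keys:
    client.get(key)
print(persistent.new_connections)  # 1
```

At 1,000 requests per minute, the per-request pattern alone would approach the 10,000 new connections per minute threshold at ten times that rate, while the persistent client stays at one connection.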
Alarm Policies for Redis Server Nodes of Cluster DCS Redis Instances
Metric | Value Range | Alarm Policy | Approach Upper Limit | Handling Suggestion |
---|---|---|---|---|
CPU Usage | 0–100% | Alarm threshold: > 70%; Number of consecutive periods: 2; Alarm severity: Major | No | Check the service for traffic surges. Check whether CPU usage is evenly distributed across Redis Server nodes. If CPU usage is high on multiple nodes, consider capacity expansion: expanding a cluster instance scales out nodes to share the CPU load. If CPU usage is high on a single node, check whether hot keys exist and, if they do, optimize the service code to eliminate them. |
Average CPU Usage | 0–100% | Alarm threshold: > 70%; Number of consecutive periods: 2; Alarm severity: Major | No | Consider capacity expansion based on the service analysis. The CPU capacity of a single-node or master/standby instance cannot be expanded. If you need more capacity, use a cluster instance instead. This metric is available only for single-node, master/standby, and Proxy Cluster instances. For Redis Cluster instances, this metric is available only at the Redis Server level. You can view it on the Redis Server tab of the instance's Performance Monitoring page. |
Memory Usage | 0–100% | Alarm threshold: > 70%; Number of consecutive periods: 2; Alarm severity: Major | No | Check the service for traffic surges. Check whether memory usage is evenly distributed across Redis Server nodes. If memory usage is high on multiple nodes, consider capacity expansion. If memory usage is high on a single node, check whether big keys exist and, if they do, optimize the service code to eliminate them. |
Connected Clients | 0–10,000 | Alarm threshold: > 8000; Number of consecutive periods: 2; Alarm severity: Major | No | Check whether the number of connections is within an appropriate range. If it is, adjust the alarm threshold. |
New Connections | ≥ 0 | Alarm threshold: > 10,000; Number of consecutive periods: 2; Alarm severity: Minor | - | Check whether short-lived connections ("connect") are used. To ensure performance, use persistent connections (for example, "pconnect" in PHP clients). |
Slow Query Logs | 0–1 | Alarm threshold: > 0; Number of consecutive periods: 1; Alarm severity: Major | - | Use the slow query function on the console to analyze slow commands. |
Bandwidth Usage | 0–200% | Alarm threshold: > 90%; Number of consecutive periods: 2; Alarm severity: Major | Yes | Based on the input and output flow, check whether the increase in bandwidth usage comes from read or write services. If the bandwidth usage of a single node is high, check whether big keys exist. Bandwidth usage above 100% does not necessarily trigger flow control, and flow control may occur even when bandwidth usage is below 100%; whether flow control actually occurred is indicated by the Flow Control Times metric. This is because bandwidth usage is reported once per reporting period, while flow control times are reported every second. Traffic may surge for a few seconds within a reporting period and then fall back, so by the time bandwidth usage is reported, it has returned to normal. |
Flow Control Times | ≥ 0 | Alarm threshold: > 0; Number of consecutive periods: 1; Alarm severity: Critical | Yes | Consider capacity expansion based on the specification limits, input flow, and output flow. |
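The Bandwidth Usage suggestion explains why bandwidth usage and Flow Control Times can disagree. The following Python sketch models that effect under stated assumptions: a 60-second reporting period, a reported bandwidth usage equal to the average over that period, and flow control counted once for each second in which traffic exceeds a hypothetical 100 Mbit/s node limit. The limit, the traffic values, and the function names are illustrative only and do not reflect the actual DCS flow control algorithm.

```python
from typing import Sequence


def period_bandwidth_usage(per_second_mbps: Sequence[float], limit_mbps: float) -> float:
    """Bandwidth usage (%) for one period, assumed here to be the period average."""
    return 100.0 * sum(per_second_mbps) / (len(per_second_mbps) * limit_mbps)


def flow_control_times(per_second_mbps: Sequence[float], limit_mbps: float) -> int:
    """Hypothetical per-second check: count seconds in which traffic exceeds the limit."""
    return sum(1 for rate in per_second_mbps if rate > limit_mbps)


LIMIT_MBPS = 100.0  # hypothetical bandwidth limit of one Redis Server node

# One 60-second period: steady 40 Mbit/s with a 3-second burst to 150 Mbit/s.
traffic = [40.0] * 57 + [150.0] * 3

print(period_bandwidth_usage(traffic, LIMIT_MBPS))  # 45.5
print(flow_control_times(traffic, LIMIT_MBPS))      # 3
```

Under these assumptions, the short burst leaves the reported bandwidth usage at 45.5%, well below the 90% alarm threshold, while the per-second check still records flow control. This is why the Flow Control Times alarm, rather than Bandwidth Usage alone, shows whether flow control was actually performed.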
Procedure
In the following example, an alarm rule is set for the CPU Usage metric.
- Log in to the DCS console.
- Click in the upper left corner and select a region and a project.
- In the navigation pane, choose Cache Manager.
- In the same row as the DCS instance whose metrics you want to view, choose More > View Metric.
- Locate the CPU Usage metric. Hover over the metric and click to create an alarm rule for the metric.
The Create Alarm Rule page is displayed.
- Specify the alarm rule details.
- Specify the alarm policy and alarm severity.
- Set the alarm notification configurations. If you enable Alarm Notification, set the validity period, notification object, and trigger condition.
- Click Create.
- For more information about creating alarm rules, see "Using the Alarm Function" > "Creating Alarm Rules" in the Cloud Eye User Guide.
- For details about how to modify or disable the created alarm rules, see "Using the Alarm Function" > "Alarm Rule Management" in the Cloud Eye User Guide.