Updated on 2025-09-04 GMT+08:00

Suggestions on RDS for PostgreSQL Metric Alarm Configurations

You can set alarm rules on Cloud Eye to customize the monitored objects and notification policies and keep track of the instance status. This topic describes how to configure RDS for PostgreSQL metric alarm rules.

Creating a Metric Alarm Rule

  1. Log in to the management console.
  2. Click the region icon in the upper left corner and select a region and a project.
  3. Click Service List. Under Management & Governance, click Cloud Eye.
  4. In the navigation pane on the left, choose Cloud Service Monitoring > Relational Database Service.

    Figure 1 Choosing a monitored object

  5. Locate the DB instance for which you want to create an alarm rule and click Create Alarm Rule in the Operation column.

    Figure 2 Creating an alarm rule

  6. On the displayed page, set parameters as required.

    Table 1 Alarm rule information

    | Parameter | Description |
    | --- | --- |
    | Name | Alarm rule name. The system generates a random name, which you can modify. |
    | Description | Description of the rule. |
    | Method | There are three options: Associate template, Use existing template, and Configure manually. NOTE: If you select Associate template, any later change to the associated template is also applied to the policies in this alarm rule. You are advised to select Use existing template. The existing templates already contain three common alarm metrics: CPU usage, memory usage, and storage space usage. |
    | Template | Select the template to be used. You can select a default alarm template or create a custom template. |
    | Alarm Policy | Policy for triggering an alarm. Whether to trigger an alarm depends on whether the metric data in consecutive periods reaches the threshold. For example, Cloud Eye triggers an alarm if the average CPU usage of the monitored object is 80% or more for three consecutive 5-minute periods. NOTE: A maximum of 50 alarm policies can be added to an alarm rule. If any one of these alarm policies is met, an alarm is triggered. |
    | Alarm Severity | The alarm severity can be Critical, Major, Minor, or Informational. |

    Figure 3 Configuring alarm notification
    Table 2 Alarm notification

    | Parameter | Description |
    | --- | --- |
    | Alarm Notification | Whether to notify users when alarms are triggered. Notifications can be sent by email, text message, or HTTP/HTTPS message. |
    | Notification Recipient | You can select a notification group or topic subscription as required. |
    | Notification Group | Notification group the alarm notification is to be sent to. |
    | Notification Object | Object the alarm notification is to be sent to. You can select the account contact or a topic. The account contact is the mobile phone number and email address of the registered account. A topic is used to publish messages and subscribe to notifications. |
    | Notification Window | Cloud Eye sends notifications only within the notification window specified in the alarm rule. If Notification Window is set to 08:00-20:00, Cloud Eye sends notifications only within 08:00-20:00. |
    | Trigger Condition | Condition for triggering an alarm notification. You can select Generated alarm (when an alarm is generated), Cleared alarm (when an alarm is cleared), or both. |
    | Enterprise Project | Enterprise project that the alarm rule belongs to. Only users with the enterprise project permissions can view and manage the alarm rule. |
    | Tag | A tag is a key-value pair. Tags identify cloud resources so that you can easily categorize and search for your resources. |

  7. Click Create. The alarm rule is created.

    For details about how to create alarm rules, see Creating an Alarm Rule in the Cloud Eye User Guide.
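The consecutive-period rule described for Alarm Policy and the Notification Window behavior can be illustrated with a short Python sketch. It is not Cloud Eye code: the function names `breaches_threshold` and `in_notification_window` are hypothetical, and treating the window boundaries as inclusive is an assumption, since this topic does not say how 08:00 and 20:00 themselves are handled.

```python
from datetime import time
from typing import Sequence


def breaches_threshold(samples: Sequence[float], threshold: float,
                       consecutive: int) -> bool:
    """Return True if the most recent `consecutive` samples all reach the threshold.

    Each sample stands for one period's aggregated value, for example the
    average CPU usage over one 5-minute period.
    """
    if consecutive < 1 or len(samples) < consecutive:
        return False
    return all(value >= threshold for value in samples[-consecutive:])


def in_notification_window(now: time, start: time = time(8, 0),
                           end: time = time(20, 0)) -> bool:
    """Return True if `now` falls inside the window (boundaries assumed inclusive)."""
    return start <= now <= end


# Average CPU usage (%) for four consecutive 5-minute periods.
cpu_periods = [70.0, 82.0, 85.0, 90.0]
alarm = breaches_threshold(cpu_periods, threshold=80.0, consecutive=3)
notify = alarm and in_notification_window(time(12, 0))
```

With these values the last three periods are all at or above 80%, so the sketch raises the alarm, and the notification would be sent because 12:00 is inside the 08:00-20:00 window.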

Metric Alarm Configuration Suggestions

Table 3 Suggestions on RDS for PostgreSQL metric alarm configurations

| Metric ID | Name | Metric Description | Threshold in Best Practices | Alarm Severity in Best Practices | Handling Suggestion |
| --- | --- | --- | --- | --- | --- |
| rds001_cpu_util | CPU Usage | CPU usage of the monitored object | Raw data > 80% for three consecutive periods | Major | 1. Rectify the fault by referring to Troubleshooting High CPU Usage. 2. If the CPU usage remains high due to increased workloads, upgrade the instance specifications. For details, see Changing a DB Instance Class. |
| rds002_mem_util | Memory Usage | Memory usage of the monitored object | Raw data > 90% for three consecutive periods | Major | 1. Rectify the fault by referring to Troubleshooting High Memory Usage. 2. If the memory usage remains high due to increased workloads, upgrade the instance specifications. For details, see Changing a DB Instance Class. |
| rds039_disk_util | Storage Space Usage | Storage space usage of the monitored object | Raw data > 80% for three consecutive periods | Major | 1. Rectify the fault by referring to Troubleshooting High Storage Space Usage. 2. If the storage space usage remains high due to increased workloads, scale up the storage space. For details, see Scaling Storage Space. |
| rds045_oldest_replication_slot_lag | Oldest Replication Slot Lag | Lagging size of the most lagging replica in terms of WAL data received | Raw data > 20,480 MB for one period | Major | Rectify the fault by referring to Troubleshooting High Oldest Replication Slot Lag or Replication Lag. |
| rds046_replication_lag | Replication Lag | Replication lag | Raw data > 600s for three consecutive periods | Major | Rectify the fault by referring to Troubleshooting High Oldest Replication Slot Lag or Replication Lag. |
| rds083_conn_usage | Connection Usage | Percent of used PostgreSQL connections to the total number of connections | Raw data > 80% for three consecutive periods | Major | 1. Evaluate the impact of increased connections on workloads and release unnecessary connections. For details, see What Do I Do If There Are Too Many Database Connections? 2. Set the maximum number of connections to an appropriate value. For details, see What Is the Maximum Number of Connections to an RDS for PostgreSQL Instance? |
| active_connections | Active Connections | Number of active database connections | Raw data > [vCPUs x 2] for one period | Major | Rectify the fault by referring to Troubleshooting Abnormal Connections and Active Connections. |
| oldest_transaction_duration | Oldest Active Transaction Duration | Length of time since the start of the transaction that has been active longer than any other current transaction | Set the threshold as required. Reference value: Raw data > 7,200,000 ms for one period | Major | Rectify the fault by referring to Troubleshooting Long-Running Transactions. |
| oldest_transaction_duration_2pc | Oldest Two-Phase Commit Transaction Duration | Length of time since the start of the transaction that has been prepared for two-phase commit longer than any other current transaction | Set the threshold as required. Reference value: Raw data > 7,200,000 ms for one period | Major | Rectify the fault by referring to Troubleshooting Long-Running Transactions. |
| db_max_age | Maximum Database Age | Maximum age of the current database, which is the value of max(age(datfrozenxid)) in the pg_database table | Raw data > 1,000,000,000 for one period | Major | Rectify the fault by referring to Troubleshooting Database Age Increase Problem. |
| slow_sql_three_second | Number of SQL Statements Executed for More Than 3s | Number of slow SQL statements whose execution time is longer than 3s. This metric shows an instantaneous value at the collection time instead of an accumulated value within 1 minute. | Set the threshold as required. Reference value: Raw data > [vCPUs x 2] for one period | Major | Rectify the fault by referring to Troubleshooting SQL Statements That Have Been Executed for 3s or 5s. |
| slow_sql_five_second | Number of SQL Statements Executed for More Than 5s | Number of slow SQL statements whose execution time is longer than 5s. This metric shows an instantaneous value at the collection time instead of an accumulated value within 1 minute. | Set the threshold as required. Reference value: Raw data > [vCPUs x 2] for one period | Major | Rectify the fault by referring to Troubleshooting SQL Statements That Have Been Executed for 3s or 5s. |
| inactive_logical_replication_slot | Inactive Logical Replication Slots | Number of inactive logical replication slots | Raw data > 1 for three consecutive periods | Major | Rectify the fault by referring to Troubleshooting Inactive Logical Replication Slots. |
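
Several thresholds in Table 3 depend on the number of vCPUs of the instance, so it can help to compute the full set of values before entering them in an alarm rule. The following Python sketch only restates the suggestions from Table 3 as data. The names `Threshold` and `recommended_thresholds` and the unit labels are hypothetical and are not part of any Cloud Eye or RDS API. The query string restates the max(age(datfrozenxid)) expression given for db_max_age and must be run separately against the database.

```python
from typing import Dict, NamedTuple


class Threshold(NamedTuple):
    value: int    # alarm when raw data is greater than this value
    unit: str     # descriptive label only (assumption, for readability)
    periods: int  # number of consecutive periods required


def recommended_thresholds(vcpus: int) -> Dict[str, Threshold]:
    """Return the Table 3 suggestions for an instance with `vcpus` vCPUs."""
    if vcpus < 1:
        raise ValueError("vcpus must be a positive integer")
    per_vcpu = vcpus * 2  # the "[vCPUs x 2]" reference value
    return {
        "rds001_cpu_util": Threshold(80, "%", 3),
        "rds002_mem_util": Threshold(90, "%", 3),
        "rds039_disk_util": Threshold(80, "%", 3),
        "rds045_oldest_replication_slot_lag": Threshold(20_480, "MB", 1),
        "rds046_replication_lag": Threshold(600, "s", 3),
        "rds083_conn_usage": Threshold(80, "%", 3),
        "active_connections": Threshold(per_vcpu, "connections", 1),
        # Reference values only; Table 3 says to set these as required.
        "oldest_transaction_duration": Threshold(7_200_000, "ms", 1),
        "oldest_transaction_duration_2pc": Threshold(7_200_000, "ms", 1),
        "db_max_age": Threshold(1_000_000_000, "age", 1),
        "slow_sql_three_second": Threshold(per_vcpu, "statements", 1),
        "slow_sql_five_second": Threshold(per_vcpu, "statements", 1),
        "inactive_logical_replication_slot": Threshold(1, "slots", 3),
    }


# Expression behind db_max_age, to be run with a PostgreSQL client of your choice.
DB_MAX_AGE_QUERY = "SELECT max(age(datfrozenxid)) FROM pg_database;"

thresholds = recommended_thresholds(vcpus=4)
```

For a 4-vCPU instance, the sketch yields a threshold of 8 for active_connections and for both slow SQL metrics. The 7,200,000 ms reference value for the transaction duration metrics corresponds to 2 hours.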