Updated on 2024-05-07 GMT+08:00

Monitoring Metrics

Description

This section describes metrics reported by GaussDB as well as their namespaces and dimensions.

Namespace

SYS.GAUSSDBV5

Metric Collection Constraints

  • Standby DNs of distributed instances: Metric data can be collected only when the instance version is 3.100.0 or later and the transaction consistency is set to eventual consistency.
  • Standby DNs of primary/standby instances: Metric data can be collected only when the instance version is 2.0.10 or later.
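
The constraints above can be expressed as a small eligibility check. The following is a minimal sketch, not an official tool: the function names, the instance type labels ("distributed", "primary_standby"), and the consistency label ("eventual") are hypothetical, and version strings are assumed to be purely numeric dotted strings.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '3.100.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


# Minimum instance versions taken from the constraints above.
# The dictionary keys are hypothetical labels for the two instance types.
MIN_VERSION = {
    "distributed": parse_version("3.100.0"),
    "primary_standby": parse_version("2.0.10"),
}


def standby_dn_metrics_available(instance_type: str, version: str,
                                 consistency: str = "") -> bool:
    """Return True if standby DN metric data can be collected.

    consistency is only checked for distributed instances, where it
    must be 'eventual' (a hypothetical label for eventual consistency).
    """
    if parse_version(version) < MIN_VERSION[instance_type]:
        return False
    if instance_type == "distributed":
        return consistency == "eventual"
    return True
```

Comparing versions as integer tuples rather than as strings matters here: as strings, "2.0.9" would sort after "2.0.10", whereas the tuple (2, 0, 9) correctly compares as earlier than (2, 0, 10).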

Supported Metrics

The following table lists the performance metrics of GaussDB.

Table 1 Monitoring metrics supported by GaussDB

Metric ID

Metric

Description

Display Object

Unit

Monitored Object

Monitoring Period (Raw Data)

rds001_cpu_util

CPU Usage

CPU usage of the monitored object

Current node

%

Node

60s

rds002_mem_util

Memory Usage

Memory usage of the monitored object

Current node

%

Node

60s

rds003_bytes_in

Data Write Volume

Average number of bytes sent by the VM of the monitored object in a measurement period

Current node

Byte/s

Node

60s

rds004_bytes_out

Outgoing Data Volume

Average number of bytes received by the VM of the monitored object in a measurement period

Current node

Byte/s

Node

60s

rds014_iops

Disk IOPS

Real-time number of data disk reads and writes per second on the monitored node

Current node

Count/s

Node

60s

rds016_disk_write_throughput

Disk Write Throughput

Real-time write throughput per second of the data disk on the monitored node

Current node

Byte/s

Node

60s

rds017_disk_read_throughput

Disk Read Throughput

Real-time read throughput per second of the data disk on the monitored node

Current node

Byte/s

Node

60s

rds020_avg_disk_ms_per_write

Time Required for per Data Disk Write

Average time required for a data disk write on the monitored node

Current node

ms

Node

60s

rds021_avg_disk_ms_per_read

Time Required for per Data Disk Read

Average time required for a data disk read on the monitored node

Current node

ms

Node

60s

io_bandwidth_usage

Disk I/O Bandwidth Usage

Percentage of used disk I/O bandwidth in the total disk I/O bandwidth

Current node

%

Node

60s

iops_usage

IOPS Usage

Percentage of used IOPS in the total IOPS

Current node

%

Node

60s

rds005_instance_disk_used_size

Used Instance Disk Size

Real-time used data disk size of the monitored instance

Instance

GB

Instance

60s

rds006_instance_disk_total_size

Total Instance Disk Size

Real-time total data disk size of the monitored instance

Instance

GB

Instance

60s

rds007_instance_disk_usage

Instance Disk Usage

Real-time data disk usage of the monitored instance

Instance

%

Instance

60s

rds035_buffer_hit_ratio

Buffer Hit Rate

Buffer hit rate of the database

Instance

%

Instance

60s

rds036_deadlocks

Deadlocks

Incremental number of database transaction deadlocks

Instance

Count

Instance

60s

rds048_P80

Response Time of 80% SQL Statements

Real-time response time of 80% of database SQL statements

Instance

us

Instance

60s

rds049_P95

Response Time of 95% SQL Statements

Real-time response time of 95% of database SQL statements

Instance

us

Instance

60s

rds008_disk_used_size

Used Disk Size

Real-time used data disk size of the monitored node

Current node

GB

Component

60s

rds009_disk_total_size

Total Disk Size

Real-time total data disk size of the monitored node

Current node

GB

Component

60s

rds010_disk_usage

Disk Usage

Real-time data disk usage of the monitored node

Current node

%

Component

60s

rds024_current_sleep_time

Primary Node Flow Control Duration

Real-time primary node flow control duration on the monitored node

Distributed: standby DN

Primary/Standby: standby DN

s

Component

60s

rds025_current_rto

Standby Node RTO

Real-time Recovery Time Objective (RTO) of the primary/standby replication of the monitored node

Distributed: standby DN

Primary/Standby: standby DN

s

Component

60s

rds026_login_counter

User Logins per Second

Average number of logins per second

Distributed: all CNs

Primary/Standby: primary DN

Count/s

Component

60s

rds027_logout_counter

User Logouts per Second

Average number of logouts per second

Distributed: all CNs

Primary/Standby: primary DN

Count/s

Component

60s

rds028_standby_delay

Standby Node Redo Progress

Real-time redo progress of the standby node in a shard. It indicates the difference in redo progress between the primary and standby nodes.

Distributed: standby DN

Primary/Standby: primary DN

Byte

Component

60s

rds030_wait_ratio

Lock Waiting Session Rate

Real-time ratio of lock-waiting sessions to active sessions

Distributed: all CNs + primary DN

Primary/Standby: all DNs

%

Component

60s

rds031_active_ratio

Active Session Rate

Real-time ratio of active sessions to all sessions

Distributed: all CNs + primary DN

Primary/Standby: all DNs

%

Component

60s

rds034_inuse_counter

CN Connections

Real-time number of in-use connections in the CN connection pool

Distributed: all CNs

Primary/Standby: none

Count

Component

60s

rds037_commit_counter

User Committed Transactions per Second

Average number of transactions committed by users per second

Distributed: all CNs

Primary/Standby: primary DN

Count/s

Component

60s

rds038_rollback_counter

User Rollback Transactions per Second

Average number of transactions rolled back by users per second

Distributed: all CNs

Primary/Standby: primary DN

Count/s

Component

60s

rds039_bg_commit_counter

Background Committed Transactions per Second

Average number of transactions committed by the background per second

Distributed: all CNs

Primary/Standby: primary DN

Count/s

Component

60s

rds040_bg_rollback_counter

Background Rollback Transactions per Second

Average number of transactions rolled back by the background per second

Distributed: all CNs

Primary/Standby: primary DN

Count/s

Component

60s

rds041_resp_avg

Average Response Time of User Transactions

Average response time of user transactions

Distributed: all CNs

Primary/Standby: primary DN

us

Component

60s

rds042_rollback_ratio

User Transaction Rollback Rate

Average ratio of user rolled-back transactions to all user committed and rolled-back transactions

Distributed: all CNs

Primary/Standby: primary DN

%

Component

60s

rds043_bg_rollback_ratio

Background Transaction Rollback Rate

Average ratio of background rolled-back transactions to all user committed and rolled-back transactions

Distributed: all CNs

Primary/Standby: primary DN

%

Component

60s

rds044_ddl_count

Data Definition Language/s

Average number of DDL statements per second in user workloads at the query layer

Distributed: all CNs + all DNs

Primary/Standby: all DNs

Count/s

Component

60s

rds045_dml_count

Data Manipulation Language/s

Average number of DML statements per second in user workloads at the query layer

Distributed: all CNs + all DNs

Primary/Standby: all DNs

Count/s

Component

60s

rds046_dcl_count

Data Control Language/s

Average number of DCL statements per second in user workloads at the query layer

Distributed: all CNs + all DNs

Primary/Standby: all DNs

Count/s

Component

60s

rds047_ddl_dcl_ratio

DDL and DCL Rate

Average ratio of DDL and DCL statements to all DDL, DCL, and DML statements in user workloads at the query layer

Distributed: all CNs + all DNs

Primary/Standby: all DNs

%

Component

60s

rds050_ckpt_delay

Data Volume to Be Flushed to Disks

Real-time amount of data to be flushed to disks during synchronization

Distributed: all CNs + primary DN

Primary/Standby: primary DN

Byte

Component

60s

rds051_phyrds

Physical Reads per Second

Average number of physical reads per second

Distributed: all CNs + primary DN

Primary/Standby: all DNs

Count/s

Component

60s

rds052_phywrts

Physical Writes per Second

Average number of physical writes per second

Distributed: all CNs + primary DN

Primary/Standby: all DNs

Count/s

Component

60s

rds053_online_session

Online Sessions

Number of real-time online sessions

Distributed: all CNs + all DNs

Primary/Standby: all DNs

Count

Component

60s

rds054_active_session

Active Sessions

Number of real-time active sessions

Distributed: all CNs + primary DN

Primary/Standby: primary DN

Count

Component

60s

rds055_online_ratio

Online Session Rate

Real-time percentage of online sessions on a CN (of a distributed instance) or a primary DN (of a primary/standby instance)

Distributed: all CNs + primary DN

Primary/Standby: all DNs

%

Component

60s

rds060_long_running_transaction_exectime

Maximum Execution Duration of Database Transactions

Maximum execution duration of database transactions

Distributed: all CNs + primary DN

Primary/Standby: all DNs

s

Component

60s

rds066_replication_slot_wal_log_size

WAL Log Size in the Replication Slot

Real-time size of WAL logs reserved in the replication slot of a primary DN

Distributed: primary DN

Primary/Standby: all DNs

Byte

Component

60s

rds067_xlog_lsn

Xlog Rate

Real-time rate of Xlogs on CNs or primary DNs

Distributed: all CNs + primary DN

Primary/Standby: primary DN

Byte/s

Component

60s

rds068_swap_used_ratio

Swap Memory Usage

Real-time swap memory usage of the OS

Current node

%

Node

60s

rds069_swap_total_size

Total Swap Memory

Real-time total swap memory size of the OS

Current node

MB

Node

60s

rds070_thread_pool

Thread Pool Usage

Real-time thread pool usage on a CN or DN

Distributed: all CNs + primary DN

Primary/Standby: all DNs

%

Component

60s

rds071_locks_session

Sessions Waiting for Locks

Real-time number of sessions waiting for locks on a CN or primary DN

Distributed: all CNs + primary DN

Primary/Standby: all DNs

Count

Component

60s

rds072_streaming_dr_xlog_gap

Shard Log Gap of DR Cluster

Log difference between shards in the DR cluster and shards in the production cluster when streaming DR is enabled

Distributed: all CNs + primary DN

Primary/Standby: primary DN

Byte

Component

60s

rds073_streaming_dr_xlog_to_be_replayed

Size of Shard Logs to Be Replayed in DR Cluster

Size of the logs to be replayed of each shard in the DR cluster when streaming DR is enabled

Distributed: all CNs + primary DN

Primary/Standby: primary DN

Byte

Component

60s

rds074_streaming_dr_xlog_flushing_rate

Flushing Rate of Shard Logs in DR Cluster

Rate at which logs of each shard in the DR cluster are flushed to disk when streaming DR is enabled

Distributed: all CNs + primary DN

Primary/Standby: primary DN

Byte/s

Component

60s

rds075_streaming_dr_xlog_replay_rate

Replay Rate of Shard Logs in DR Cluster

Rate at which logs of each shard in the DR cluster are replayed when streaming DR is enabled

Distributed: all CNs + primary DN

Primary/Standby: primary DN

Byte/s

Component

60s

rds076_streaming_dr_rpo

Shard RPO

Real-time RPO of each shard when streaming DR is enabled

Distributed: all CNs + primary DN

Primary/Standby: primary DN

s

Component

60s

rds077_streaming_dr_rto

Shard RTO

Real-time RTO of each shard when streaming DR is enabled

Distributed: all CNs + primary DN

Primary/Standby: primary DN

s

Component

60s

rds078_inactive_replication_slot

Inactive Replication Slots

Number of physical and logical replication slots that are inactive

Distributed: all CNs + primary DN

Primary/Standby: all DNs

Count

Component

60s

rds079_standy_not_replayed_log

Size of Read Replica Logs Not Replayed

Difference between the number of replayed read replica logs and the number of received read replica logs

Distributed: standby DN

Primary/Standby: standby DN

Byte

Component

60s

rds080_xlog_num

Xlogs

Real-time number of Xlogs in the data directory on a CN or DN

Distributed: all CNs + all DNs

Primary/Standby: all DNs

Count

Component

60s

rds081_xlog_size

Xlog Size

Real-time size of Xlogs in the data directory on a CN or DN

Distributed: all CNs + all DNs

Primary/Standby: all DNs

MB

Component

60s

rds064_dynamic_used_memory

Used Dynamic Memory

Real-time used dynamic memory of the monitored object

Distributed: all CNs + all DNs

Primary/Standby: all DNs

MB

Component

60s

rds065_dynamic_used_memory_usage

Dynamic Memory Usage

Real-time dynamic memory usage of the monitored object

Distributed: all CNs + all DNs

Primary/Standby: all DNs

%

Component

60s

rds061_idle_in_transaction_num

Idle Transactions

Real-time number of idle transactions on the monitored object

Distributed: all CNs + all DNs

Primary/Standby: all DNs

Count

Component

60s

rds062_slowquery_sys

Slow SQL Statements in the System Database

Real-time number of slow SQL statements in the system database on the primary DN or CN in a measurement period

Distributed: all CNs

Primary/Standby: primary DN

Count

Component

60s

rds063_slowquery_user

Slow SQL Statements in the User Database

Real-time number of slow SQL statements in the user database on the primary DN or CN in a measurement period

Distributed: all CNs

Primary/Standby: primary DN

Count

Component

60s

rds082_gaussv5_wait_session

Waiting Sessions

Real-time number of waiting sessions

Distributed: all CNs + standby DN

Primary/Standby: all DNs

Count

Component

60s

rds083_cn_temp_dir_size

CN Temporary Directory Size

Real-time size of the temporary directories under the data directory on a CN

Distributed: all CNs + standby DN

Primary/Standby: all DNs

MB

Component

60s

rds084_sys_database_size

System Database Size

Real-time PostgreSQL database size on the monitored instance

Current node

Byte

Node

60s

rds085_user_database_size

User Database Total Size

Real-time user database size on the monitored instance

Current node

Byte

Node

60s

rds086_select_distribution

SELECT Distribution

Real-time percentage of SELECT statements

Distributed: all CNs + all DNs

Primary/Standby: all DNs

%

Component

60s

rds087_update_distribution

UPDATE Distribution

Real-time percentage of UPDATE statements

Distributed: all CNs + all DNs

Primary/Standby: all DNs

%

Component

60s

rds088_insert_distribution

INSERT Distribution

Real-time percentage of INSERT statements

Distributed: all CNs + all DNs

Primary/Standby: all DNs

%

Component

60s

rds089_delete_distribution

DELETE Distribution

Real-time percentage of DELETE statements

Distributed: all CNs + all DNs

Primary/Standby: all DNs

%

Component

60s

rds090_gaussdbv5_statement

SQL Statements

Real-time number of SQL statements

Distributed: all CNs + all DNs

Primary/Standby: all DNs

Count

Component

60s

rds091_gaussv5_qps

Read Requests

Average number of read requests per second of a tenant in a specified period

Distributed: all CNs

Primary/Standby: all DNs

Count

Component

60s

rds092_gaussv5_tps_rt_insert

INSERT Request Response Time

Average response time for INSERT requests of a tenant in a specified period

Distributed: all CNs

Primary/Standby: all DNs

ms

Component

60s

rds093_gaussv5_tps_rt_update

UPDATE Request Response Time

Average response time for UPDATE requests of a tenant in a specified period

Distributed: all CNs

Primary/Standby: all DNs

ms

Component

60s

rds094_gaussv5_tps_rt_delete

DELETE Request Response Time

Average response time for DELETE requests of a tenant in a specified period

Distributed: all CNs

Primary/Standby: all DNs

ms

Component

60s

rds095_gaussv5_qps_rt

Read Request Response Time

Average response time for read requests of a tenant in a specified period

Distributed: all CNs

Primary/Standby: all DNs

ms

Component

60s

retrans_rate

Retransmission Ratio

Real-time retransmission ratio of TCP packets

Current node

%

Node

60s

rds096_process_used_memory

Process Used Memory

Real-time memory used by a CN or DN

Distributed: all CNs + all DNs

Primary/Standby: all DNs

MB

Component

60s

rds097_2pc_transaction_prepare

Oldest Two-Phase Commit Transaction Duration

Maximum duration of uncommitted transactions using two-phase commit

Primary/Standby: primary DN

s

Component

60s

rds098_dn_instance_status

DN Status

Real-time status of a DN. 1: a normal primary DN; 2: a normal standby DN; 3: a normal main standby DN; 4: a normal cascaded standby DN; 10: a standby DN catching up with the primary DN using Xlog files; 20: a properly connected standby DN with abnormal replication status; 21: a disconnected DN.

Primary/Standby: all DNs

N/A

Component

60s

rds099_replication_slot_dir_size

Replication Slot Directory Size

Real-time size of the replication slot directory

Primary/Standby: all DNs

KB

Component

300s

rds100_standby_diff_redo_and_receive

Difference Between Redo and Receipt Positions on Standby Node

The difference (in bytes) between the redo position and the data receipt position on the standby node. This metric is used to determine whether data inconsistency is caused by a slow redo rate on the standby node or by the primary node not having sent redo data.

Distributed: standby DN

Primary/Standby: standby DN

Byte

Component

60s

rds101_online_distinct_client_addr_count

Online Clients

Number of online clients on each CN

Distributed: all CNs

Count

Component

60s

rds102_working_distinct_client_addr_count

Active Clients

Number of active client connections on each CN

Distributed: all CNs

Count

Component

60s
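
The rds098_dn_instance_status metric in Table 1 is reported as a numeric code rather than a measurement, so it usually needs decoding before it is shown or alerted on. The following is a minimal sketch of such a decoder. The code-to-description mapping is copied from the table; the function names, the handling of unknown codes, and the assumption that values may arrive as floats are illustrative only.

```python
# Status codes and descriptions as listed for rds098_dn_instance_status.
DN_STATUS = {
    1: "normal primary DN",
    2: "normal standby DN",
    3: "normal main standby DN",
    4: "normal cascaded standby DN",
    10: "standby DN catching up with primary DN using Xlog files",
    20: "properly connected standby DN with abnormal replication status",
    21: "disconnected DN",
}

# Codes whose description in the table starts with "normal".
NORMAL_CODES = {1, 2, 3, 4}


def describe_dn_status(value) -> str:
    """Map a reported status value to its description.

    Values are converted with int() because monitoring data points
    are assumed (not confirmed) to arrive as floats such as 21.0.
    """
    code = int(value)
    return DN_STATUS.get(code, f"unknown status code {code}")


def is_normal_dn(value) -> bool:
    """Return True for the codes the table describes as normal."""
    return int(value) in NORMAL_CODES
```

For example, describe_dn_status(21.0) returns "disconnected DN", and is_normal_dn(10) returns False because a standby DN that is still catching up is not listed as normal.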

Dimensions

Table 2 Dimensions

Key

Value

gaussdbv5_instance_id

GaussDB instance

gaussdbv5_node_id

GaussDB node

gaussdbv5_component_id

GaussDB component
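
To identify one metric series, the namespace, a metric ID from Table 1, and a dimension from Table 2 are combined. The sketch below builds such a selector as a plain dictionary. It does not call any monitoring API; the field names ("namespace", "metric_name", "dimensions"), the function name, and the mapping from the Monitored Object column to a dimension key are assumptions made for illustration and should be checked against the monitoring service's own API reference.

```python
NAMESPACE = "SYS.GAUSSDBV5"

# Assumed mapping from the "Monitored Object" column in Table 1
# to the dimension keys in Table 2.
DIMENSION_KEY = {
    "Instance": "gaussdbv5_instance_id",
    "Node": "gaussdbv5_node_id",
    "Component": "gaussdbv5_component_id",
}


def metric_selector(metric_id: str, monitored_object: str,
                    resource_id: str) -> dict:
    """Build a dictionary that identifies one metric series.

    metric_id: a value from the Metric ID column, e.g. 'rds001_cpu_util'.
    monitored_object: 'Instance', 'Node', or 'Component'.
    resource_id: ID of the instance, node, or component (placeholder).
    """
    return {
        "namespace": NAMESPACE,
        "metric_name": metric_id,
        "dimensions": [
            {"name": DIMENSION_KEY[monitored_object], "value": resource_id},
        ],
    }
```

For example, metric_selector("rds001_cpu_util", "Node", "node-123") selects CPU usage for a node, where "node-123" is a placeholder ID. An unrecognized monitored object raises KeyError, which surfaces a mistyped value early.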