Updated on 2025-08-01 GMT+08:00

What Metrics Are Supported by the Agent?

OS Metric: CPU

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

cpu_usage

(Agent) CPU Usage

Used to monitor CPU usage

  • Collection method (Linux): Check the metric value changes in file /proc/stat in a collection period. You can run the top command to check the %Cpu(s) value.
  • Collection method (Windows): Obtain the metric value using the API GetSystemTimes.

0-100

%

N/A

2.4.1

1 minute

cpu_usage_idle

(Agent) Idle CPU Usage

Percentage of time that the CPU is idle

  • Collection method (Linux): Check the metric value changes in file /proc/stat in a collection period.
  • Collection method (Windows): Obtain the metric value using the API GetSystemTimes.

0-100

%

N/A

2.4.5

1 minute

cpu_usage_other

(Agent) Other Process CPU Usage

Percentage of CPU time on the monitored object that is not spent idle, in kernel space, or in user space

  • Collection method (Linux): Other Process CPU Usage = 1 – Idle CPU Usage – Kernel Space CPU Usage – User Space CPU Usage
  • Collection method (Windows): Other Process CPU Usage = 1 – Idle CPU Usage – Kernel Space CPU Usage – User Space CPU Usage

0-100

%

N/A

2.4.5

1 minute

cpu_usage_system

(Agent) Kernel Space CPU Usage

Percentage of time that the CPU is used by kernel space

  • Collection method (Linux): Check the metric value changes in file /proc/stat in a collection period. You can run the top command to check the %Cpu(s) sy value.
  • Collection method (Windows): Obtain the metric value using the API GetSystemTimes.

0-100

%

N/A

2.4.5

1 minute

cpu_usage_user

(Agent) User Space CPU Usage

Percentage of time that the CPU is used by user space

  • Collection method (Linux): Check the metric value changes in file /proc/stat in a collection period. You can run the top command to check the %Cpu(s) us value.
  • Collection method (Windows): Obtain the metric value using the API GetSystemTimes.

0-100

%

N/A

2.4.5

1 minute

cpu_usage_nice

(Agent) Nice Process CPU Usage

Percentage of the time that the CPU is in user mode with low-priority processes which can easily be interrupted by higher-priority processes

  • Collection method (Linux): Check the metric value changes in file /proc/stat in a collection period. You can run the top command to check the %Cpu(s) ni value.
  • Windows does not support this metric.

0-100

%

N/A

2.4.5

1 minute

cpu_usage_iowait

(Agent) iowait Process CPU Usage

Percentage of time that the CPU is waiting for I/O operations to complete

  • Collection method (Linux): Check the metric value changes in file /proc/stat in a collection period. You can run the top command to check the %Cpu(s) wa value.
  • Windows does not support this metric.

0-100

%

N/A

2.4.5

1 minute

cpu_usage_irq

(Agent) CPU Interrupt Time

Percentage of time that the CPU is servicing interrupts

  • Collection method (Linux): Check the metric value changes in file /proc/stat in a collection period. You can run the top command to check the %Cpu(s) hi value.
  • Windows does not support this metric.

0-100

%

N/A

2.4.5

1 minute

cpu_usage_softirq

(Agent) CPU Software Interrupt Time

Percentage of time that the CPU is servicing software interrupts

  • Collection method (Linux): Check the metric value changes in file /proc/stat in a collection period. You can run the top command to check the %Cpu(s) si value.
  • Windows does not support this metric.

0-100

%

N/A

2.4.5

1 minute
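The Linux collection methods in this table all derive from counter changes in /proc/stat over a collection period. The following Python sketch shows one way to turn two samples of the aggregate `cpu` line into the percentages above. It assumes that `cpu_usage` covers all non-idle time and that the guest columns can be ignored; the function names are illustrative and are not part of CES Agent.

```python
from __future__ import annotations

# Column order of the aggregate "cpu" line in /proc/stat (values are in USER_HZ ticks).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text: str) -> dict[str, int]:
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            values += [0] * (len(FIELDS) - len(values))  # pad columns missing on old kernels
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_metrics(before: str, after: str) -> dict[str, float]:
    """Percentage of CPU time spent in each state between two /proc/stat samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: b[k] - a[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("the two samples must be taken at different times")
    pct = {k: 100.0 * v / total for k, v in delta.items()}
    return {
        # Assumption: overall usage is everything that is not idle.
        "cpu_usage": 100.0 - pct["idle"],
        "cpu_usage_idle": pct["idle"],
        "cpu_usage_user": pct["user"],
        "cpu_usage_system": pct["system"],
        "cpu_usage_nice": pct["nice"],
        "cpu_usage_iowait": pct["iowait"],
        "cpu_usage_irq": pct["irq"],
        "cpu_usage_softirq": pct["softirq"],
        # Other Process CPU Usage = 100% - idle - kernel space - user space.
        "cpu_usage_other": 100.0 - pct["idle"] - pct["system"] - pct["user"],
    }
```

On a Linux host, read /proc/stat once, wait for the collection period (for example, 60 seconds), read it again, and pass both strings to `cpu_usage_metrics`.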

OS Metric: CPU Load

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

load_average1

(Agent) 1-Minute Load Average

CPU load averaged over the last 1 minute

  • Collection method (Linux): Divide the load1 value in file /proc/loadavg by the number of logical CPUs. You can run the top command to check the load1 value.

≥0

None

N/A

2.4.1

1 minute

load_average5

(Agent) 5-Minute Load Average

CPU load averaged over the last 5 minutes

  • Collection method (Linux): Divide the load5 value in file /proc/loadavg by the number of logical CPUs. You can run the top command to check the load5 value.

≥0

None

N/A

2.4.1

1 minute

load_average15

(Agent) 15-Minute Load Average

CPU load averaged over the last 15 minutes

  • Collection method (Linux): Divide the load15 value in file /proc/loadavg by the number of logical CPUs. You can run the top command to check the load15 value.

≥0

None

N/A

2.4.1

1 minute
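A minimal sketch of the normalization used for these metrics: each of the first three values in /proc/loadavg is divided by the number of logical CPUs. It assumes that Python's `os.cpu_count()` returns the same logical CPU count the agent uses; the function name is hypothetical.

```python
from __future__ import annotations

import os


def load_average_metrics(loadavg_text: str, logical_cpus: int | None = None) -> dict[str, float]:
    """Divide load1/load5/load15 from /proc/loadavg by the number of logical CPUs."""
    cpus = logical_cpus or os.cpu_count() or 1  # fall back to 1 if the count is unknown
    load1, load5, load15 = (float(x) for x in loadavg_text.split()[:3])
    return {
        "load_average1": load1 / cpus,
        "load_average5": load5 / cpus,
        "load_average15": load15 / cpus,
    }
```

On Linux, `load_average_metrics(open("/proc/loadavg").read())` uses the host's own CPU count; passing `logical_cpus` explicitly makes the result reproducible.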

OS Metric: Memory

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

mem_available

(Agent) Available Memory

Amount of memory that is available and can be given instantly to processes

  • Collection method (Linux): Obtain the metric value from /proc/meminfo.

    If /proc/meminfo contains MemAvailable, its value is used.

    If /proc/meminfo does not contain MemAvailable, MemAvailable = MemFree + Buffers + Cached

  • Collection method (Windows): Calculated as Available memory – Used memory. The value is obtained by calling the Windows API GlobalMemoryStatusEx.

≥0

GB

N/A

2.4.5

1 minute

mem_usedPercent

(Agent) Memory Usage

Memory usage of the instance

  • Collection method (Linux): Obtain the metric value from the /proc/meminfo file.

    If /proc/meminfo contains MemAvailable, MemUsedPercent = (MemTotal – MemAvailable)/MemTotal

    If /proc/meminfo does not contain MemAvailable, MemUsedPercent = (MemTotal – MemFree – Buffers – Cached)/MemTotal

  • Collection method (Windows): Used memory size/Total memory size × 100%

0-100

%

N/A

2.4.1

1 minute

mem_free

(Agent) Idle Memory

Amount of memory that is not being used

  • Collection method (Linux): Obtain the metric value from /proc/meminfo.
  • Windows does not support this metric.

≥0

GB

N/A

2.4.5

1 minute

mem_buffers

(Agent) Buffer

Amount of memory that is being used for buffers

  • Collection method (Linux): Obtain the metric value from /proc/meminfo. You can run the top command to check the KiB Mem:buffers value.
  • Windows does not support this metric.

≥0

GB

N/A

2.4.5

1 minute

mem_cached

(Agent) Cache

Amount of memory that is being used for file caches

  • Collection method (Linux): Obtain the metric value from /proc/meminfo. You can run the top command and check the cached Mem value on the KiB Swap line.
  • Windows does not support this metric.

≥0

GB

N/A

2.4.5

1 minute

total_open_files

(Agent) Total File Handles

Total number of file handles used by all processes

  • Collection method (Linux): Count the entries in the /proc/{pid}/fd directory of each process and add up the counts for all processes.
  • Windows does not support this metric.

≥0

None

N/A

2.4.5

1 minute
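The following sketch applies the Linux rules in this table to the text of /proc/meminfo, including the fallback for kernels that do not report MemAvailable (the field was added in Linux 3.14). It assumes 1 GB = 1024 × 1024 kB for the GB-valued metrics; the function names are illustrative only.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Map /proc/meminfo field names to their values (reported in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_metrics(text: str) -> dict[str, float]:
    m = parse_meminfo(text)
    total = m["MemTotal"]
    if "MemAvailable" in m:
        available = m["MemAvailable"]
    else:
        # Kernels older than 3.14 do not report MemAvailable.
        available = m["MemFree"] + m["Buffers"] + m["Cached"]
    kb_per_gb = 1024 * 1024  # assumption: GB here means 1024**3 bytes
    return {
        "mem_available": available / kb_per_gb,
        "mem_usedPercent": 100.0 * (total - available) / total,
        "mem_free": m["MemFree"] / kb_per_gb,
        "mem_buffers": m["Buffers"] / kb_per_gb,
        "mem_cached": m["Cached"] / kb_per_gb,
    }
```

On Linux, pass `open("/proc/meminfo").read()` to `memory_metrics`.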

OS Metric: Disk

Currently, CES Agent can collect only physical disk metrics and does not support disks mounted using the network file system protocol.

By default, CES Agent does not monitor Docker-related mount points, which start with the following prefixes:

/var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

disk_free

(Agent) Available Disk Space

Free space on the disks

  • Collection method (Linux): Run the df -h command to check the value in the Avail column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Collection method (Windows): Call the Windows API GetDiskFreeSpaceExW to obtain disk space data. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥0

GB

N/A

2.4.1

1 minute

disk_total

(Agent) Disk Storage Capacity

Total disk capacity

  • Collection method (Linux): Run the df -h command to check the value in the Size column.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Collection method (Windows): Call the Windows API GetDiskFreeSpaceExW to obtain disk space data. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥0

GB

N/A

2.4.5

1 minute

disk_used

(Agent) Used Disk Space

Used space on the disk

  • Collection method (Linux): Run the df -h command to check the value in the Used column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Collection method (Windows): Call the Windows API GetDiskFreeSpaceExW to obtain disk space data. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥0

GB

N/A

2.4.5

1 minute

disk_usedPercent

(Agent) Disk Usage

Percentage of used disk space. It is calculated as follows: Disk Usage = Used Disk Space/Disk Storage Capacity.

  • Collection method (Linux): Calculated as Used/Size from the df -h output. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Collection method (Windows): Call the Windows API GetDiskFreeSpaceExW to obtain disk space data. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

0-100

%

N/A

2.4.1

1 minute
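The df columns used in this table can also be read programmatically. This sketch uses `os.statvfs`, which exposes the block counts that df reports, and skips mount points that start with the Docker-related prefixes listed above. It computes Disk Usage as Used/Size, as this table describes (df's own Use% column is calculated as Used/(Used + Avail), so the two can differ slightly). It does not filter out network file systems, and the function names are hypothetical.

```python
from __future__ import annotations

import os

# Docker-related mount point prefixes that CES Agent skips by default.
EXCLUDED_PREFIXES = ("/var/lib/docker", "/mnt/paas/kubernetes", "/var/lib/mesos")


def is_monitored(mount_point: str) -> bool:
    """True unless the mount point starts with one of the excluded prefixes."""
    return not mount_point.startswith(EXCLUDED_PREFIXES)


def monitored_mount_points(mounts_text: str) -> list[str]:
    """Mount points (second column of /proc/mounts) that are not excluded."""
    points = []
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and is_monitored(parts[1]):
            points.append(parts[1])
    return points


def disk_space(mount_point: str) -> dict[str, float]:
    """Size, Avail, and Used as df reports them, converted to GB (1024**3 bytes)."""
    st = os.statvfs(mount_point)
    gb = 1024 ** 3
    total = st.f_blocks * st.f_frsize                # df "Size"
    avail = st.f_bavail * st.f_frsize                # df "Avail" (space for unprivileged users)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize  # df "Used"
    return {
        "disk_total": total / gb,
        "disk_free": avail / gb,
        "disk_used": used / gb,
        # Disk Usage = Used/Size, as described in this table.
        "disk_usedPercent": 100.0 * used / total if total else 0.0,
    }
```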

OS Metric: Disk I/O

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

disk_agt_read_bytes_rate

(Agent) Disks Read Rate

Volume of data read from the instance per second

  • Collection method (Linux):

    The disk read rate is calculated from the change in the sixth column of the corresponding device in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Collection method (Windows):

    Use Win32_PerfFormattedData_PerfDisk_LogicalDisk object in WMI to obtain disk I/O data.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

    When CPU usage is high, data collection may time out, and no monitoring data is obtained.

≥ 0

byte/s

1024(IEC)

2.4.5

1 minute

disk_agt_read_requests_rate

(Agent) Disks Read Requests

Number of read requests sent to the monitored disk per second

  • Collection method (Linux):

    The number of disk read requests is calculated from the change in the fourth column of the corresponding device in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Collection method (Windows):

    Use Win32_PerfFormattedData_PerfDisk_LogicalDisk object in WMI to obtain disk I/O data.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

    When CPU usage is high, data collection may time out, and no monitoring data is obtained.

≥ 0

Request/s

N/A

2.4.5

1 minute

disk_agt_write_bytes_rate

(Agent) Disks Write Rate

Volume of data written to the instance per second

  • Collection method (Linux):

    The disk write rate is calculated from the change in the tenth column of the corresponding device in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Collection method (Windows):

    Use Win32_PerfFormattedData_PerfDisk_LogicalDisk object in WMI to obtain disk I/O data.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

    When CPU usage is high, data collection may time out, and no monitoring data is obtained.

≥ 0

byte/s

1024(IEC)

2.4.5

1 minute

disk_agt_write_requests_rate

(Agent) Disks Write Requests

Number of write requests sent to the monitored disk per second

  • Collection method (Linux):

    The number of disk write requests is calculated from the change in the eighth column of the corresponding device in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Collection method (Windows):

    Use Win32_PerfFormattedData_PerfDisk_LogicalDisk object in WMI to obtain disk I/O data.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

    When CPU usage is high, data collection may time out, and no monitoring data is obtained.

≥ 0

Request/s

N/A

2.4.5

1 minute

disk_readTime

(Agent) Average Read Request Time

The average time taken for disk read operations

  • Collection method (Linux):

    The average read request time is calculated from the change in the seventh column of the corresponding device in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows does not support this metric.

≥ 0

ms/count

N/A

2.4.5

1 minute

disk_writeTime

(Agent) Average Write Request Time

The average time taken for disk write operations

  • Collection method (Linux):

    The average write request time is calculated from the change in the eleventh column of the corresponding device in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows does not support this metric.

≥ 0

ms/count

N/A

2.4.5

1 minute

disk_ioUtils

(Agent) Disk I/O Usage

Percentage of the collection period during which the disk was processing I/O requests

  • Collection method (Linux):

    The disk I/O usage is calculated from the change in the thirteenth column of the corresponding device in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows does not support this metric.

0-100

%

N/A

2.4.1

1 minute

disk_queue_length

(Agent) Disk Queue Length

Average number of read or write requests queued up for completion for the monitored disk in the monitoring period

  • Collection method (Linux):

    The average disk queue length is calculated from the change in the fourteenth column of the corresponding device in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows does not support this metric.

≥ 0

count

N/A

2.4.5

1 minute

disk_write_bytes_per_operation

(Agent) Average Disk Write Size

Average number of bytes in an I/O write for the monitored disk in the monitoring period

  • Collection method (Linux):

    The average disk write size is calculated by dividing the change in the tenth column of the corresponding device by the change in the eighth column in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows does not support this metric.

≥ 0

Byte/op

N/A

2.4.5

1 minute

disk_read_bytes_per_operation

(Agent) Average Disk Read Size

Average number of bytes in an I/O read for the monitored disk in the monitoring period

  • Collection method (Linux):

    The average disk read size is calculated by dividing the change in the sixth column of the corresponding device by the change in the fourth column in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows does not support this metric.

≥ 0

Byte/op

N/A

2.4.5

1 minute

disk_io_svctm

(Agent) Disk I/O Service Time

Average time in an I/O read or write for the monitored disk in the monitoring period

  • Collection method (Linux):

    The average disk I/O service time is calculated by dividing the change in the thirteenth column of the corresponding device by the sum of the changes in the fourth and eighth columns in file /proc/diskstats over a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows does not support this metric.

≥ 0

ms/op

N/A

2.4.5

1 minute

disk_device_used_percent

Block Device Usage

Percentage of total disk space that is used. The calculation formula is as follows: Used storage space of all mounted disk partitions/Total disk storage space.

  • Collection method (Linux): Summarize the disk usage of each mount point, calculate the total disk size based on the disk sector size and number of sectors, and calculate the overall disk usage.
  • Currently, Windows does not support this metric.

0-100

%

N/A

2.5.6

1 minute
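All of the Linux disk I/O metrics above come from changes in /proc/diskstats columns between two samples. The sketch below keeps the table's 1-based column numbering (the device name is column 3). Several details are assumptions rather than documented agent behavior: sector counts (columns 6 and 10) are converted to bytes by multiplying by 512, because the kernel reports them in 512-byte units; the average read and write times divide the time columns by the matching request counts, as the ms/count unit suggests; and the queue length divides column 14 by the interval in milliseconds. The function names are hypothetical.

```python
from __future__ import annotations

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text: str) -> dict[str, list[int]]:
    """Map each device name to the counters in columns 4-14 of /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14:
            stats[parts[2]] = [int(v) for v in parts[3:14]]
    return stats


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def disk_io_metrics(before: str, after: str, device: str, interval_s: float) -> dict[str, float]:
    a, b = parse_diskstats(before)[device], parse_diskstats(after)[device]

    def delta(column: int) -> int:
        # column is the 1-based /proc/diskstats column; column 4 is index 0.
        return b[column - 4] - a[column - 4]

    reads, writes = delta(4), delta(8)
    read_bytes, write_bytes = delta(6) * SECTOR_BYTES, delta(10) * SECTOR_BYTES
    interval_ms = interval_s * 1000.0
    return {
        "disk_agt_read_bytes_rate": read_bytes / interval_s,
        "disk_agt_read_requests_rate": reads / interval_s,
        "disk_agt_write_bytes_rate": write_bytes / interval_s,
        "disk_agt_write_requests_rate": writes / interval_s,
        "disk_readTime": _ratio(delta(7), reads),
        "disk_writeTime": _ratio(delta(11), writes),
        "disk_ioUtils": min(100.0, 100.0 * delta(13) / interval_ms),
        "disk_queue_length": delta(14) / interval_ms,
        "disk_read_bytes_per_operation": _ratio(read_bytes, reads),
        "disk_write_bytes_per_operation": _ratio(write_bytes, writes),
        "disk_io_svctm": _ratio(delta(13), reads + writes),
    }
```

Reading /proc/diskstats twice, one collection period apart, and passing both strings with the device name (for example `"vda"`) yields one value per metric.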

OS Metric: File System

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

disk_fs_rwstate

(Agent) File System Read/Write Status

Read/write status of the mounted file system on the monitored object. Possible statuses are 0 (read and write) and 1 (read-only).

  • Collection method (Linux): Check file system information in the fourth column in file /proc/mounts.
  • Windows does not support this metric.
  • 0: readable and writable
  • 1: read-only

None

N/A

2.4.5

1 minute

disk_inodesTotal

(Agent) Disk inode Total

Total number of index nodes on the disk

  • Collection method (Linux): Run the df -i command to check the value in the Inodes column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows does not support this metric.

≥ 0

None

N/A

2.4.5

1 minute

disk_inodesUsed

(Agent) Total inode Used

Number of used index nodes on the disk

  • Collection method (Linux): Run the df -i command to check the value in the IUsed column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows does not support this metric.

≥ 0

None

N/A

2.4.5

1 minute

disk_inodesUsedPercent

(Agent) Percentage of Total inode Used

Percentage of used index nodes on the disk

  • Collection method (Linux): Run the df -i command to check the value in the IUse% column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows does not support this metric.

0-100

%

N/A

2.4.1

1 minute
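A sketch of the two Linux collection methods in this table: the read/write status comes from the mount options in the fourth column of /proc/mounts, and the inode counts match what df -i reports, read here through `os.statvfs` (`f_files` is the total, `f_ffree` the free count). The function names are hypothetical, and the sketch does not decode the octal escapes (such as `\040` for a space) that /proc/mounts uses in paths.

```python
from __future__ import annotations

import os


def rw_states(mounts_text: str) -> dict[str, int]:
    """Map each mount point in /proc/mounts to 0 (read/write) or 1 (read-only)."""
    states = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 4:
            options = parts[3].split(",")  # fourth column: mount options such as rw,relatime
            states[parts[1]] = 1 if "ro" in options else 0
    return states


def inode_usage(mount_point: str) -> dict[str, float]:
    """Inodes, IUsed, and IUse% for a mount point, matching the df -i columns."""
    st = os.statvfs(mount_point)
    total, free = st.f_files, st.f_ffree
    used = total - free
    return {
        "disk_inodesTotal": total,
        "disk_inodesUsed": used,
        # Some file systems report 0 inodes (no fixed inode table); avoid dividing by zero.
        "disk_inodesUsedPercent": 100.0 * used / total if total else 0.0,
    }
```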

OS Metric: TCP

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

net_tcp_total

(Agent) Total Number of TCP Connections

Total number of TCP connections

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.1

1 minute

net_tcp_established

(Agent) Number of connections in the ESTABLISHED state

Number of TCP connections in the ESTABLISHED state

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.1

1 minute

net_tcp_sys_sent

(Agent) Number of connections in the TCP SYN_SENT state.

Number of TCP connections in the SYN_SENT state, that is, connection requests sent by the client that are waiting for a response from the server

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_sys_recv

(Agent) Number of connections in the TCP SYN_RECV state.

Number of TCP connections in the SYN_RECV state, that is, connection requests received by the server that have not yet completed the handshake

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_fin_wait1

(Agent) Number of TCP connections in the FIN_WAIT1 state.

Number of TCP connections waiting for ACK packets when the connections are being actively closed by the client

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_fin_wait2

(Agent) Number of TCP connections in the FIN_WAIT2 state.

Number of TCP connections in the FIN_WAIT2 state

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_time_wait

(Agent) Number of TCP connections in the TIME_WAIT state.

Number of TCP connections in the TIME_WAIT state

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_close

(Agent) Number of TCP connections in the CLOSE state.

Number of closed TCP connections

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_close_wait

(Agent) Number of TCP connections in the CLOSE_WAIT state.

Number of TCP connections in the CLOSE_WAIT state

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_last_ack

(Agent) Number of TCP connections in the LAST_ACK state.

Number of TCP connections waiting for ACK packets when the connections are being passively closed by the client

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_listen

(Agent) Number of TCP connections in the LISTEN state.

Number of TCP connections in the LISTEN state

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_closing

(Agent) Number of TCP connections in the CLOSING state.

Number of TCP connections that are being closed by the server and the client simultaneously

  • Collection method (Linux): Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Collection method (Windows): Obtain the metric value using the GetTcpTable2 API.

≥ 0

count

N/A

2.4.5

1 minute

net_tcp_retrans

(Agent) TCP Retransmission Rate

Percentage of packets that are resent

  • Collection method (Linux): Obtain the metric value from the /proc/net/snmp file. The value is the ratio of the number of retransmitted packets to the number of sent packets in a collection period.
  • Collection method (Windows): Obtain the metric value using the GetTcpStatistics API.

0-100

%

N/A

2.4.5

1 minute
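The following sketch counts connections per state using the hexadecimal state codes in the `st` column of /proc/net/tcp, and computes the retransmission rate from the `OutSegs` and `RetransSegs` counters in /proc/net/snmp. IPv6 sockets are listed separately in /proc/net/tcp6; whether the agent also reads that file is not stated here, so the sketch simply accepts any number of tables. The function names are illustrative.

```python
from __future__ import annotations

from collections import Counter

# Hex state codes used in the "st" column of /proc/net/tcp (kernel tcp_states.h).
TCP_STATES = {
    "01": "net_tcp_established", "02": "net_tcp_sys_sent", "03": "net_tcp_sys_recv",
    "04": "net_tcp_fin_wait1", "05": "net_tcp_fin_wait2", "06": "net_tcp_time_wait",
    "07": "net_tcp_close", "08": "net_tcp_close_wait", "09": "net_tcp_last_ack",
    "0A": "net_tcp_listen", "0B": "net_tcp_closing",
}


def tcp_state_counts(*tables: str) -> dict[str, int]:
    """Count connections per state in one or more /proc/net/tcp-style tables."""
    counts = Counter()
    for table in tables:
        for line in table.splitlines()[1:]:  # the first line is the column header
            parts = line.split()
            if len(parts) > 3:
                counts[parts[3].upper()] += 1
    result = {name: counts.get(code, 0) for code, name in TCP_STATES.items()}
    result["net_tcp_total"] = sum(counts.values())
    return result


def tcp_counters(snmp_text: str) -> dict[str, int]:
    """Parse the two 'Tcp:' rows (names, then values) of /proc/net/snmp."""
    rows = [line.split()[1:] for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def tcp_retrans_rate(snmp_before: str, snmp_after: str) -> float:
    """Retransmitted segments as a percentage of segments sent between two samples."""
    a, b = tcp_counters(snmp_before), tcp_counters(snmp_after)
    sent = b["OutSegs"] - a["OutSegs"]
    retrans = b["RetransSegs"] - a["RetransSegs"]
    return 100.0 * retrans / sent if sent > 0 else 0.0
```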

OS Metric: NIC

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

net_bitRecv

(Agent) Outbound Bandwidth

Number of bits sent by this NIC per second

  • Collection method (Linux): Check metric value changes in file /proc/net/dev in a collection period.
  • Collection method (Windows): Use the MibIfRow structure from the Windows IP Helper API to obtain network metric data.

≥ 0

bit/s

1024(IEC)

2.4.1

1 minute

net_bitSent

(Agent) Inbound Bandwidth

Number of bits received by this NIC per second

  • Collection method (Linux): Check metric value changes in file /proc/net/dev in a collection period.
  • Collection method (Windows): Use the MibIfRow structure from the Windows IP Helper API to obtain network metric data.

≥ 0

bit/s

1024(IEC)

2.4.1

1 minute

net_packetRecv

(Agent) NIC Packet Receive Rate

Number of packets received by this NIC per second

  • Collection method (Linux): Check metric value changes in file /proc/net/dev in a collection period.
  • Collection method (Windows): Use the MibIfRow structure from the Windows IP Helper API to obtain network metric data.

≥ 0

Count/s

N/A

2.4.1

1 minute

net_packetSent

(Agent) NIC Packet Send Rate

Number of packets sent by this NIC per second

  • Collection method (Linux): Check metric value changes in file /proc/net/dev in a collection period.
  • Collection method (Windows): Use the MibIfRow structure from the Windows IP Helper API to obtain network metric data.

≥ 0

Count/s

N/A

2.4.1

1 minute

net_errin

(Agent) Receive Error Rate

Percentage of receive errors detected by this NIC per second

  • Collection method (Linux): Check metric value changes in file /proc/net/dev in a collection period.
  • Windows does not support this metric.

0-100

%

N/A

2.4.5

1 minute

net_errout

(Agent) Transmit Error Rate

Percentage of transmit errors detected by this NIC per second

  • Collection method (Linux): Check metric value changes in file /proc/net/dev in a collection period.
  • Windows does not support this metric.

0-100

%

N/A

2.4.5

1 minute

net_dropin

(Agent) Received Packet Drop Rate

Percentage of packets received by this NIC which were dropped per second

  • Collection method (Linux): Check metric value changes in file /proc/net/dev in a collection period.
  • Windows does not support this metric.

0-100

%

N/A

2.4.5

1 minute

net_dropout

(Agent) Transmitted Packet Drop Rate

Percentage of packets transmitted by this NIC which were dropped per second

  • Collection method (Linux): Check metric value changes in file /proc/net/dev in a collection period.
  • Windows does not support this metric.

0-100

%

N/A

2.4.5

1 minute
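A sketch of the /proc/net/dev calculation for one interface. Each interface line holds eight receive counters followed by eight transmit counters. The bandwidth values are labeled by direction (received or sent) rather than by metric name, so check the Description column for which metric reports which direction. The error and drop percentages assume that the denominator is the packet count in the same direction; that, and the function names, are assumptions rather than documented agent behavior.

```python
from __future__ import annotations


def parse_net_dev(text: str) -> dict[str, list[int]]:
    """Map interface name to its 16 counters (8 receive, then 8 transmit) in /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        if sep:
            stats[name.strip()] = [int(v) for v in rest.split()]
    return stats


def _pct(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def nic_metrics(before: str, after: str, iface: str, interval_s: float) -> dict[str, float]:
    """Per-second rates and percentages for one interface between two samples."""
    a, b = parse_net_dev(before)[iface], parse_net_dev(after)[iface]
    d = [new - old for old, new in zip(a, b)]
    rx_bytes, rx_packets, rx_errs, rx_drop = d[0:4]
    tx_bytes, tx_packets, tx_errs, tx_drop = d[8:12]
    return {
        "bits_received_per_s": rx_bytes * 8 / interval_s,
        "bits_sent_per_s": tx_bytes * 8 / interval_s,
        "net_packetRecv": rx_packets / interval_s,
        "net_packetSent": tx_packets / interval_s,
        # Assumption: error and drop rates are relative to packets in the same direction.
        "net_errin": _pct(rx_errs, rx_packets),
        "net_errout": _pct(tx_errs, tx_packets),
        "net_dropin": _pct(rx_drop, rx_packets),
        "net_dropout": _pct(tx_drop, tx_packets),
    }
```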

Process Monitoring Metrics

Metric

Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Monitoring Period (Raw Data)

proc_pHashId_cpu

(Agent) CPU Usage

CPU usage of a process. pHashId is the MD5 hash of the process name and process ID.

  • Collection method (Linux): Check the metric value changes in file /proc/pid/stat.
  • Collection method (Windows): Call the Windows API GetProcessTimes to obtain the CPU usage of the process.

0–1 x Number of vCPUs

%

N/A

2.4.1

1 minute

proc_pHashId_mem

(Agent) Memory Usage

Memory usage of a process. pHashId is the MD5 hash of the process name and process ID.

  • Collection method (Linux):

    RSS*PAGESIZE/MemTotal

    Obtain the RSS value by checking the second column of file /proc/pid/statm.

    Obtain the PAGESIZE value by running the getconf PAGESIZE command.

    Obtain the MemTotal value by checking file /proc/meminfo.

  • Collection method (Windows): Call the Windows API GlobalMemoryStatusEx to obtain the total memory size and GetProcessMemoryInfo to obtain the memory used by the process. Divide the used memory size by the total memory size to get the memory usage.

0-100

%

N/A

2.4.1

1 minute

proc_pHashId_file

(Agent) Number of opened files

Number of files opened by a process. pHashId is the MD5 hash of the process name and process ID.

  • Collection method (Linux): Run the ls -l /proc/pid/fd command to view the number of opened files.
  • Windows does not support this metric.

≥0

Count

N/A

2.4.1

1 minute

proc_running_count

(Agent) Number of running processes

Number of processes that are running

  • Collection method (Linux): Obtain the state of each process from the State field in the /proc/pid/status file, and then count the processes in each state.
  • Windows does not support this metric.

≥0

None

N/A

2.4.1

1 minute

proc_idle_count

(Agent) Idle Processes

Number of processes that are idle

  • Collection method (Linux): Obtain the state of each process from the State field in the /proc/pid/status file, and then count the processes in each state.
  • Windows does not support this metric.

≥0

None

N/A

2.4.1

1 minute

proc_zombie_count

(Agent) Zombie Processes

Number of zombie processes

  • Collection method (Linux): Obtain the state of each process from the State field in the /proc/pid/status file, and then count the processes in each state.
  • Windows does not support this metric.

≥0

None

N/A

2.4.1

1 minute

proc_blocked_count

(Agent) Blocked Processes

Number of processes that are blocked

  • Collection method (Linux): Obtain the state of each process from the State field in the /proc/pid/status file, and then count the processes in each state.
  • Windows does not support this metric.

≥0

None

N/A

2.4.1

1 minute

proc_sleeping_count

(Agent) Sleeping Processes

Number of processes that are sleeping

  • Collection method (Linux): Obtain the state of each process from the State field in the /proc/pid/status file, and then count the processes in each state.
  • Windows does not support this metric.

≥0

None

N/A

2.4.1

1 minute

proc_total_count

(Agent) Total Processes

Total number of processes on the monitored object

  • Collection method (Linux): Obtain the state of each process from the State field in the /proc/pid/status file, and then count the number of processes in each state.
  • Collection method (Windows): Obtain the total number of processes by using the system process status support module psapi.dll.

≥0

None

N/A

2.4.1

1 minute

proc_specified_count

(Agent) Specified Processes

Number of specified processes

  • Collection method (Linux): Obtain the state of each process from the State field in the /proc/pid/status file, and then count the number of processes in each state.
  • Collection method (Windows): Obtain the total number of processes by using the system process status support module psapi.dll.

≥0

None

N/A

2.4.1

1 minute
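The Linux collection method shared by the process-count metrics above can be sketched in Python as follows. This is a minimal illustration, not the agent's implementation: the helper names are hypothetical, and the mapping of State codes to metrics (in particular, counting D, uninterruptible sleep, as blocked and I as idle) is an assumption.

```python
import os
import re
from collections import Counter

# Assumed mapping from /proc/<pid>/status State codes to metric names.
STATE_TO_METRIC = {
    "R": "proc_running_count",
    "S": "proc_sleeping_count",
    "D": "proc_blocked_count",  # uninterruptible sleep, assumed to mean "blocked"
    "Z": "proc_zombie_count",
    "I": "proc_idle_count",     # idle kernel threads on newer kernels
}

# Matches lines such as "State:\tS (sleeping)" and captures the code letter.
STATE_RE = re.compile(r"^State:\s+(\S)", re.MULTILINE)


def count_states(status_texts):
    """Count processes per metric from the contents of /proc/<pid>/status files."""
    counts = Counter({name: 0 for name in STATE_TO_METRIC.values()})
    total = 0
    for text in status_texts:
        match = STATE_RE.search(text)
        if match is None:
            continue  # empty or malformed file
        total += 1
        metric = STATE_TO_METRIC.get(match.group(1))
        if metric is not None:
            counts[metric] += 1
    counts["proc_total_count"] = total
    return dict(counts)


def read_local_statuses():
    """Yield the status file contents of every process on this Linux host."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/status") as f:
                yield f.read()
        except OSError:
            continue  # the process exited between listdir() and open()


if __name__ == "__main__" and os.path.isdir("/proc"):
    print(count_states(read_local_statuses()))
```

Processes in states that are not mapped (for example T, stopped) are still included in proc_total_count.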

GPU Specifications

If a GPU server has eight GPU cards and persistence mode (PM) is disabled, data collection may fail. In this case, enable persistence mode (for example, by running nvidia-smi -pm 1) and then restart the monitoring process.

Category

Metric Name

Description

Value Range

Unit

Conversion Rule

Supported Version

Collection Interval

GPU Specifications

gpu_status

GPU health status of the VM. This is a composite metric.

  • Possible causes of an abnormal status: 1. The number of ECC errors exceeded the threshold. 2. GPU memory address remapping failed. 3. The GPU is in the rev ff state. 4. An infoROM error occurred. 5. There are pages waiting to be retired (isolated). 6. Remapped rows are abnormal. (For details, see the detailed metrics below.)
  • Collection method (Linux): Call APIs from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU status.
  • Collection method (Windows): Call APIs from the GPU driver library file nvml.dll to obtain the GPU status.
  • 0: healthy
  • 1: subhealthy
  • 2: faulty

None

N/A

2.4.5

1 minute

gpu_performance_state

Performance status of the GPU

  • P0-P15, P32
  • Collection method (Linux): Call the nvmlDeviceGetPerformanceState API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU performance state.
  • Collection method (Windows): Call the nvmlDeviceGetPerformanceState API from the GPU driver library file nvml.dll to obtain the GPU performance state.
  • P0-P15: P0 indicates the maximum performance status, and P15 indicates the minimum performance status.
  • P32 indicates the unknown status.

None

N/A

2.4.1

1 minute
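The value range above maps onto the NVML performance-state enumeration, where 0 to 15 are P0 to P15 and 32 is NVML_PSTATE_UNKNOWN. The sketch below converts the integer returned by nvmlDeviceGetPerformanceState into a label. It assumes the nvidia-ml-py package (imported as pynvml), which wraps the same NVML library; the helper names are hypothetical, and the sampler returns an empty list when no driver or GPU is present.

```python
NVML_PSTATE_UNKNOWN = 32  # value of NVML_PSTATE_UNKNOWN in nvmlPstates_t


def pstate_label(pstate: int) -> str:
    """Convert an nvmlDeviceGetPerformanceState() result into a P-state label."""
    if 0 <= pstate <= 15:
        return f"P{pstate}"  # P0 = maximum performance, P15 = minimum
    if pstate == NVML_PSTATE_UNKNOWN:
        return "P32 (unknown)"
    raise ValueError(f"unexpected performance state: {pstate}")


def sample_pstates() -> list:
    """Read gpu_performance_state for every GPU, or [] if NVML is unavailable."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        return [
            pstate_label(pynvml.nvmlDeviceGetPerformanceState(
                pynvml.nvmlDeviceGetHandleByIndex(i)))
            for i in range(pynvml.nvmlDeviceGetCount())
        ]
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_pstates())
```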

gpu_power_draw

Power of the GPU.

  • If the power draw exceeds the maximum power or the value is invalid, the GPU hardware may be faulty.
  • Collection method (Linux): Call the nvmlDeviceGetPowerUsage API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU power.
  • Collection method (Windows): Call the nvmlDeviceGetPowerUsage API from the GPU driver library file nvml.dll to obtain the GPU power.

≥ 0

W

N/A

2.4.5

1 minute

gpu_temperature

Temperature of the GPU.

  • If the temperature exceeds the maximum operating temperature threshold or the value is invalid, the GPU hardware may be faulty.
  • Collection method (Linux): Call the nvmlDeviceGetTemperature API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU temperature.
  • Collection method (Windows): Call the nvmlDeviceGetTemperature API from the GPU driver library file nvml.dll to obtain the GPU temperature.

≥ 0

°C

N/A

2.4.5

1 minute
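NVML reports power in milliwatts, while gpu_power_draw is expressed in W, so a conversion step is needed. The following sketch reads power and temperature and flags a power reading that is negative or above the limit. It assumes the nvidia-ml-py package (imported as pynvml); using the enforced power limit as the "maximum power" is an assumption, and the helper names are hypothetical.

```python
def milliwatts_to_watts(milliwatts: int) -> float:
    """nvmlDeviceGetPowerUsage() reports milliwatts; gpu_power_draw is in W."""
    return milliwatts / 1000.0


def is_suspect(value: float, upper_limit: float) -> bool:
    """Flag a reading that is negative or above its upper limit."""
    return value < 0 or value > upper_limit


def sample_power_and_temperature() -> list:
    """Read gpu_power_draw (W) and gpu_temperature (°C) for every GPU."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        results = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            power_w = milliwatts_to_watts(pynvml.nvmlDeviceGetPowerUsage(handle))
            # Assumption: the enforced power limit stands in for "maximum power".
            limit_w = milliwatts_to_watts(pynvml.nvmlDeviceGetEnforcedPowerLimit(handle))
            temperature_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            results.append({
                "gpu_power_draw": power_w,
                "gpu_temperature": temperature_c,
                "power_suspect": is_suspect(power_w, limit_w),
            })
        return results
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_power_and_temperature())
```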

gpu_usage_gpu

GPU computing power usage.

  • The GPU computing power usage is displayed in percentage. The value is an instantaneous value at the sampling point.
  • Collection method (Linux): Call the nvmlDeviceGetUtilizationRates API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU computing power usage.
  • Collection method (Windows): Call the nvmlDeviceGetUtilizationRates API from the GPU driver library file nvml.dll to obtain the GPU computing power usage.

0-100

%

N/A

2.4.1

1 minute

gpu_usage_mem

GPU memory usage.

  • The GPU memory usage is displayed in percentage. The value is an instantaneous value at the sampling point.
  • Collection method (Linux): Call the nvmlDeviceGetUtilizationRates API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU memory usage.
  • Collection method (Windows): Call the nvmlDeviceGetUtilizationRates API from the GPU driver library file nvml.dll to obtain the GPU memory usage.

0-100

%

N/A

2.4.1

1 minute
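Both percentage metrics above come from a single nvmlDeviceGetUtilizationRates call. A minimal sketch follows, assuming the nvidia-ml-py package (imported as pynvml) and a hypothetical helper name. According to the NVML documentation, the memory field is the percentage of time in the sample period during which device memory was being read or written, so it reflects memory activity rather than how much memory is allocated.

```python
def sample_utilization() -> list:
    """Read gpu_usage_gpu and gpu_usage_mem (percent) for every GPU."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        results = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            rates = pynvml.nvmlDeviceGetUtilizationRates(handle)
            # Both fields are integer percentages in the range 0-100.
            results.append({"gpu_usage_gpu": rates.gpu, "gpu_usage_mem": rates.memory})
        return results
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_utilization())
```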

gpu_used_mem

Used GPU memory.

  • The amount of GPU memory in use is displayed in MB. The value is an instantaneous value at the sampling point.
  • Collection method (Linux): Call the nvmlDeviceGetMemoryInfo API from the GPU driver library file libnvidia-ml.so.1 to obtain the used GPU memory.
  • Collection method (Windows): Call the nvmlDeviceGetMemoryInfo API from the GPU driver library file nvml.dll to obtain the used GPU memory.

≥ 0

MB

N/A

2.4.5

1 minute

gpu_free_mem

Remaining GPU memory.

  • The amount of free GPU memory is displayed in MB.
  • Collection method (Linux): Call the nvmlDeviceGetMemoryInfo API from the GPU driver library file libnvidia-ml.so.1 to obtain the remaining GPU memory.
  • Collection method (Windows): Call the nvmlDeviceGetMemoryInfo API from the GPU driver library file nvml.dll to obtain the remaining GPU memory.

≥ 0

MB

N/A

2.4.5

1 minute
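gpu_used_mem and gpu_free_mem both come from nvmlDeviceGetMemoryInfo, which reports total, free, and used memory in bytes. The sketch below converts those byte counts to MB. It assumes the nvidia-ml-py package (imported as pynvml), assumes that "MB" here means 1024 × 1024 bytes, and uses hypothetical helper names.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the metrics' "MB" means MiB


def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count from nvmlDeviceGetMemoryInfo() to MB."""
    return n_bytes / BYTES_PER_MB


def sample_memory() -> list:
    """Read gpu_used_mem and gpu_free_mem (MB) for every GPU."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        results = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            info = pynvml.nvmlDeviceGetMemoryInfo(pynvml.nvmlDeviceGetHandleByIndex(i))
            # info.total, info.free and info.used are reported in bytes.
            results.append({
                "gpu_used_mem": bytes_to_mb(info.used),
                "gpu_free_mem": bytes_to_mb(info.free),
            })
        return results
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_memory())
```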

gpu_usage_encoder

GPU encoder usage.

  • The GPU encoder usage is displayed in percentage. The value is an instantaneous value at the sampling point.
  • Collection method (Linux): Call the nvmlDeviceGetEncoderUtilization API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU encoder usage.
  • Collection method (Windows): Call the nvmlDeviceGetEncoderUtilization API from the GPU driver library file nvml.dll to obtain the GPU encoder usage.

0-100

%

N/A

2.4.5

1 minute

gpu_usage_decoder

GPU decoder usage.

  • The GPU decoder usage is displayed in percentage. The value is an instantaneous value at the sampling point.
  • Collection method (Linux): Call the nvmlDeviceGetDecoderUtilization API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU decoder usage.
  • Collection method (Windows): Call the nvmlDeviceGetDecoderUtilization API from the GPU driver library file nvml.dll to obtain the GPU decoder usage.

0-100

%

N/A

2.4.5

1 minute
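The encoder and decoder calls each return a utilization percentage together with the sampling period (in microseconds) over which it was measured. The sketch below keeps only the percentages that the two metrics report. It assumes the nvidia-ml-py package (imported as pynvml), and the helper name is hypothetical.

```python
def sample_codec_usage() -> list:
    """Read gpu_usage_encoder and gpu_usage_decoder (percent) for every GPU."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        results = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            # Each call returns [utilization percent, sampling period in microseconds].
            encoder, _encoder_period_us = pynvml.nvmlDeviceGetEncoderUtilization(handle)
            decoder, _decoder_period_us = pynvml.nvmlDeviceGetDecoderUtilization(handle)
            results.append({"gpu_usage_encoder": encoder, "gpu_usage_decoder": decoder})
        return results
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_codec_usage())
```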

gpu_graphics_clocks

GPU graphics (shader) clock frequency.

  • Displays the GPU clock frequency related to graphics performance. If graphics capabilities are not used, you can ignore this metric.
  • Collection method (Linux): Call the nvmlDeviceGetClockInfo API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU graphics clock frequency.
  • Collection method (Windows): Call the nvmlDeviceGetClockInfo API from the GPU driver library file nvml.dll to obtain the GPU graphics clock frequency.

≥ 0

MHz

N/A

2.4.5

1 minute

gpu_sm_clocks

Streaming multiprocessor (SM) clock frequency of the GPU.

  • Displays the clock frequency closely related to the CUDA core computing performance of the GPU.
  • Collection method (Linux): Call the nvmlDeviceGetClockInfo API from the GPU driver library file libnvidia-ml.so.1 to obtain the SM clock frequency of the GPU.
  • Collection method (Windows): Call the nvmlDeviceGetClockInfo API from the GPU driver library file nvml.dll to obtain the SM clock frequency of the GPU.

≥ 0

MHz

N/A

2.4.5

1 minute

gpu_mem_clocks

Memory clock frequency of the GPU.

  • Clock frequency that controls the running speed of the GPU memory.
  • Collection method (Linux): Call the nvmlDeviceGetClockInfo API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU memory clock frequency.
  • Collection method (Windows): Call the nvmlDeviceGetClockInfo API from the GPU driver library file nvml.dll to obtain the GPU memory clock frequency.

≥ 0

MHz

N/A

2.4.5

1 minute

gpu_video_clocks

Video (including codec) clock frequency of the GPU.

  • Displays the codec clock frequency of the current GPU.
  • Collection method (Linux): Call the nvmlDeviceGetClockInfo API from the GPU driver library file libnvidia-ml.so.1 to obtain the GPU video clock frequency.
  • Collection method (Windows): Call the nvmlDeviceGetClockInfo API from the GPU driver library file nvml.dll to obtain the GPU video clock frequency.

≥ 0

MHz

N/A

2.4.5

1 minute
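All four clock metrics come from nvmlDeviceGetClockInfo. They differ only in the clock-type argument, and the values are returned in MHz. A minimal sketch follows, assuming the nvidia-ml-py package (imported as pynvml). The metric-to-constant table is an illustration, not the agent's configuration.

```python
# Metric name -> name of the pynvml clock-type constant (values are in MHz).
CLOCK_METRICS = {
    "gpu_graphics_clocks": "NVML_CLOCK_GRAPHICS",
    "gpu_sm_clocks": "NVML_CLOCK_SM",
    "gpu_mem_clocks": "NVML_CLOCK_MEM",
    "gpu_video_clocks": "NVML_CLOCK_VIDEO",
}


def sample_clocks() -> list:
    """Read the four clock-frequency metrics (MHz) for every GPU."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        results = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            results.append({
                metric: pynvml.nvmlDeviceGetClockInfo(handle, getattr(pynvml, constant))
                for metric, constant in CLOCK_METRICS.items()
            })
        return results
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_clocks())
```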

gpu_tx_throughput_pci

Outbound bandwidth of the GPU.

  • Displays the amount of data sent by the GPU to the host via PCIe.
  • Collection method (Linux): Call the nvmlDeviceGetPcieThroughput API from the GPU driver library file libnvidia-ml.so.1 to obtain the outbound bandwidth of the GPU.
  • Collection method (Windows): Call the nvmlDeviceGetPcieThroughput API from the GPU driver library file nvml.dll to obtain the outbound bandwidth of the GPU.

≥ 0

MByte/s

N/A

2.4.5

1 minute

gpu_rx_throughput_pci

Inbound bandwidth of the GPU.

  • Displays the amount of data sent by the host to the GPU via PCIe.
  • Collection method (Linux): Call the nvmlDeviceGetPcieThroughput API from the GPU driver library file libnvidia-ml.so.1 to obtain the inbound bandwidth of the GPU.
  • Collection method (Windows): Call the nvmlDeviceGetPcieThroughput API from the GPU driver library file nvml.dll to obtain the inbound bandwidth of the GPU.

≥ 0

MByte/s

N/A

2.4.5

1 minute
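nvmlDeviceGetPcieThroughput returns KB/s for the selected direction (TX is data sent by the GPU, RX is data received by the GPU), while both metrics use MByte/s. The sketch below does that conversion. It assumes the nvidia-ml-py package (imported as pynvml) and assumes 1 MByte = 1024 KB; the helper names are hypothetical.

```python
def kb_to_mb_per_second(kb_per_s: int) -> float:
    """NVML reports PCIe throughput in KB/s; the metrics use MByte/s."""
    return kb_per_s / 1024.0  # assumption: 1 MByte = 1024 KB


def sample_pcie_throughput() -> list:
    """Read gpu_tx_throughput_pci and gpu_rx_throughput_pci for every GPU."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        results = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            tx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)
            rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)
            results.append({
                "gpu_tx_throughput_pci": kb_to_mb_per_second(tx),
                "gpu_rx_throughput_pci": kb_to_mb_per_second(rx),
            })
        return results
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_pcie_throughput())
```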

gpu_volatile_correctable

Number of correctable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset.

  • Collection method (Linux): Call the nvmlDeviceGetTotalEccErrors and nvmlDeviceGetMemoryErrorCounter APIs from the GPU driver library file libnvidia-ml.so.1 to obtain the number of correctable ECC errors since the GPU was last reset.
  • Collection method (Windows): Call the nvmlDeviceGetTotalEccErrors and nvmlDeviceGetMemoryErrorCounter APIs from the GPU driver library file nvml.dll to obtain the number of correctable ECC errors since the GPU was last reset.

≥ 0

count

N/A

2.4.5

1 minute

gpu_volatile_uncorrectable

Number of uncorrectable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset.

  • Collection method (Linux): Call the nvmlDeviceGetTotalEccErrors and nvmlDeviceGetMemoryErrorCounter APIs from the GPU driver library file libnvidia-ml.so.1 to obtain the number of uncorrectable ECC errors since the GPU was last reset.
  • Collection method (Windows): Call the nvmlDeviceGetTotalEccErrors and nvmlDeviceGetMemoryErrorCounter APIs from the GPU driver library file nvml.dll to obtain the number of uncorrectable ECC errors since the GPU was last reset.

≥ 0

count

N/A

2.4.5

1 minute

gpu_aggregate_correctable

Number of correctable ECC errors on the GPU over its lifetime. This aggregate count is not cleared when the GPU is reset.

  • Collection method (Linux): Call the nvmlDeviceGetTotalEccErrors and nvmlDeviceGetMemoryErrorCounter APIs from the GPU driver library file libnvidia-ml.so.1 to obtain the number of correctable ECC errors on the GPU.
  • Collection method (Windows): Call the nvmlDeviceGetTotalEccErrors and nvmlDeviceGetMemoryErrorCounter APIs from the GPU driver library file nvml.dll to obtain the number of correctable ECC errors on the GPU.

≥ 0

count

N/A

2.4.5

1 minute

gpu_aggregate_uncorrectable

Number of uncorrectable ECC errors on the GPU over its lifetime. This aggregate count is not cleared when the GPU is reset.

  • Collection method (Linux): Call the nvmlDeviceGetTotalEccErrors and nvmlDeviceGetMemoryErrorCounter APIs from the GPU driver library file libnvidia-ml.so.1 to obtain the number of uncorrectable ECC errors on the GPU.
  • Collection method (Windows): Call the nvmlDeviceGetTotalEccErrors and nvmlDeviceGetMemoryErrorCounter APIs from the GPU driver library file nvml.dll to obtain the number of uncorrectable ECC errors on the GPU.

≥ 0

count

N/A

2.4.5

1 minute
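The four ECC metrics differ only in the error type (corrected or uncorrected) and the counter type (volatile, cleared on GPU reset, or aggregate, kept for the GPU's lifetime). The sketch below reads all four with nvmlDeviceGetTotalEccErrors. It also includes a hypothetical new_errors helper that derives per-interval increments from a volatile counter and treats a drop in the value as a reset. It assumes the nvidia-ml-py package (imported as pynvml), and all helper names are hypothetical.

```python
def new_errors(previous: int, current: int) -> int:
    """Errors added between two samples of a volatile ECC counter.

    Volatile counters restart at 0 when the GPU is reset, so a drop means the
    counter was reset and everything in the current sample is new.
    """
    return current if current < previous else current - previous


def sample_ecc() -> list:
    """Read the volatile and aggregate ECC counters for every GPU."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        corrected = pynvml.NVML_MEMORY_ERROR_TYPE_CORRECTED
        uncorrected = pynvml.NVML_MEMORY_ERROR_TYPE_UNCORRECTED
        queries = (
            ("gpu_volatile_correctable", corrected, pynvml.NVML_VOLATILE_ECC),
            ("gpu_volatile_uncorrectable", uncorrected, pynvml.NVML_VOLATILE_ECC),
            ("gpu_aggregate_correctable", corrected, pynvml.NVML_AGGREGATE_ECC),
            ("gpu_aggregate_uncorrectable", uncorrected, pynvml.NVML_AGGREGATE_ECC),
        )
        results = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            counters = {}
            for metric, error_type, counter_type in queries:
                try:
                    counters[metric] = pynvml.nvmlDeviceGetTotalEccErrors(
                        handle, error_type, counter_type)
                except pynvml.NVMLError:
                    counters[metric] = None  # e.g. ECC disabled or not supported
            results.append(counters)
        return results
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_ecc())
```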

gpu_retired_page_single_bit

Number of GPU memory pages retired (isolated) because of single-bit ECC errors.

  • Collection method (Linux): Call the nvmlDeviceGetRetiredPages API from the GPU driver library file libnvidia-ml.so.1 to obtain the number of pages retired because of single-bit errors.
  • Collection method (Windows): Call the nvmlDeviceGetRetiredPages API from the GPU driver library file nvml.dll to obtain the number of pages retired because of single-bit errors.

≥ 0

count

N/A

2.4.5

1 minute

gpu_retired_page_double_bit

Number of GPU memory pages retired (isolated) because of double-bit ECC errors.

  • Collection method (Linux): Call the nvmlDeviceGetRetiredPages API from the GPU driver library file libnvidia-ml.so.1 to obtain the number of pages retired because of double-bit errors.
  • Collection method (Windows): Call the nvmlDeviceGetRetiredPages API from the GPU driver library file nvml.dll to obtain the number of pages retired because of double-bit errors.

≥ 0

count

N/A

2.4.5

1 minute
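nvmlDeviceGetRetiredPages takes a retirement cause and returns the addresses of the pages retired for that cause. The two metrics report how many addresses there are. A minimal sketch follows, assuming the nvidia-ml-py package (imported as pynvml). The integer cause values mirror the NVML nvmlPageRetirementCause_t enumeration, and the helper names are hypothetical. GPUs that do not support page retirement raise an NVML error, which the sketch records as None.

```python
# Values of the NVML nvmlPageRetirementCause_t enumeration.
SINGLE_BIT_CAUSE = 0  # NVML_PAGE_RETIREMENT_CAUSE_MULTIPLE_SINGLE_BIT_ECC_ERRORS
DOUBLE_BIT_CAUSE = 1  # NVML_PAGE_RETIREMENT_CAUSE_DOUBLE_BIT_ECC_ERROR


def count_pages(addresses) -> int:
    """The metrics report how many pages were retired, not their addresses."""
    return len(addresses)


def sample_retired_pages() -> list:
    """Read gpu_retired_page_single_bit and gpu_retired_page_double_bit per GPU."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
        pynvml.nvmlInit()
    except Exception:
        return []  # no binding, no driver library, or no GPU
    try:
        results = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            try:
                single = pynvml.nvmlDeviceGetRetiredPages(handle, SINGLE_BIT_CAUSE)
                double = pynvml.nvmlDeviceGetRetiredPages(handle, DOUBLE_BIT_CAUSE)
                results.append({
                    "gpu_retired_page_single_bit": count_pages(single),
                    "gpu_retired_page_double_bit": count_pages(double),
                })
            except pynvml.NVMLError:
                results.append(None)  # page retirement not supported on this GPU
        return results
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(sample_retired_pages())
```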

gpu_lnkcap_speed

Maximum speed supported by the PCIe link of the GPU.

  • Maximum data throughput capability of the GPU on the PCIe bus.
  • Collection method (Linux): Use lspci -d 10de: -vv | grep -i lnkcap to query the maximum speed supported by the PCIe link of the GPU.
  • Collection method (Windows): Use (gwmi Win32_Bus -Filter 'DeviceID like "PCI%"').GetRelated('Win32_PnPEntity') to query the maximum speed supported by the PCIe link of the GPU.

≥ 0

GT/s

N/A

2.6.7

1 minute

gpu_lnkcap_width

Maximum link width supported by the PCIe link of the GPU.

  • Maximum number of PCIe lanes supported by the GPU.
  • Collection method (Linux): Use lspci -d 10de: -vv | grep -i lnkcap to query the maximum link width supported by the PCIe link of the GPU.
  • Collection method (Windows): Use (gwmi Win32_Bus -Filter 'DeviceID like "PCI%"').GetRelated('Win32_PnPEntity') to query the maximum link width supported by the PCIe link of the GPU.

≥ 0

count

N/A

2.6.7

1 minute

gpu_lnksta_speed

PCIe connection speed of the GPU.

  • PCIe link speed currently negotiated by the GPU, which may be lower than the maximum speed reported by gpu_lnkcap_speed.
  • Collection method (Linux): Use lspci -d 10de: -vv | grep -i lnksta to query the current PCIe link speed of the GPU.
  • Collection method (Windows): Not supported.

≥ 0

GT/s

N/A

2.6.7

1 minute

gpu_lnksta_width

PCIe link width of the GPU.

  • Number of PCIe lanes currently negotiated by the GPU.
  • Collection method (Linux): Use lspci -d 10de: -vv | grep -i lnksta to query the current PCIe link width of the GPU.
  • Collection method (Windows): Not supported.

≥ 0

count

N/A

2.6.7

1 minute
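The Linux collection method for the four PCIe link metrics reads the LnkCap (capability) and LnkSta (current status) lines that lspci prints for each NVIDIA device (vendor ID 10de). The following is a minimal parsing sketch with hypothetical helper names. It skips the LnkCap2 and LnkSta2 lines, which a case-insensitive grep would also match. Without root privileges, lspci typically hides these capability lines, so the result may be empty.

```python
import re
import subprocess

# Matches e.g. "LnkCap: Port #0, Speed 16GT/s, Width x16" or
# "LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)".
LINK_RE = re.compile(r"^\s*(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def parse_lspci_links(text: str) -> dict:
    """Map each PCI bus ID in `lspci -vv` output to its link speed and width."""
    devices = {}
    current = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            current = line.split()[0]  # device header line, e.g. "3b:00.0 3D controller: ..."
            devices[current] = {}
            continue
        match = LINK_RE.match(line)
        if current is None or match is None:
            continue
        kind = "lnkcap" if match.group(1) == "LnkCap" else "lnksta"
        devices[current][f"gpu_{kind}_speed"] = float(match.group(2))  # GT/s
        devices[current][f"gpu_{kind}_width"] = int(match.group(3))    # lanes
    return devices


def read_nvidia_links() -> dict:
    """Run lspci for NVIDIA devices; return {} if lspci cannot be run."""
    try:
        result = subprocess.run(["lspci", "-d", "10de:", "-vv"],
                                capture_output=True, text=True, check=False)
    except OSError:
        return {}  # lspci is not installed or cannot be executed
    return parse_lspci_links(result.stdout)


if __name__ == "__main__":
    print(read_nvidia_links())
```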

gpu_nvlink_number

Number of NVLink links of the GPU.

  • Number of NVLink links supported by the GPU. For example, A100 supports 12 NVLink links.
  • Collection method (Linux): Call the nvmlDeviceGetFieldValues API from the GPU driver library file libnvidia-ml.so.1 to obtain the number of NVLink links of the GPU.
  • Collection method (Windows): Not supported.

≥ 0

count

N/A

2.6.7

1 minute

gpu_nvlink_bandwidth

NVLink bandwidth of the GPU.

  • Total bandwidth used by the GPU for data transmission over NVLink.
  • Collection method (Linux): Call the nvmlDeviceGetFieldValues API from the GPU driver library file libnvidia-ml.so.1 to obtain the NVLink bandwidth of the GPU.
  • Collection method (Windows): Not supported.

≥ 0

GB/s

N/A

2.6.7

1 minute