Updated on 2024-10-18 GMT+08:00

OS Monitoring Metrics Supported by ECSs with the Agent Installed

Description

OS monitoring provides system-level, proactive, and fine-grained monitoring. It requires the Agent to be installed on the ECSs to be monitored. This section describes OS monitoring metrics reported to Cloud Eye.

OS monitoring supports metrics about the CPU, CPU load, memory, disk, disk I/O, file system, GPU, NIC, NTP, and TCP.

After the Agent is installed, you can view monitoring metrics of ECSs running different OSs. Monitoring data is collected every 1 minute.

Namespace

AGT.ECS

OS Metric: CPU

Table 1 CPU metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

cpu_usage

(Agent) CPU Usage

CPU usage of the monitored object

Unit: percent

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command to check the %Cpu(s) value.
  • Windows: Obtain the metric value using the Windows API GetSystemTimes.

0-100

ECS

1 minute

cpu_usage_idle

(Agent) Idle CPU Usage

Percentage of time that CPU is idle

Unit: percent

  • Linux: Check metric value changes in file /proc/stat in a collection period.
  • Windows: Obtain the metric value using the Windows API GetSystemTimes.

0-100

ECS

1 minute

cpu_usage_user

(Agent) User Space CPU Usage

Percentage of time that the CPU is used by user space

Unit: percent

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command to check the %Cpu(s) us value.
  • Windows: Obtain the metric value using the Windows API GetSystemTimes.

0-100

ECS

1 minute

cpu_usage_system

(Agent) Kernel Space CPU Usage

Percentage of time that the CPU is used by kernel space

Unit: percent

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command to check the %Cpu(s) sy value.
  • Windows: Obtain the metric value using the Windows API GetSystemTimes.

0-100

ECS

1 minute

cpu_usage_other

(Agent) Other Process CPU Usage

Percentage of time that the CPU is used by other processes

Unit: percent

  • Linux: Other Process CPU Usage = 100% - Idle CPU Usage - Kernel Space CPU Usage - User Space CPU Usage
  • Windows: Other Process CPU Usage = 100% - Idle CPU Usage - Kernel Space CPU Usage - User Space CPU Usage

0-100

ECS

1 minute

cpu_usage_nice

(Agent) Nice Process CPU Usage

Percentage of time that the CPU is in user mode with low-priority processes which can easily be interrupted by higher-priority processes

Unit: percent

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command to check the %Cpu(s) ni value.
  • Windows is not supported currently.

0-100

ECS

1 minute

cpu_usage_iowait

(Agent) iowait Process CPU Usage

Percentage of time that the CPU is waiting for I/O operations to complete

Unit: percent

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command to check the %Cpu(s) wa value.
  • Windows is not supported currently.

0-100

ECS

1 minute

cpu_usage_irq

(Agent) CPU Interrupt Time

Percentage of time that the CPU is servicing interrupts

Unit: percent

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command to check the %Cpu(s) hi value.
  • Windows is not supported currently.

0-100

ECS

1 minute

cpu_usage_softirq

(Agent) CPU Software Interrupt Time

Percentage of time that the CPU is servicing software interrupts

Unit: percent

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command to check the %Cpu(s) si value.
  • Windows is not supported currently.

0-100

ECS

1 minute
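
The Linux entries in Table 1 all rely on counter changes in /proc/stat over one collection period. The following is a minimal sketch of that idea, not the Agent's actual implementation. The function names and the sample values are hypothetical, and treating overall usage as "everything except idle time" is an assumption.

```python
# Counter order on the aggregate "cpu" line of /proc/stat (first eight fields).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, (int(v) for v in parts[1:])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Express the counter deltas between two samples as percentages."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        pct = {name: 0.0 for name in deltas}
        pct["usage"] = 0.0
        return pct
    pct = {name: 100.0 * delta / total for name, delta in deltas.items()}
    # Assumption: overall usage is all time that is not idle.
    pct["usage"] = 100.0 - pct["idle"]
    return pct


# Two hypothetical samples taken one collection period apart.
first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0\n")
second = parse_cpu_line("cpu  130 0 60 850 60 0 0 0 0 0\n")
print(cpu_percentages(first, second))
```

On a real Linux ECS, the two input strings would come from reading /proc/stat twice, about one minute apart.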

OS Metric: CPU Load

Table 2 CPU load metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

load_average1

(Agent) 1-Minute Load Average

CPU load averaged from the last 1 minute

Linux: The metric value is the load1 value in file /proc/loadavg divided by the number of logical CPUs. Run the top command to check the load1 value.

≥ 0

ECS

1 minute

load_average5

(Agent) 5-Minute Load Average

CPU load averaged from the last 5 minutes

Linux: The metric value is the load5 value in file /proc/loadavg divided by the number of logical CPUs. Run the top command to check the load5 value.

≥ 0

ECS

1 minute

load_average15

(Agent) 15-Minute Load Average

CPU load averaged from the last 15 minutes

Linux: The metric value is the load15 value in file /proc/loadavg divided by the number of logical CPUs. Run the top command to check the load15 value.

≥ 0

ECS

1 minute

The Windows OS does not support the CPU load metrics.
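
The load metrics in Table 2 combine the load values in /proc/loadavg with the number of logical CPUs. The sketch below assumes the reported value is the raw load divided by the logical CPU count. The function name and the sample line are hypothetical.

```python
def normalized_load(loadavg_text, logical_cpus):
    """Divide the three load averages in /proc/loadavg by the CPU count."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    load1, load5, load15 = (float(v) for v in loadavg_text.split()[:3])
    return {
        "load_average1": load1 / logical_cpus,
        "load_average5": load5 / logical_cpus,
        "load_average15": load15 / logical_cpus,
    }


# Hypothetical /proc/loadavg content on a machine with 4 logical CPUs.
print(normalized_load("2.00 1.00 0.50 1/234 5678\n", 4))
```

On a real system, os.cpu_count() is one way to obtain the number of logical CPUs.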

OS Metric: Memory

Table 3 Memory metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

mem_available

(Agent) Available Memory

Amount of memory that is available and can be given instantly to processes

Unit: GB

  • Linux: Obtain the metric value from /proc/meminfo.
    • If MemAvailable is displayed in /proc/meminfo, obtain the value.
    • If MemAvailable is not displayed in /proc/meminfo, MemAvailable = MemFree + Buffers + Cached
  • Windows: The metric value is calculated as available memory minus used memory. The value is obtained by calling the Windows API GlobalMemoryStatusEx.

≥ 0

ECS

1 minute

mem_usedPercent

(Agent) Memory Usage

Memory usage of the monitored object

Unit: percent

  • Linux: Obtain the metric value from the /proc/meminfo file: (MemTotal - MemAvailable)/MemTotal
    • If MemAvailable is displayed in /proc/meminfo, MemUsedPercent = (MemTotal-MemAvailable)/MemTotal
    • If MemAvailable is not displayed in /proc/meminfo, MemUsedPercent = (MemTotal - MemFree - Buffers - Cached)/MemTotal
  • Windows: The calculation formula is as follows: Used memory size/Total memory size*100%.

0-100

ECS

1 minute

mem_free

(Agent) Idle Memory

Amount of memory that is not being used

Unit: GB

  • Linux: Obtain the metric value from /proc/meminfo.
  • Windows is not supported currently.

≥ 0

ECS

1 minute

mem_buffers

(Agent) Buffer

Amount of memory that is being used for buffers

Unit: GB

  • Linux: Obtain the metric value from /proc/meminfo. Run the top command to check the KiB Mem:buffers value.
  • Windows is not supported currently.

≥ 0

ECS

1 minute

mem_cached

(Agent) Cache

Amount of memory that is being used for file caches

Unit: GB

  • Linux: Obtain the metric value from /proc/meminfo. Run the top command to check the KiB Swap:cached Mem value.
  • Windows is not supported currently.

≥ 0

ECS

1 minute

total_open_files

(Agent) Total File Handles

Total handles used by all processes

Unit: count

  • Linux: Use the /proc/{pid}/fd file to summarize the handles used by all processes.
  • Windows is not supported currently.

≥ 0

ECS

1 minute
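
The Linux rules for mem_available and mem_usedPercent in Table 3 can be sketched as follows: use MemAvailable when /proc/meminfo provides it, otherwise fall back to MemFree + Buffers + Cached. This is only an illustration. The function names and sample values are hypothetical, and values are kept in kB as they appear in the file.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def mem_available_kb(info):
    """Prefer MemAvailable; otherwise estimate it from free, buffers, and cache."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info["MemFree"] + info["Buffers"] + info["Cached"]


def mem_used_percent(info):
    """(MemTotal - available) / MemTotal, expressed as a percentage."""
    return 100.0 * (info["MemTotal"] - mem_available_kb(info)) / info["MemTotal"]


# Hypothetical sample without a MemAvailable line (older kernels).
sample = "MemTotal: 1000 kB\nMemFree: 200 kB\nBuffers: 100 kB\nCached: 300 kB\n"
info = parse_meminfo(sample)
print(mem_available_kb(info), mem_used_percent(info))
```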

OS Metric: Disk

  • Currently, only physical disks are monitored. NFS-attached disks cannot be monitored.
  • By default, Docker-related mount points are excluded from monitoring. Mount points with the following prefixes are excluded:
    /var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos
Table 4 Disk metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

disk_free

(Agent) Available Disk Space

Free space on the disks

Unit: GB

  • Linux: Run the df -h command to check the value in the Avail column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows: Use the WMI interface to call the GetDiskFreeSpaceExW API to obtain disk space data. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥ 0

ECS - Mount point

1 minute

disk_total

(Agent) Disk Storage Capacity

Total space on the disks, including used and free

Unit: GB

  • Linux: Run the df -h command to check the value in the Size column.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows: Use the WMI interface to call the GetDiskFreeSpaceExW API to obtain disk space data. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥ 0

ECS - Mount point

1 minute

disk_used

(Agent) Used Disk Space

Used space on the disks

Unit: GB

  • Linux: Run the df -h command to check the value in the Used column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows: Use the WMI interface to call the GetDiskFreeSpaceExW API to obtain disk space data. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥ 0

ECS - Mount point

1 minute

disk_usedPercent

(Agent) Disk Usage

Percentage of total disk space that is used, which is calculated as follows: Disk Usage = Used Disk Space/Disk Storage Capacity

Unit: percent

  • Linux: It is calculated as follows: Used/Size. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows: Use the WMI interface to call the GetDiskFreeSpaceExW API to obtain disk space data. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

0-100

ECS - Mount point

1 minute
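
Two rules from this section lend themselves to a short illustration: mount points under the listed Docker-related prefixes are skipped, and disk usage is used space divided by total capacity. The sketch below is not the Agent's code. The function names are hypothetical, and using shutil.disk_usage in place of df is an assumption made for illustration.

```python
import shutil

# Prefixes listed in this section as not monitored by default.
EXCLUDED_PREFIXES = ("/var/lib/docker", "/mnt/paas/kubernetes", "/var/lib/mesos")


def is_excluded(mount_point):
    """Return True if the mount point starts with an excluded prefix."""
    return mount_point.startswith(EXCLUDED_PREFIXES)


def disk_used_percent(used, total):
    """Disk Usage = Used Disk Space / Disk Storage Capacity, as a percentage."""
    return 100.0 * used / total if total > 0 else 0.0


def usage_for(mount_point):
    """Return the usage percentage for one mount point, or None if skipped."""
    if is_excluded(mount_point):
        return None
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free
    return disk_used_percent(usage.used, usage.total)


print(is_excluded("/var/lib/docker/overlay2"), disk_used_percent(25, 100))
```

Calling usage_for("/") on a Linux ECS would return a value comparable to, but not necessarily identical with, the Use% column of df, because df computes its percentage from the space available to unprivileged users.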

OS Metric: Disk I/O

Table 5 Disk I/O metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

disk_agt_read_bytes_rate

(Agent) Disks Read Rate

Number of bytes read from the monitored disk per second

Unit: byte/s

  • Linux:

    The disk read rate is calculated based on the data changes in the sixth column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows:
    • Use the Win32_PerfFormattedData_PerfDisk_LogicalDisk object in WMI to obtain disk I/O data.
    • The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
    • When the CPU usage is high, data collection may time out, in which case monitoring data cannot be obtained.

≥ 0 bytes/s

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_agt_read_requests_rate

(Agent) Disks Read Requests

Number of read requests sent to the monitored disk per second

Unit: request/s

  • Linux:

    The disk read requests are calculated based on the data changes in the fourth column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows:
    • Use the Win32_PerfFormattedData_PerfDisk_LogicalDisk object in WMI to obtain disk I/O data.
    • The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
    • When the CPU usage is high, data collection may time out, in which case monitoring data cannot be obtained.

≥ 0 requests/s

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_agt_write_bytes_rate

(Agent) Disks Write Rate

Number of bytes written to the monitored disk per second

Unit: byte/s

  • Linux:

    The disk write rate is calculated based on the data changes in the tenth column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows:
    • Use the Win32_PerfFormattedData_PerfDisk_LogicalDisk object in WMI to obtain disk I/O data.
    • The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
    • When the CPU usage is high, data collection may time out, in which case monitoring data cannot be obtained.

≥ 0 bytes/s

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_agt_write_requests_rate

(Agent) Disks Write Requests

Number of write requests sent to the monitored disk per second

Unit: request/s

  • Linux:

    The disk write requests are calculated based on the data changes in the eighth column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows:
    • Use the Win32_PerfFormattedData_PerfDisk_LogicalDisk object in WMI to obtain disk I/O data.
    • The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
    • When the CPU usage is high, data collection may time out, in which case monitoring data cannot be obtained.

≥ 0 requests/s

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_readTime

(Agent) Average Read Request Time

Average amount of time that read requests have waited on the disks

Unit: ms/count

  • Linux:

    The average read request time is calculated based on the data changes in the seventh column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows is not supported currently.

≥ 0 ms/Count

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_writeTime

(Agent) Average Write Request Time

Average amount of time that write requests have waited on the disks

Unit: ms/count

  • Linux:

    The average write request time is calculated based on the data changes in the eleventh column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows is not supported currently.

≥ 0 ms/Count

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_ioUtils

(Agent) Disk I/O Usage

Percentage of the time that the disk has had I/O requests queued to the total disk operation time

Unit: percent

  • Linux:

    The disk I/O usage is calculated based on the data changes in the thirteenth column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows is not supported currently.

0-100

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_queue_length

(Agent) Disk Queue Length

Average number of read or write requests queued up for completion for the monitored disk in the monitoring period

Unit: count

  • Linux:

    The average disk queue length is calculated based on the data changes in the fourteenth column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows is not supported currently.

≥ 0

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_write_bytes_per_operation

(Agent) Average Disk Write Size

Average number of bytes in an I/O write for the monitored disk in the monitoring period

Unit: byte/op

  • Linux:

    The average disk write size is calculated by dividing the data change in the tenth column by the data change in the eighth column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows is not supported currently.

≥ 0 bytes/op

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_read_bytes_per_operation

(Agent) Average Disk Read Size

Average number of bytes in an I/O read for the monitored disk in the monitoring period

Unit: byte/op

  • Linux:

    The average disk read size is calculated by dividing the data change in the sixth column by the data change in the fourth column of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows is not supported currently.

≥ 0 bytes/op

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_io_svctm

(Agent) Disk I/O Service Time

Average time in an I/O read or write for the monitored disk in the monitoring period

Unit: ms/op

  • Linux:

    The average disk I/O service time is calculated by dividing the data change in the thirteenth column by the sum of the data changes in the fourth and eighth columns of the corresponding device in file /proc/diskstats in a collection period.

    The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

  • Windows is not supported currently.

≥ 0

  • ECS - Disk
  • ECS - Mount point

1 minute

disk_device_used_percent

Block Device Usage

Percentage of the physical disk usage of the monitored object. Calculation formula: Used storage space of all mounted disk partitions/Total disk storage space

  • Collection method for Linux ECSs: Obtain the disk usage of each mount point, calculate the total disk storage space based on the disk sector size and the number of sectors, and then you can calculate the used storage space in total.
  • Windows ECSs do not support this metric.

0-100

ECS - Disk

1 minute
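
The Linux entries in Table 5 refer to numbered columns of /proc/diskstats. The sketch below shows how those columns could be turned into the listed metrics from two samples. It is an illustration under assumptions, not the Agent's implementation: the formulas follow common iostat-style conventions, sectors are converted at 512 bytes each (the unit used by /proc/diskstats), and the function names and sample values are hypothetical.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: counters}, where counters[0] is column 4 of the file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14:
            stats[parts[2]] = [int(v) for v in parts[3:14]]
    return stats


def _ratio(numerator, denominator):
    return numerator / denominator if denominator > 0 else 0.0


def disk_io_metrics(before, after, interval_s):
    """Derive metrics from two counter lists sampled interval_s seconds apart."""
    d = [new - old for old, new in zip(before, after)]
    reads, sectors_read, read_ms = d[0], d[2], d[3]        # columns 4, 6, 7
    writes, sectors_written, write_ms = d[4], d[6], d[7]   # columns 8, 10, 11
    busy_ms, weighted_ms = d[9], d[10]                     # columns 13, 14
    interval_ms = interval_s * 1000.0
    return {
        "disk_agt_read_bytes_rate": sectors_read * SECTOR_BYTES / interval_s,
        "disk_agt_read_requests_rate": reads / interval_s,
        "disk_agt_write_bytes_rate": sectors_written * SECTOR_BYTES / interval_s,
        "disk_agt_write_requests_rate": writes / interval_s,
        "disk_readTime": _ratio(read_ms, reads),
        "disk_writeTime": _ratio(write_ms, writes),
        "disk_ioUtils": 100.0 * busy_ms / interval_ms,
        "disk_queue_length": weighted_ms / interval_ms,
        "disk_read_bytes_per_operation": _ratio(sectors_read * SECTOR_BYTES, reads),
        "disk_write_bytes_per_operation": _ratio(sectors_written * SECTOR_BYTES, writes),
        "disk_io_svctm": _ratio(busy_ms, reads + writes),
    }


# Hypothetical samples for device sda, taken 60 seconds apart.
t0 = parse_diskstats("   8  0 sda 100 0 2000 500 200 0 4000 1000 0 600 1500\n")
t1 = parse_diskstats("   8  0 sda 160 0 3200 560 320 0 6400 1240 0 1200 2100\n")
print(disk_io_metrics(t0["sda"], t1["sda"], 60))
```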

OS Metric: File System

Table 6 File system metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

disk_fs_rwstate

(Agent) File System Read/Write Status

Read and write status of the mounted file system of the monitored object. Possible values are 0 (read and write) and 1 (read only).

Linux: Check file system information in the fourth column in file /proc/mounts.

  • 0: readable and writable
  • 1: read-only

ECS - Mount point

1 minute

disk_inodesTotal

(Agent) Disk inode Total

Total number of index nodes on the disk

Linux: Run the df -i command to check the value in the Inodes column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥ 0

ECS - Mount point

1 minute

disk_inodesUsed

(Agent) Total inode Used

Number of used index nodes on the disk

Linux: Run the df -i command to check the value in the IUsed column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥ 0

ECS - Mount point

1 minute

disk_inodesUsedPercent

(Agent) Percentage of Total inode Used

Percentage of used index nodes on the disk

Unit: percent

Linux: Run the df -i command to check the value in the IUse% column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

0-100

ECS - Mount point

1 minute

The Windows OS does not support the file system metrics.
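
The read/write status in Table 6 comes from the fourth column of /proc/mounts, which holds the comma-separated mount options. A minimal sketch of that check follows. The function name and sample lines are hypothetical, and mapping the "ro" option to 1 and everything else to 0 is an assumption based on the value descriptions above.

```python
def fs_rw_state(mounts_text):
    """Return {mount_point: state}, with 0 = readable and writable, 1 = read-only."""
    states = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue
        options = parts[3].split(",")  # fourth column: mount options
        states[parts[1]] = 1 if "ro" in options else 0
    return states


# Hypothetical /proc/mounts content.
sample = (
    "/dev/vda1 / ext4 rw,relatime,errors=remount-ro 0 0\n"
    "/dev/vdb1 /data ext4 ro,relatime 0 0\n"
)
print(fs_rw_state(sample))
```

Splitting the options on commas avoids a false match on options such as errors=remount-ro, which merely contain the letters "ro".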

OS Metric: NIC

Table 7 NIC metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

net_bitRecv

(Agent) Outbound Bandwidth

Number of bits sent by this NIC per second

Unit: bit/s

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows: Use the MibIfRow object in the WMI to obtain network metric data.

≥ 0 bit/s

ECS

1 minute

net_bitSent

(Agent) Inbound Bandwidth

Number of bits received by this NIC per second

Unit: bit/s

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows: Use the MibIfRow object in the WMI to obtain network metric data.

≥ 0 bit/s

ECS

1 minute

net_packetRecv

(Agent) NIC Packet Receive Rate

Number of packets received by this NIC per second

Unit: count/s

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows: Use the MibIfRow object in the WMI to obtain network metric data.

≥ 0 Counts/s

ECS

1 minute

net_packetSent

(Agent) NIC Packet Send Rate

Number of packets sent by this NIC per second

Unit: count/s

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows: Use the MibIfRow object in the WMI to obtain network metric data.

≥ 0 Counts/s

ECS

1 minute

net_errin

(Agent) Receive Error Rate

Percentage of receive errors detected by this NIC per second

Unit: percent

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows is not supported currently.

0-100

ECS

1 minute

net_errout

(Agent) Transmit Error Rate

Percentage of transmit errors detected by this NIC per second

Unit: percent

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows is not supported currently.

0-100

ECS

1 minute

net_dropin

(Agent) Received Packet Drop Rate

Percentage of packets received by this NIC which were dropped per second

Unit: percent

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows is not supported currently.

0-100

ECS

1 minute

net_dropout

(Agent) Transmitted Packet Drop Rate

Percentage of packets transmitted by this NIC which were dropped per second

Unit: percent

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows is not supported currently.

0-100

ECS

1 minute
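
The Linux entries in Table 7 are all derived from counter changes in /proc/net/dev over a collection period. The sketch below computes receive and transmit rates from two samples. It is illustrative only: the function and key names are hypothetical, and expressing errors and drops as a percentage of packets in the same direction is an assumption, since the exact denominator is not stated in this section.

```python
def parse_net_dev(text):
    """Return {interface: counters} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue  # the two header lines have no "name:" prefix
        v = [int(x) for x in fields[:16]]
        counters[name.strip()] = {
            "rx_bytes": v[0], "rx_packets": v[1], "rx_errs": v[2], "rx_drop": v[3],
            "tx_bytes": v[8], "tx_packets": v[9], "tx_errs": v[10], "tx_drop": v[11],
        }
    return counters


def _percent(part, whole):
    return 100.0 * part / whole if whole > 0 else 0.0


def nic_rates(before, after, interval_s):
    """Turn two counter snapshots of one NIC into per-second rates."""
    d = {key: after[key] - before[key] for key in before}
    return {
        "rx_bits_per_s": d["rx_bytes"] * 8 / interval_s,
        "tx_bits_per_s": d["tx_bytes"] * 8 / interval_s,
        "rx_packets_per_s": d["rx_packets"] / interval_s,
        "tx_packets_per_s": d["tx_packets"] / interval_s,
        # Assumption: percentages are relative to packets in the same direction.
        "rx_error_percent": _percent(d["rx_errs"], d["rx_packets"]),
        "tx_error_percent": _percent(d["tx_errs"], d["tx_packets"]),
        "rx_drop_percent": _percent(d["rx_drop"], d["rx_packets"]),
        "tx_drop_percent": _percent(d["tx_drop"], d["tx_packets"]),
    }


# Hypothetical samples for eth0, taken 60 seconds apart.
t0 = parse_net_dev("  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0\n")
t1 = parse_net_dev("  eth0: 61000 110 1 2 0 0 0 0 32000 50 0 0 0 0 0 0\n")
print(nic_rates(t0["eth0"], t1["eth0"], 60))
```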

OS Metric: NTP

Table 8 NTP metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

ntp_offset

(Agent) NTP Offset

NTP offset of the monitored object

Unit: ms

Collection method for Linux ECSs: Run chronyc sources -v to obtain the offset.

≥ 0 ms

ECS

1 minute

OS Metric: TCP

Table 9 TCP metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

net_tcp_total

(Agent) TCP TOTAL

Total number of TCP connections in all states

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_established

(Agent) TCP ESTABLISHED

Number of TCP connections in the ESTABLISHED state

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_sys_sent

(Agent) TCP SYS_SENT

Number of TCP connections that are being requested by the client

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_sys_recv

(Agent) TCP SYS_RECV

Number of pending TCP connections received by the server

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_fin_wait1

(Agent) TCP FIN_WAIT1

Number of TCP connections waiting for ACK packets when the connections are being actively closed by the client

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_fin_wait2

(Agent) TCP FIN_WAIT2

Number of TCP connections in the FIN_WAIT2 state

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_time_wait

(Agent) TCP TIME_WAIT

Number of TCP connections in the TIME_WAIT state

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_close

(Agent) TCP CLOSE

Number of closed TCP connections

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_close_wait

(Agent) TCP CLOSE_WAIT

Number of TCP connections in the CLOSE_WAIT state

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_last_ack

(Agent) TCP LAST_ACK

Number of TCP connections waiting for ACK packets when the connections are being passively closed by the client

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_listen

(Agent) TCP LISTEN

Number of TCP connections in the LISTEN state

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_closing

(Agent) TCP CLOSING

Number of TCP connections to be automatically closed by the server and the client at the same time

Unit: count

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

ECS

1 minute

net_tcp_retrans

(Agent) TCP Retransmission Rate

Percentage of packets that are resent

Unit: percent

  • Linux: Obtain the metric value from the /proc/net/snmp file. The value is the ratio of the number of retransmitted packets to the number of sent packets in a collection period.
  • Windows: Obtain the metric value using the Windows API GetTcpStatistics.

0-100

ECS

1 minute
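
The Linux entries in Table 9 count connections per state from /proc/net/tcp and derive the retransmission rate from /proc/net/snmp. The sketch below illustrates both steps and is not the Agent's code. The function names and sample data are hypothetical. The state codes are the kernel's hexadecimal values in the "st" column, and the retransmission rate is assumed to be the change in RetransSegs divided by the change in OutSegs.

```python
from collections import Counter

# Hexadecimal state codes used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count connections per state; the header line is skipped."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines():
        parts = line.split()
        if len(parts) > 3 and parts[0].endswith(":"):
            counts[TCP_STATES.get(parts[3].upper(), "UNKNOWN")] += 1
    return counts


def parse_snmp_tcp(snmp_text):
    """Return the Tcp counters of /proc/net/snmp as {name: value}."""
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def retrans_percent(before, after):
    """Retransmitted segments as a percentage of sent segments (assumption)."""
    sent = after["OutSegs"] - before["OutSegs"]
    resent = after["RetransSegs"] - before["RetransSegs"]
    return 100.0 * resent / sent if sent > 0 else 0.0


# Hypothetical /proc/net/tcp content: one listening socket, two established.
sample = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 0100007F:0019 00000000:0000 0A 00000000:00000000\n"
    "   1: 0100007F:8F1C 0100007F:0019 01 00000000:00000000\n"
    "   2: 0100007F:0019 0100007F:8F1C 01 00000000:00000000\n"
)
counts = count_tcp_states(sample)
print(sum(counts.values()), dict(counts))
```

The total across all states corresponds to net_tcp_total, and each individual count corresponds to one of the per-state metrics.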

OS Metric: GPU

Table 10 GPU metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

gpu_status

GPU Health Status

Overall measurement of the GPU health

Unit: none

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

  • 0: The GPU is healthy.
  • 1: The GPU is subhealthy.
  • 2: The GPU is faulty.

  • ECS
  • ECS - GPU

1 minute

gpu_usage_encoder

Encoding Usage

Encoding capability usage on the GPU

Unit: percent

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

0-100

  • ECS
  • ECS - GPU

1 minute

gpu_usage_decoder

Decoding Usage

Decoding capability usage on the GPU

Unit: percent

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

0-100

  • ECS
  • ECS - GPU

1 minute

gpu_volatile_correctable

Volatile Correctable ECC Errors

Number of correctable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset.

Unit: count

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0

  • ECS
  • ECS - GPU

1 minute

gpu_volatile_uncorrectable

Volatile Uncorrectable ECC Errors

Number of uncorrectable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset.

Unit: count

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0

  • ECS
  • ECS - GPU

1 minute

gpu_aggregate_correctable

Aggregate Correctable ECC Errors

Aggregate correctable ECC errors on the GPU

Unit: count

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0

  • ECS
  • ECS - GPU

1 minute

gpu_aggregate_uncorrectable

Aggregate Uncorrectable ECC Errors

Aggregate uncorrectable ECC errors on the GPU

Unit: count

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0

  • ECS
  • ECS - GPU

1 minute

gpu_retired_page_single_bit

Retired Page Single Bit Errors

Number of retired page single bit errors, which indicates the number of single-bit pages blocked by the graphics card

Unit: count

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0

  • ECS
  • ECS - GPU

1 minute

gpu_retired_page_double_bit

Retired Page Double Bit Errors

Number of retired page double bit errors, which indicates the number of double-bit pages blocked by the graphics card

Unit: count

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0

  • ECS
  • ECS - GPU

1 minute

gpu_performance_state

(Agent) Performance Status

GPU performance of the monitored object

Unit: none

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

P0-P15, P32

  • P0: indicates the maximum performance status.
  • P15: indicates the minimum performance status.
  • P32: indicates the unknown performance status.

  • ECS
  • ECS - GPU

1 minute

gpu_usage_mem

(Agent) GPU Memory Usage

GPU memory usage of the monitored object

Unit: percent

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

0-100

  • ECS
  • ECS - GPU

1 minute

gpu_usage_gpu

(Agent) GPU Usage

GPU usage of the monitored object

Unit: percent

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

0-100

  • ECS
  • ECS - GPU

1 minute

gpu_free_mem

GPU Free Memory

Free memory on the GPU

Unit: MB

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 MB

  • ECS
  • ECS - GPU

1 minute

gpu_graphics_clocks

GPU Graphics Clocks

Current graphics clock frequency of the GPU

Unit: MHz

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 MHz

  • ECS
  • ECS - GPU

1 minute

gpu_mem_clocks

GPU Memory Clocks

Current memory clock frequency of the GPU

Unit: MHz

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 MHz

  • ECS
  • ECS - GPU

1 minute

gpu_power_draw

GPU Draw Power

Power draw of the GPU

Unit: W

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

NA

  • ECS
  • ECS - GPU

1 minute

gpu_rx_throughput_pci

GPU PCI Rx Throughput

Current PCI receive (Rx) throughput of the GPU

Unit: MByte/s

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 MByte/s

  • ECS
  • ECS - GPU

1 minute

gpu_sm_clocks

GPU SM Clocks

Current SM clock frequency of the GPU

Unit: MHz

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 MHz

  • ECS
  • ECS - GPU

1 minute

gpu_temperature

GPU Temperature

Current temperature of the GPU

Unit: °C

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 °C

  • ECS
  • ECS - GPU

1 minute

gpu_tx_throughput_pci

GPU PCI Tx Throughput

Current PCI transmit (Tx) throughput of the GPU

Unit: MByte/s

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 MByte/s

  • ECS
  • ECS - GPU

1 minute

gpu_used_mem

GPU Used Memory

Used memory on the GPU

Unit: MB

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 MB

  • ECS
  • ECS - GPU

1 minute

gpu_video_clocks

GPU Video Clocks

Current video clock frequency of the GPU

Unit: MHz

  • Linux: Obtain the metric value using the libnvidia-ml.so.1 library file of the graphics card.
  • Windows: Obtain the metric value using the nvml.dll library of the graphics card.

≥ 0 MHz

  • ECS
  • ECS - GPU

1 minute
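The value range of gpu_performance_state in Table 10 can be interpreted programmatically. The following is a minimal sketch, not part of the Agent: the function name is hypothetical, and it assumes the value arrives either as an integer (for example, 0) or as a string such as "P0". Only P0, P15, and P32 have meanings stated in this section; every state in between is simply labeled as intermediate.

```python
def describe_performance_state(state):
    """Return a short description for a gpu_performance_state value.

    Accepts an int (0) or a string such as "P0". Which form the metric
    actually takes is an assumption of this sketch.
    """
    if isinstance(state, str):
        text = state.strip().upper()
        if text.startswith("P"):
            text = text[1:]
        level = int(text)  # raises ValueError for non-numeric input
    else:
        level = int(state)

    if level == 32:
        return "unknown"
    if level == 0:
        return "maximum performance"
    if level == 15:
        return "minimum performance"
    if 0 < level < 15:
        return f"intermediate (P{level})"
    # Anything else is outside the documented range P0-P15, P32.
    raise ValueError(f"not a documented performance state: {state!r}")


for sample in ("P0", "P8", 15, "P32"):
    print(sample, "->", describe_performance_state(sample))
```

Rejecting undocumented values instead of guessing keeps a malformed sample from being reported as a valid state.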

OS Metrics: NPU

Table 11 NPU metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

npu_device_health

NPU Device Health

An overall measurement of the NPU health

Unit: none

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

  • 0: healthy
  • 1: minor alarms
  • 2: major alarms
  • 3: critical alarms

  • ECS
  • ECS - NPU

1 minute

npu_util_rate_mem

NPU Util Rate Mem

The utilization rate of the NPU memory

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

0-100

  • ECS
  • ECS - NPU

1 minute

npu_util_rate_ai_core

NPU Util Rate AI Core

The utilization rate of the NPU AI Core

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

0-100

  • ECS
  • ECS - NPU

1 minute

npu_util_rate_ai_cpu

NPU Util Rate AI Cpu

The utilization rate of the NPU's AI CPU

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

0-100

  • ECS
  • ECS - NPU

1 minute

npu_util_rate_ctrl_cpu

NPU Util Rate Ctrl CPU

The utilization rate of the NPU's Control CPU

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

0-100

  • ECS
  • ECS - NPU

1 minute

npu_util_rate_mem_bandwidth

NPU Util Rate Mem Bandwidth

The utilization rate of the NPU memory bandwidth

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

0-100

  • ECS
  • ECS - NPU

1 minute

npu_freq_mem

NPU Freq Mem

Current frequency (clock) of the NPU memory

Unit: MHz

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

≥ 0

  • ECS
  • ECS - NPU

1 minute

npu_freq_ai_core

NPU Freq AI Core

Current frequency (clock) of the NPU's AI Core

Unit: MHz

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

≥ 0

  • ECS
  • ECS - NPU

1 minute

npu_usage_mem

NPU Usage Mem

NPU memory currently in use

Unit: MB

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

≥ 0

  • ECS
  • ECS - NPU

1 minute

npu_sbe

NPU SBE

Number of single-bit errors on the NPU

Unit: count

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

≥ 0

  • ECS
  • ECS - NPU

1 minute

npu_dbe

NPU DBE

Number of double-bit errors on the NPU

Unit: count

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

≥ 0

  • ECS
  • ECS - NPU

1 minute

npu_power

NPU Power

The power of the NPU (current power for 310P, rated power for 310)

Unit: W

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

≥ 0

  • ECS
  • ECS - NPU

1 minute

npu_temperature

NPU Temperature

Current temperature of the NPU

Unit: °C

Linux: Obtain the metric value from the libdcmi.so library file of the NPU card.

≥ 0

  • ECS
  • ECS - NPU

1 minute

The Windows OS does not support NPU metrics.
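The npu_device_health codes in Table 11 can be turned into readable labels when several NPU cards are checked together. The following is a minimal sketch under stated assumptions: the function name and the card identifiers (npu0, npu1) are hypothetical, and it assumes a larger code means a more severe condition, as the ordering of the listed values suggests.

```python
# Labels copied from the value range of npu_device_health.
NPU_HEALTH_LABELS = {
    0: "healthy",
    1: "minor alarms",
    2: "major alarms",
    3: "critical alarms",
}


def worst_npu_health(readings):
    """Return (card, label) for the card with the highest health code.

    readings maps a card identifier to its npu_device_health value.
    The identifiers are placeholders chosen for this sketch.
    """
    if not readings:
        raise ValueError("no readings supplied")
    for card, code in readings.items():
        if code not in NPU_HEALTH_LABELS:
            raise ValueError(f"undocumented health code {code!r} for {card}")
    # Assumption: a higher code is more severe.
    card = max(readings, key=readings.get)
    return card, NPU_HEALTH_LABELS[readings[card]]


print(worst_npu_health({"npu0": 0, "npu1": 2}))
```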

OS Metrics: DAVP

Table 12 DAVP metrics

Metric

Parameter

Description

Value Range

Monitored Object & Dimension

Monitoring Period (Raw Data)

davp_device_health

DAVP Device Health

An overall measurement of the DAVP health

Unit: none

Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card.

  • 0: healthy
  • 1: abnormal

  • ECS
  • ECS - DAVP

1 minute

davp_util_rate_mem

DAVP Util Rate Mem

The utilization rate of the DAVP memory

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card.

0-100

  • ECS
  • ECS - DAVP

1 minute

davp_usage_mem

DAVP Usage Mem

DAVP memory currently in use

Unit: MB

Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card.

≥ 0

  • ECS
  • ECS - DAVP

1 minute

davp_util_rate_ai_core

DAVP Util Rate AI Core

The utilization rate of the DAVP AI Core

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card.

0-100

  • ECS
  • ECS - DAVP

1 minute

davp_util_rate_vdsp_core

DAVP Util Rate Vdsp Core

The utilization rate of the DAVP Vdsp Core

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card.

0-100

  • ECS
  • ECS - DAVP

1 minute

davp_util_rate_enc_core

DAVP Util Rate Enc Core

The utilization rate of the DAVP Enc Core

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card.

0-100

  • ECS
  • ECS - DAVP

1 minute

davp_util_rate_dec_core

DAVP Util Rate Dec Core

The utilization rate of the DAVP Dec Core

Unit: percent

Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card.

0-100

  • ECS
  • ECS - DAVP

1 minute

davp_sysc_temperature

Davp System Module Temperature

Current system module temperature of the DAVP

Unit: °C

Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card.

≥ 0

  • ECS
  • ECS - DAVP

1 minute

The Windows OS does not support DAVP metrics.

Dimensions

Dimension

Key

Value

ECS

instance_id

Specifies the ECS ID.

ECS - Disk

disk

Specifies the disks attached to an ECS.

ECS - Mount point

mount_point

Specifies the mount point of a disk.

ECS - GPU

gpu

Specifies the graphics card of an ECS.

ECS - NPU

npu

Specifies the NPU card of an NPU-based ECS.

ECS - DAVP

davp

Specifies the DaoCloud DAVP1 video acceleration card of a DAVP-based ECS.
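The namespace, metric names, and dimension keys listed in this section are the values needed to identify one metric series. The following is a minimal sketch of how they might be combined into query parameters. The function name is hypothetical, the dim.N=key,value layout is an assumption modeled on the Cloud Eye metric query API and should be checked against the API reference, and the ID values are placeholders because this section does not state their format.

```python
from urllib.parse import urlencode

NAMESPACE = "AGT.ECS"  # namespace of the Agent-reported OS metrics

# Second-level dimension keys listed in the Dimensions table.
SUB_DIMENSION_KEYS = {"disk", "mount_point", "gpu", "npu", "davp"}


def build_metric_query(metric_name, instance_id, sub_dimension=None):
    """Build query parameters that identify one metric series.

    sub_dimension is an optional (key, value) pair such as ("gpu", "<id>").
    The "dim.N" parameter naming is an assumption of this sketch.
    """
    params = {
        "namespace": NAMESPACE,
        "metric_name": metric_name,
        "dim.0": f"instance_id,{instance_id}",
    }
    if sub_dimension is not None:
        key, value = sub_dimension
        if key not in SUB_DIMENSION_KEYS:
            raise ValueError(f"unknown dimension key: {key!r}")
        params["dim.1"] = f"{key},{value}"
    return params


# Placeholder IDs, not real resource identifiers.
query = build_metric_query("gpu_usage_gpu", "ecs-example-id", ("gpu", "gpu-example"))
print(urlencode(query, safe=","))
```

Metrics whose monitored object is only ECS need just the instance_id dimension, so sub_dimension can be omitted for them.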