OS Monitoring Metrics Supported by ECSs with the Agent Installed
Description
OS monitoring provides system-level, proactive, and fine-grained monitoring. It requires the Agent to be installed on the ECSs to be monitored. This section describes OS monitoring metrics reported to Cloud Eye.
OS monitoring supports metrics about the CPU, CPU load, memory, disk, disk I/O, file system, NIC, NTP, TCP, GPU, NPU, and DAVP.
After the Agent is installed, you can view monitoring metrics of ECSs running different OSs. Monitoring data is collected every minute.
Namespace
AGT.ECS
OS Metrics: CPU
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
cpu_usage |
(Agent) CPU Usage |
CPU usage of the monitored object Unit: percent
|
0-100 |
ECS |
1 minute |
cpu_usage_idle |
(Agent) Idle CPU Usage |
Percentage of time that the CPU is idle Unit: percent
|
0-100 |
ECS |
1 minute |
cpu_usage_user |
(Agent) User Space CPU Usage |
Percentage of time that the CPU is used by user space Unit: percent
|
0-100 |
ECS |
1 minute |
cpu_usage_system |
(Agent) Kernel Space CPU Usage |
Percentage of time that the CPU is used by kernel space Unit: percent
|
0-100 |
ECS |
1 minute |
cpu_usage_other |
(Agent) Other Process CPU Usage |
Percentage of time that the CPU is used by other processes Unit: percent
|
0-100 |
ECS |
1 minute |
cpu_usage_nice |
(Agent) Nice Process CPU Usage |
Percentage of time that the CPU is in user mode with low-priority processes which can easily be interrupted by higher-priority processes Unit: percent
|
0-100 |
ECS |
1 minute |
cpu_usage_iowait |
(Agent) iowait Process CPU Usage |
Percentage of time that the CPU is waiting for I/O operations to complete Unit: percent
|
0-100 |
ECS |
1 minute |
cpu_usage_irq |
(Agent) CPU Interrupt Time |
Percentage of time that the CPU is servicing interrupts Unit: percent
|
0-100 |
ECS |
1 minute |
cpu_usage_softirq |
(Agent) CPU Software Interrupt Time |
Percentage of time that the CPU is servicing software interrupts Unit: percent
|
0-100 |
ECS |
1 minute |
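The table above describes each CPU metric as a percentage of time spent in one state. On Linux, such percentages are commonly derived from two samples of the aggregate `cpu` line in /proc/stat. The following is a minimal sketch under that assumption; the source does not state how the Agent computes these values, the function names are hypothetical, and the steal and guest fields are ignored for brevity.

```python
# Field order of the first seven counters on the aggregate "cpu" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu ...' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Return the share of elapsed ticks spent in each state, in percent."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        # No ticks elapsed between the samples; report zeros instead of dividing by zero.
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Two hypothetical samples taken one collection interval apart.
sample_a = parse_cpu_line("cpu 1000 50 300 8000 100 20 30 0 0 0")
sample_b = parse_cpu_line("cpu 1600 50 500 9100 200 20 30 0 0 0")
print(cpu_percentages(sample_a, sample_b))
```

In this sketch the `idle`, `user`, `system`, `nice`, `iowait`, `irq`, and `softirq` shares line up by name with the metrics in the table; whether the Agent uses the same fields is an assumption.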
OS Metric: CPU Load
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
load_average1 |
(Agent) 1-Minute Load Average |
CPU load averaged from the last 1 minute Linux: Obtain the metric value by dividing the load1 value in file /proc/loadavg by the number of logical CPUs. Run the top command to check the load1 value. |
≥ 0 |
ECS |
1 minute |
load_average5 |
(Agent) 5-Minute Load Average |
CPU load averaged from the last 5 minutes Linux: Obtain the metric value by dividing the load5 value in file /proc/loadavg by the number of logical CPUs. Run the top command to check the load5 value. |
≥ 0 |
ECS |
1 minute |
load_average15 |
(Agent) 15-Minute Load Average |
CPU load averaged from the last 15 minutes Linux: Obtain the metric value by dividing the load15 value in file /proc/loadavg by the number of logical CPUs. Run the top command to check the load15 value. |
≥ 0 |
ECS |
1 minute |
The Windows OS does not support the CPU load metrics.
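As an illustration of the Linux collection method described in the table, the sketch below reads the first three fields of /proc/loadavg and divides each by the number of logical CPUs. It assumes that the description means "load value divided by logical CPU count"; the function name `load_per_cpu` is hypothetical and not part of the Agent.

```python
def load_per_cpu(loadavg_text: str, logical_cpus: int) -> tuple:
    """Divide the 1-, 5-, and 15-minute load averages by the logical CPU count."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    # /proc/loadavg starts with three load averages: load1, load5, load15.
    load1, load5, load15 = (float(v) for v in loadavg_text.split()[:3])
    return (load1 / logical_cpus, load5 / logical_cpus, load15 / logical_cpus)


# Hypothetical /proc/loadavg contents on a host with 4 logical CPUs.
print(load_per_cpu("2.00 1.00 0.50 3/512 12345", 4))
```

On a Linux host, the inputs could come from `open("/proc/loadavg").read()` and `os.cpu_count()`.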
OS Metric: Memory
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
mem_available |
(Agent) Available Memory |
Amount of memory that is available and can be given instantly to processes Unit: GB
|
≥ 0 |
ECS |
1 minute |
mem_usedPercent |
(Agent) Memory Usage |
Memory usage of the monitored object Unit: percent
|
0-100 |
ECS |
1 minute |
mem_free |
(Agent) Idle Memory |
Amount of memory that is not being used Unit: GB
|
≥ 0 |
ECS |
1 minute |
mem_buffers |
(Agent) Buffer |
Amount of memory that is being used for buffers Unit: GB
|
≥ 0 |
ECS |
1 minute |
mem_cached |
(Agent) Cache |
Amount of memory that is being used for file caches Unit: GB
|
≥ 0 |
ECS |
1 minute |
total_open_files |
(Agent) Total File Handles |
Total handles used by all processes Unit: count
|
≥ 0 |
ECS |
1 minute |
OS Metric: Disk
- Currently, only physical disks are monitored. NFS-attached disks cannot be monitored.
- By default, Docker-related mount points are excluded from monitoring. The excluded mount point prefixes are as follows:
/var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos
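The prefix rule above can be sketched as a simple filter. This is only an illustration: it assumes a plain string-prefix comparison, which the source does not confirm, and the function name `is_excluded` and the sample mount points are hypothetical.

```python
# Prefixes listed in the note above; mount points under them are skipped by default.
EXCLUDED_PREFIXES = ("/var/lib/docker", "/mnt/paas/kubernetes", "/var/lib/mesos")


def is_excluded(mount_point: str) -> bool:
    """Return True if the mount point starts with one of the listed prefixes."""
    return mount_point.startswith(EXCLUDED_PREFIXES)


# Hypothetical mount points on an ECS that runs containers.
mounts = [
    "/",
    "/data",
    "/var/lib/docker/overlay2/abc/merged",
    "/mnt/paas/kubernetes/kubelet",
]
monitored = [m for m in mounts if not is_excluded(m)]
print(monitored)  # ['/', '/data']
```

A plain prefix match also excludes a sibling path such as /var/lib/dockerx; whether the Agent behaves that way is not stated in the source.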
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
disk_free |
(Agent) Available Disk Space |
Free space on the disks Unit: GB
|
≥ 0 |
ECS - Mount point |
1 minute |
disk_total |
(Agent) Disk Storage Capacity |
Total space on the disks, including used and free Unit: GB
|
≥ 0 |
ECS - Mount point |
1 minute |
disk_used |
(Agent) Used Disk Space |
Used space on the disks Unit: GB
|
≥ 0 |
ECS - Mount point |
1 minute |
disk_usedPercent |
(Agent) Disk Usage |
Percentage of total disk space that is used, which is calculated as follows: Disk Usage = Used Disk Space/Disk Storage Capacity Unit: percent
|
0-100 |
ECS - Mount point |
1 minute |
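The Disk Usage formula in the table can be checked with a short calculation. The sketch below applies the formula directly and, as an assumption, fills the four disk metrics from Python's `shutil.disk_usage`, treating GB as 1024³ bytes. The Agent's own data source and unit convention are not described in the source, and the function names are hypothetical.

```python
import shutil

BYTES_PER_GB = 1024 ** 3  # assumption: GB is interpreted as 2**30 bytes


def disk_used_percent(used: float, total: float) -> float:
    """Disk Usage = Used Disk Space / Disk Storage Capacity, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def mount_point_metrics(path: str) -> dict:
    """Collect the four disk metrics for the file system that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "disk_total": usage.total / BYTES_PER_GB,
        "disk_used": usage.used / BYTES_PER_GB,
        "disk_free": usage.free / BYTES_PER_GB,
        "disk_usedPercent": disk_used_percent(usage.used, usage.total),
    }


print(disk_used_percent(30.0, 120.0))  # 25.0
print(mount_point_metrics("."))
```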
OS Metric: Disk I/O
OS Metric: File System
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
disk_fs_rwstate |
(Agent) File System Read/Write Status |
Read and write status of the mounted file system of the monitored object Possible values are 0 (read and write) and 1 (read only). Linux: Check file system information in the fourth column in file /proc/mounts. |
|
ECS - Mount point |
1 minute |
disk_inodesTotal |
(Agent) Disk inode Total |
Total number of index nodes on the disk Linux: Run the df -i command to check the value in the Inodes column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and tildes (~). |
≥ 0 |
ECS - Mount point |
1 minute |
disk_inodesUsed |
(Agent) Total inode Used |
Number of used index nodes on the disk Linux: Run the df -i command to check the value in the IUsed column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and tildes (~). |
≥ 0 |
ECS - Mount point |
1 minute |
disk_inodesUsedPercent |
(Agent) Percentage of Total inode Used |
Percentage of used index nodes on the disk Unit: percent Linux: Run the df -i command to check the value in the IUse% column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), periods (.), and tildes (~). |
0-100 |
ECS - Mount point |
1 minute |
The Windows OS does not support the file system metrics.
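The disk_fs_rwstate description points at the fourth column of /proc/mounts, which holds the comma-separated mount options. A minimal sketch of that check, assuming a file system counts as read only when its options include `ro`, could look as follows; the function name `fs_rwstate` and the sample content are hypothetical.

```python
def fs_rwstate(mounts_text: str) -> dict:
    """Map each mount point to 0 (read and write) or 1 (read only)."""
    states = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")  # fourth column: mount options
        states[mount_point] = 1 if "ro" in options else 0
    return states


# Hypothetical /proc/mounts content with one writable and one read-only file system.
sample = """\
/dev/vda1 / ext4 rw,relatime 0 0
/dev/vdb1 /data ext4 ro,relatime 0 0
"""
print(fs_rwstate(sample))  # {'/': 0, '/data': 1}
```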
OS Metric: NIC
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
net_bitRecv |
(Agent) Outbound Bandwidth |
Number of bits sent by this NIC per second Unit: bit/s
|
≥ 0 bit/s |
ECS |
1 minute |
net_bitSent |
(Agent) Inbound Bandwidth |
Number of bits received by this NIC per second Unit: bit/s
|
≥ 0 bit/s |
ECS |
1 minute |
net_packetRecv |
(Agent) NIC Packet Receive Rate |
Number of packets received by this NIC per second Unit: count/s
|
≥ 0 count/s |
ECS |
1 minute |
net_packetSent |
(Agent) NIC Packet Send Rate |
Number of packets sent by this NIC per second Unit: count/s
|
≥ 0 count/s |
ECS |
1 minute |
net_errin |
(Agent) Receive Error Rate |
Percentage of receive errors detected by this NIC per second Unit: percent
|
0-100 |
ECS |
1 minute |
net_errout |
(Agent) Transmit Error Rate |
Percentage of transmit errors detected by this NIC per second Unit: percent
|
0-100 |
ECS |
1 minute |
net_dropin |
(Agent) Received Packet Drop Rate |
Percentage of packets received by this NIC which were dropped per second Unit: percent
|
0-100 |
ECS |
1 minute |
net_dropout |
(Agent) Transmitted Packet Drop Rate |
Percentage of packets transmitted by this NIC which were dropped per second Unit: percent
|
0-100 |
ECS |
1 minute |
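The bandwidth and packet metrics above are per-second rates, whereas an OS typically exposes cumulative byte and packet counters for each NIC. The sketch below shows the usual conversion for one direction of one NIC. It is an assumption about how such rates can be derived, not a description of the Agent, and the counter values and names are hypothetical.

```python
def per_second_rates(before: dict, after: dict, interval_s: float) -> dict:
    """Convert cumulative byte and packet counters into bit/s and count/s rates."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        # 8 bits per byte turns a byte delta into bits.
        "bits_per_s": 8 * (after["bytes"] - before["bytes"]) / interval_s,
        "packets_per_s": (after["packets"] - before["packets"]) / interval_s,
    }


# Hypothetical counter samples taken 60 seconds apart.
before = {"bytes": 1_000_000, "packets": 2_000}
after = {"bytes": 1_750_000, "packets": 2_600}
print(per_second_rates(before, after, 60.0))
# {'bits_per_s': 100000.0, 'packets_per_s': 10.0}
```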
OS Metric: NTP
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
ntp_offset |
(Agent) NTP Offset |
NTP offset of the monitored object Unit: ms Collection method for Linux ECSs: Run chronyc sources -v to obtain the offset. |
≥ 0 ms |
ECS |
1 minute |
OS Metric: TCP
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
net_tcp_total |
(Agent) TCP TOTAL |
Total number of TCP connections in all states Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_established |
(Agent) TCP ESTABLISHED |
Number of TCP connections in the ESTABLISHED state Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_sys_sent |
(Agent) TCP SYS_SENT |
Number of TCP connections in the SYN_SENT state, that is, connections that are being requested by the client Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_sys_recv |
(Agent) TCP SYS_RECV |
Number of TCP connections in the SYN_RECV state, that is, pending connections received by the server Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_fin_wait1 |
(Agent) TCP FIN_WAIT1 |
Number of TCP connections waiting for ACK packets when the connections are being actively closed by the client Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_fin_wait2 |
(Agent) TCP FIN_WAIT2 |
Number of TCP connections in the FIN_WAIT2 state Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_time_wait |
(Agent) TCP TIME_WAIT |
Number of TCP connections in the TIME_WAIT state Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_close |
(Agent) TCP CLOSE |
Number of closed TCP connections Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_close_wait |
(Agent) TCP CLOSE_WAIT |
Number of TCP connections in the CLOSE_WAIT state Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_last_ack |
(Agent) TCP LAST_ACK |
Number of TCP connections waiting for ACK packets when the connections are being passively closed by the client Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_listen |
(Agent) TCP LISTEN |
Number of TCP connections in the LISTEN state Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_closing |
(Agent) TCP CLOSING |
Number of TCP connections that are being closed by the server and the client at the same time Unit: count
|
≥ 0 |
ECS |
1 minute |
net_tcp_retrans |
(Agent) TCP Retransmission Rate |
Percentage of packets that are resent Unit: percent
|
0-100 |
ECS |
1 minute |
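Most of the TCP metrics above count connections in a single TCP state. On Linux, the state of each IPv4 connection appears as a hexadecimal code in the fourth column of /proc/net/tcp, so counts of this kind can be sketched as shown below. The source does not say which data source the Agent uses; the function name and sample rows are hypothetical. Note that the kernel names the two handshake states SYN_SENT and SYN_RECV, while the metric IDs in the table spell them sys_sent and sys_recv.

```python
from collections import Counter

# Linux kernel TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text: str) -> Counter:
    """Count connections per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        state = TCP_STATES.get(fields[3].upper())
        if state is not None:
            counts[state] += 1
    return counts


# Hypothetical /proc/net/tcp content: one listener, two established, one TIME_WAIT.
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0035 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001
   1: 0A00000A:0016 0A000014:C350 01 00000000:00000000 00:00000000 00000000     0        0 1002
   2: 0A00000A:0016 0A000015:C351 01 00000000:00000000 00:00000000 00000000     0        0 1003
   3: 0A00000A:C352 0A000016:01BB 06 00000000:00000000 00:00000000 00000000     0        0 0
"""
counts = count_tcp_states(sample)
print(dict(counts), "total:", sum(counts.values()))
```

IPv6 connections are listed separately in /proc/net/tcp6 and would need the same treatment.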
OS Metric: GPU
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
gpu_status |
GPU Health Status |
Overall measurement of the GPU health Unit: none
|
|
|
1 minute |
gpu_usage_encoder |
Encoding Usage |
Encoding capability usage on the GPU Unit: percent
|
0-100 |
|
1 minute |
gpu_usage_decoder |
Decoding Usage |
Decoding capability usage on the GPU Unit: percent
|
0-100 |
|
1 minute |
gpu_volatile_correctable |
Volatile Correctable ECC Errors |
Number of correctable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset. Unit: count
|
≥ 0 |
|
1 minute |
gpu_volatile_uncorrectable |
Volatile Uncorrectable ECC Errors |
Number of uncorrectable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset. Unit: count
|
≥ 0 |
|
1 minute |
gpu_aggregate_correctable |
Aggregate Correctable ECC Errors |
Aggregate correctable ECC errors on the GPU Unit: count
|
≥ 0 |
|
1 minute |
gpu_aggregate_uncorrectable |
Aggregate Uncorrectable ECC Errors |
Aggregate uncorrectable ECC errors on the GPU Unit: count
|
≥ 0 |
|
1 minute |
gpu_retired_page_single_bit |
Retired Page Single Bit Errors |
Number of retired page single bit errors, which indicates the number of single-bit pages blocked by the graphics card Unit: count
|
≥ 0 |
|
1 minute |
gpu_retired_page_double_bit |
Retired Page Double Bit Errors |
Number of retired page double bit errors, which indicates the number of double-bit pages blocked by the graphics card Unit: count
|
≥ 0 |
|
1 minute |
gpu_performance_state |
(Agent) Performance Status |
GPU performance of the monitored object Unit: none
|
P0-P15, P32
|
|
1 minute |
gpu_usage_mem |
(Agent) GPU Memory Usage |
GPU memory usage of the monitored object Unit: percent
|
0-100 |
|
1 minute |
gpu_usage_gpu |
(Agent) GPU Usage |
GPU usage of the monitored object Unit: percent
|
0-100 |
|
1 minute |
gpu_free_mem |
GPU Free Memory |
Free Memory on the GPU Unit: MB
|
≥ 0 MB |
|
1 minute |
gpu_graphics_clocks |
GPU Graphics Clocks |
Current Graphics Clocks on the GPU Unit: MHz
|
≥ 0 MHz |
|
1 minute |
gpu_mem_clocks |
GPU Memory Clocks |
Current Memory Clocks on the GPU Unit: MHz
|
≥ 0 MHz |
|
1 minute |
gpu_power_draw |
GPU Draw Power |
Draw Power on the GPU Unit: W
|
NA |
|
1 minute |
gpu_rx_throughput_pci |
GPU PCI Rx Throughput |
Current PCI Rx Throughput on the GPU Unit: MByte/s
|
≥ 0 MByte/s |
|
1 minute |
gpu_sm_clocks |
GPU SM Clocks |
Current SM Clocks on the GPU Unit: MHz
|
≥ 0 MHz |
|
1 minute |
gpu_temperature |
GPU Temperature |
Current Temperature on the GPU Unit: °C
|
≥ 0 °C |
|
1 minute |
gpu_tx_throughput_pci |
GPU PCI Tx Throughput |
Current PCI Tx Throughput on the GPU Unit: MByte/s
|
≥ 0 MByte/s |
|
1 minute |
gpu_used_mem |
GPU Used Memory |
Memory Used on the GPU Unit: MB
|
≥ 0 MB |
|
1 minute |
gpu_video_clocks |
GPU Video Clocks |
Current Video Clocks on the GPU Unit: MHz
|
≥ 0 MHz |
|
1 minute |
OS Metrics: NPU
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
npu_device_health |
NPU Device Health |
An overall measurement of the NPU health Unit: none Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
|
|
1 minute |
npu_util_rate_mem |
NPU Util Rate Mem |
The utilization rate of the NPU memory Unit: percent Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
0-100 |
|
1 minute |
npu_util_rate_ai_core |
NPU Util Rate AI Core |
The utilization rate of the NPU AI Core Unit: percent Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
0-100 |
|
1 minute |
npu_util_rate_ai_cpu |
NPU Util Rate AI Cpu |
The utilization rate of the NPU's AI CPU Unit: percent Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
0-100 |
|
1 minute |
npu_util_rate_ctrl_cpu |
NPU Util Rate Ctrl CPU |
The utilization rate of the NPU's Control CPU Unit: percent Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
0-100 |
|
1 minute |
npu_util_rate_mem_bandwidth |
NPU Util Rate Mem Bandwidth |
The utilization rate of the NPU memory bandwidth Unit: percent Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
0-100 |
|
1 minute |
npu_freq_mem |
NPU Freq Mem |
Current frequency (clock) of the NPU memory Unit: MHz Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
≥ 0 |
|
1 minute |
npu_freq_ai_core |
NPU Freq AI Core |
Current frequency (clock) of the NPU's AI Core Unit: MHz Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
≥ 0 |
|
1 minute |
npu_usage_mem |
NPU Usage Mem |
Current used NPU memory Unit: MB Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
≥ 0 |
|
1 minute |
npu_sbe |
NPU SBE |
Number of single-bit errors on the NPU Unit: count Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
≥ 0 |
|
1 minute |
npu_dbe |
NPU DBE |
Number of double-bit errors on the NPU Unit: count Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
≥ 0 |
|
1 minute |
npu_power |
NPU Power |
The power of the NPU (current power for 310P, rated power for 310) Unit: W Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
≥ 0 |
|
1 minute |
npu_temperature |
NPU temperature |
Current temperature of the NPU Unit: °C Linux: Obtain the metric value from the libdcmi.so library file of the NPU card. |
≥ 0 |
|
1 minute |
The Windows OS does not support NPU metrics.
OS Metrics: DAVP
Metric |
Parameter |
Description |
Value Range |
Monitored Object & Dimension |
Monitoring Period (Raw Data) |
---|---|---|---|---|---|
davp_device_health |
DAVP Device Health |
An overall measurement of the DAVP health Unit: none Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card. |
|
|
1 minute |
davp_util_rate_mem |
DAVP Util Rate Mem |
The utilization rate of the DAVP memory Unit: percent Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card. |
0-100 |
|
1 minute |
davp_usage_mem |
DAVP Usage Mem |
Currently used DAVP memory Unit: MB Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card. |
≥ 0 |
|
1 minute |
davp_util_rate_ai_core |
DAVP Util Rate AI Core |
The utilization rate of the DAVP AI Core Unit: percent Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card. |
0-100 |
|
1 minute |
davp_util_rate_vdsp_core |
DAVP Util Rate Vdsp Core |
The utilization rate of the DAVP Vdsp Core Unit: percent Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card. |
0-100 |
|
1 minute |
davp_util_rate_enc_core |
DAVP Util Rate Enc Core |
The utilization rate of the DAVP Enc Core Unit: percent Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card. |
0-100 |
|
1 minute |
davp_util_rate_dec_core |
DAVP Util Rate Dec Core |
The utilization rate of the DAVP Dec Core Unit: percent Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card. |
0-100 |
|
1 minute |
davp_sysc_temperature |
Davp System Module Temperature |
Current system module temperature of the DAVP Unit: °C Linux: Obtain the metric value from the libdcmi.so library file in the VAtools tool of the DAVP card. |
≥ 0 |
|
1 minute |
The Windows OS does not support DAVP metrics.
Dimensions
Dimension |
Key |
Value |
---|---|---|
ECS |
instance_id |
Specifies the ECS ID. |
ECS - Disk |
disk |
Specifies the disks attached to an ECS. |
ECS - Mount point |
mount_point |
Specifies the mount point of a disk. |
ECS - GPU |
gpu |
Specifies the graphics card of an ECS. |
ECS - NPU |
npu |
Specifies the NPU card of an NPU-based ECS. |
ECS - DAVP |
davp |
Specifies the DaoCloud DAVP1 video acceleration card of a DAVP-based ECS. |
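To show how the namespace, a metric name, and a dimension key-value pair fit together, the sketch below assembles them into a URL query string. The query parameter names (`namespace`, `metric_name`, `dim.N`) and the ECS ID are assumptions made for illustration and should be checked against the Cloud Eye API reference; only the namespace `AGT.ECS`, the metric names, and the dimension keys come from this section.

```python
from urllib.parse import urlencode

NAMESPACE = "AGT.ECS"  # namespace stated in this section


def metric_query(metric_name: str, dimensions: list) -> str:
    """Build a query string from a metric name and (key, value) dimension pairs."""
    params = [("namespace", NAMESPACE), ("metric_name", metric_name)]
    for index, (key, value) in enumerate(dimensions):
        # Hypothetical convention: each dimension is passed as "key,value".
        params.append((f"dim.{index}", f"{key},{value}"))
    return urlencode(params)


# "example-ecs-id" is a placeholder, not a real ECS ID.
print(metric_query("cpu_usage", [("instance_id", "example-ecs-id")]))
```

Metrics whose monitored object is "ECS - Mount point" would carry a second pair that uses the mount_point key, following the Dimensions table.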