OS Monitoring Metrics Supported by ECSs with the Agent Installed
Description
OS monitoring provides system-level, proactive, and fine-grained monitoring. It requires the Agent to be installed on the ECSs to be monitored. This section describes OS monitoring metrics reported to Cloud Eye.
OS monitoring supports metrics about CPU, CPU load, memory, disk, disk I/O, file system, GPU, NIC, NTP, and TCP.
After the Agent is installed, you can view monitoring metrics of ECSs running different OSs. Monitoring data is collected every minute.
Namespace
AGT.ECS
OS Metric: CPU
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
cpu_usage | (Agent) CPU Usage | CPU usage of the monitored object. Unit: percent | 0-100 | ECS | 1 minute |
cpu_usage_idle | (Agent) Idle CPU Usage | Percentage of time that the CPU is idle. Unit: percent | 0-100 | ECS | 1 minute |
cpu_usage_user | (Agent) User Space CPU Usage | Percentage of time that the CPU is used by user space. Unit: percent | 0-100 | ECS | 1 minute |
cpu_usage_system | (Agent) Kernel Space CPU Usage | Percentage of time that the CPU is used by kernel space. Unit: percent | 0-100 | ECS | 1 minute |
cpu_usage_other | (Agent) Other Process CPU Usage | Percentage of time that the CPU is used by other processes. Unit: percent | 0-100 | ECS | 1 minute |
cpu_usage_nice | (Agent) Nice Process CPU Usage | Percentage of time that the CPU is in user mode running low-priority processes, which can easily be interrupted by higher-priority processes. Unit: percent | 0-100 | ECS | 1 minute |
cpu_usage_iowait | (Agent) iowait Process CPU Usage | Percentage of time that the CPU is waiting for I/O operations to complete. Unit: percent | 0-100 | ECS | 1 minute |
cpu_usage_irq | (Agent) CPU Interrupt Time | Percentage of time that the CPU is servicing interrupts. Unit: percent | 0-100 | ECS | 1 minute |
cpu_usage_softirq | (Agent) CPU Software Interrupt Time | Percentage of time that the CPU is servicing software interrupts. Unit: percent | 0-100 | ECS | 1 minute |
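The percentages in the CPU table can be illustrated with a short sketch. It assumes a Linux ECS and assumes that the values are derived from the difference between two samples of the aggregate `cpu` line in /proc/stat; this page does not state how the Agent actually collects them. The helper names `parse_cpu_line` and `cpu_percentages` are hypothetical, `cpu_usage` is assumed to be 100 minus the idle percentage, and `cpu_usage_other` is left out because its definition is not given here.

```python
# Order of the first seven counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(stat_text):
    """Return the first eight counters (user..steal) of the aggregate cpu line."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Guest counters (positions 9 and 10) are already part of user/nice,
            # so they are dropped to avoid counting that time twice.
            return [int(v) for v in line.split()[1:9]]
    raise ValueError("no aggregate cpu line found")


def cpu_percentages(before_text, after_text):
    """Convert two /proc/stat samples into percentages of elapsed CPU time."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    result = {
        f"cpu_usage_{name}": 100.0 * delta / total
        for name, delta in zip(FIELDS, deltas)
    }
    # Assumption: overall usage is everything that is not idle time.
    result["cpu_usage"] = 100.0 - result["cpu_usage_idle"]
    return result
```

To try it on a live system, read /proc/stat twice, some seconds apart, and pass both strings to `cpu_percentages`.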
OS Metric: CPU Load
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
load_average1 | (Agent) 1-Minute Load Average | CPU load averaged over the last 1 minute. Linux: The metric value is load1 in file /proc/loadavg divided by the number of logical CPUs. Run the top command to check the load1 value. | ≥ 0 | ECS | 1 minute |
load_average5 | (Agent) 5-Minute Load Average | CPU load averaged over the last 5 minutes. Linux: The metric value is load5 in file /proc/loadavg divided by the number of logical CPUs. Run the top command to check the load5 value. | ≥ 0 | ECS | 1 minute |
load_average15 | (Agent) 15-Minute Load Average | CPU load averaged over the last 15 minutes. Linux: The metric value is load15 in file /proc/loadavg divided by the number of logical CPUs. Run the top command to check the load15 value. | ≥ 0 | ECS | 1 minute |
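The Linux collection method for the load metrics can be sketched as below. This is a minimal illustration that assumes each metric is the corresponding load average in /proc/loadavg divided by the number of logical CPUs; the function name `normalized_load` is hypothetical and not part of the Agent.

```python
def normalized_load(loadavg_text, logical_cpus):
    """Divide the 1-, 5- and 15-minute load averages by the logical CPU count.

    loadavg_text is the content of /proc/loadavg, for example
    "0.52 0.40 0.35 1/189 12345".
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    load1, load5, load15 = (float(x) for x in loadavg_text.split()[:3])
    return {
        "load_average1": load1 / logical_cpus,
        "load_average5": load5 / logical_cpus,
        "load_average15": load15 / logical_cpus,
    }
```

On a Linux ECS, the inputs could come from reading /proc/loadavg and calling `os.cpu_count()`.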
OS Metric: Memory
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
mem_available | (Agent) Available Memory | Amount of memory that is available and can be given instantly to processes. Unit: GB | ≥ 0 | ECS | 1 minute |
mem_usedPercent | (Agent) Memory Usage | Memory usage of the monitored object. Unit: percent | 0-100 | ECS | 1 minute |
mem_free | (Agent) Idle Memory | Amount of memory that is not being used. Unit: GB | ≥ 0 | ECS | 1 minute |
mem_buffers | (Agent) Buffer | Amount of memory that is being used for buffers. Unit: GB | ≥ 0 | ECS | 1 minute |
mem_cached | (Agent) Cache | Amount of memory that is being used for file caches. Unit: GB | ≥ 0 | ECS | 1 minute |
total_open_files | (Agent) Total File Handles | Total handles used by all processes. Unit: count | ≥ 0 | ECS | 1 minute |
OS Metric: Disk
- Currently, only physical disks are monitored. NFS-attached disks cannot be monitored.
- By default, Docker-related mount points are shielded (not monitored). Their mount point prefixes are as follows:
/var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
disk_free | (Agent) Available Disk Space | Free space on the disks. Unit: GB | ≥ 0 | ECS - Mount point | 1 minute |
disk_total | (Agent) Disk Storage Capacity | Total space on the disks, including used and free. Unit: GB | ≥ 0 | ECS - Mount point | 1 minute |
disk_used | (Agent) Used Disk Space | Used space on the disks. Unit: GB | ≥ 0 | ECS - Mount point | 1 minute |
disk_usedPercent | (Agent) Disk Usage | Percentage of total disk space that is used, which is calculated as follows: Disk Usage = Used Disk Space/Disk Storage Capacity. Unit: percent | 0-100 | ECS - Mount point | 1 minute |
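The Disk Usage formula and the default shielded prefixes can be sketched as follows. The helper names are hypothetical, `shutil.disk_usage` stands in for whatever the Agent really uses, and treating "GB" as 2**30 bytes is an assumption that this page does not confirm.

```python
import shutil

# Default shielded mount point prefixes listed in this section.
SHIELDED_PREFIXES = ("/var/lib/docker", "/mnt/paas/kubernetes", "/var/lib/mesos")
GIB = 1024 ** 3  # assumption: "GB" is interpreted as 2**30 bytes


def is_shielded(mount_point):
    """Return True if the mount point starts with a shielded prefix."""
    return mount_point.startswith(SHIELDED_PREFIXES)


def disk_used_percent(used_bytes, total_bytes):
    """Disk Usage = Used Disk Space / Disk Storage Capacity, as a percentage."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def disk_metrics(mount_point):
    """Collect the four disk metrics for one mount point."""
    usage = shutil.disk_usage(mount_point)
    return {
        "disk_total": usage.total / GIB,
        "disk_used": usage.used / GIB,
        "disk_free": usage.free / GIB,
        "disk_usedPercent": disk_used_percent(usage.used, usage.total),
    }
```

A collector built on this sketch would call `disk_metrics` only for mount points where `is_shielded` returns False.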
OS Metric: Disk I/O
OS Metric: File System
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
disk_fs_rwstate | (Agent) File System Read/Write Status | Read and write status of the mounted file system of the monitored object. Possible values are 0 (read and write) and 1 (read only). Linux: Check file system information in the fourth column in file /proc/mounts. | 0 or 1 | ECS - Mount point | 1 minute |
disk_inodesTotal | (Agent) Disk inode Total | Total number of index nodes on the disk. Linux: Run the df -i command to check the value in the Inodes column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and tildes (~). | ≥ 0 | ECS - Mount point | 1 minute |
disk_inodesUsed | (Agent) Total inode Used | Number of used index nodes on the disk. Linux: Run the df -i command to check the value in the IUsed column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and tildes (~). | ≥ 0 | ECS - Mount point | 1 minute |
disk_inodesUsedPercent | (Agent) Percentage of Total inode Used | Percentage of used index nodes on the disk. Unit: percent. Linux: Run the df -i command to check the value in the IUse% column. The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and tildes (~). | 0-100 | ECS - Mount point | 1 minute |
The Windows OS does not support the file system metrics.
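The read/write status check on Linux can be illustrated with the sketch below, which reads the fourth column of /proc/mounts (the mount options) and maps it to the 0 and 1 values described for disk_fs_rwstate. The function name `fs_rwstate` is hypothetical, and the exact logic used by the Agent is not documented here.

```python
def fs_rwstate(mounts_text):
    """Map each mount point to 0 (read and write) or 1 (read only).

    mounts_text is the content of /proc/mounts, whose columns are:
    device, mount point, file system type, options, dump, pass.
    """
    states = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        states[mount_point] = 1 if "ro" in options else 0
    return states
```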
OS Metric: NIC
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
net_bitRecv | (Agent) Outbound Bandwidth | Number of bits sent by this NIC per second. Unit: bit/s | ≥ 0 bit/s | ECS | 1 minute |
net_bitSent | (Agent) Inbound Bandwidth | Number of bits received by this NIC per second. Unit: bit/s | ≥ 0 bit/s | ECS | 1 minute |
net_packetRecv | (Agent) NIC Packet Receive Rate | Number of packets received by this NIC per second. Unit: count/s | ≥ 0 count/s | ECS | 1 minute |
net_packetSent | (Agent) NIC Packet Send Rate | Number of packets sent by this NIC per second. Unit: count/s | ≥ 0 count/s | ECS | 1 minute |
net_errin | (Agent) Receive Error Rate | Percentage of receive errors detected by this NIC per second. Unit: percent | 0-100 | ECS | 1 minute |
net_errout | (Agent) Transmit Error Rate | Percentage of transmit errors detected by this NIC per second. Unit: percent | 0-100 | ECS | 1 minute |
net_dropin | (Agent) Received Packet Drop Rate | Percentage of packets received by this NIC which were dropped per second. Unit: percent | 0-100 | ECS | 1 minute |
net_dropout | (Agent) Transmitted Packet Drop Rate | Percentage of packets transmitted by this NIC which were dropped per second. Unit: percent | 0-100 | ECS | 1 minute |
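A per-second bandwidth value such as those in the NIC table is commonly derived from two readings of a cumulative byte counter. The sketch below shows that arithmetic only; it assumes byte counters sampled a known number of seconds apart and is not a description of the Agent's implementation. The function name `bits_per_second` is hypothetical.

```python
def bits_per_second(bytes_before, bytes_after, interval_s):
    """Convert two cumulative byte counter readings into a bit/s rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # A counter that went backwards (for example after a reset) yields 0
    # instead of a negative rate.
    delta_bytes = max(bytes_after - bytes_before, 0)
    return delta_bytes * 8 / interval_s
```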
OS Metric: NTP
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
ntp_offset | (Agent) NTP Offset | NTP offset of the monitored object. Unit: ms. Collection method for Linux ECSs: Run chronyc sources -v to obtain the offset. | ≥ 0 ms | ECS | 1 minute |
OS Metric: TCP
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
net_tcp_total | (Agent) TCP TOTAL | Total number of TCP connections in all states. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_established | (Agent) TCP ESTABLISHED | Number of TCP connections in the ESTABLISHED state. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_sys_sent | (Agent) TCP SYS_SENT | Number of TCP connections in the SYN_SENT state, that is, connections being requested by the client. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_sys_recv | (Agent) TCP SYS_RECV | Number of TCP connections in the SYN_RECV state, that is, pending connections received by the server. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_fin_wait1 | (Agent) TCP FIN_WAIT1 | Number of TCP connections waiting for ACK packets when the connections are being actively closed by the client. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_fin_wait2 | (Agent) TCP FIN_WAIT2 | Number of TCP connections in the FIN_WAIT2 state. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_time_wait | (Agent) TCP TIME_WAIT | Number of TCP connections in the TIME_WAIT state. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_close | (Agent) TCP CLOSE | Number of closed TCP connections. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_close_wait | (Agent) TCP CLOSE_WAIT | Number of TCP connections in the CLOSE_WAIT state. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_last_ack | (Agent) TCP LAST_ACK | Number of TCP connections waiting for ACK packets when the connections are being passively closed by the client. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_listen | (Agent) TCP LISTEN | Number of TCP connections in the LISTEN state. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_closing | (Agent) TCP CLOSING | Number of TCP connections in the CLOSING state, in which the server and the client are closing the connection at the same time. Unit: count | ≥ 0 | ECS | 1 minute |
net_tcp_retrans | (Agent) TCP Retransmission Rate | Percentage of packets that are resent. Unit: percent | 0-100 | ECS | 1 minute |
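The per-state connection counts in the TCP table can be reproduced approximately on Linux by tallying the `st` column of /proc/net/tcp. The sketch below is an illustration under that assumption; this page does not say how the Agent gathers these values, the function name `count_tcp_states` is hypothetical, and /proc/net/tcp covers IPv4 sockets only. The hexadecimal codes are the Linux kernel's TCP state numbers.

```python
# Linux kernel TCP state codes mapped to the metric names in the TCP table.
TCP_STATE_METRICS = {
    "01": "net_tcp_established",
    "02": "net_tcp_sys_sent",    # SYN_SENT
    "03": "net_tcp_sys_recv",    # SYN_RECV
    "04": "net_tcp_fin_wait1",
    "05": "net_tcp_fin_wait2",
    "06": "net_tcp_time_wait",
    "07": "net_tcp_close",
    "08": "net_tcp_close_wait",
    "09": "net_tcp_last_ack",
    "0A": "net_tcp_listen",
    "0B": "net_tcp_closing",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state from the content of /proc/net/tcp."""
    counts = {name: 0 for name in TCP_STATE_METRICS.values()}
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        name = TCP_STATE_METRICS.get(fields[3].upper())
        if name is not None:
            counts[name] += 1
    # The total is computed before the key is added, so it is not counted twice.
    counts["net_tcp_total"] = sum(counts.values())
    return counts
```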
OS Metric: GPU
Metric | Parameter | Description | Value Range | Monitored Object & Dimension | Monitoring Period (Raw Data) |
---|---|---|---|---|---|
gpu_status | GPU Health Status | Overall measurement of the GPU health. Unit: none |  |  | 1 minute |
gpu_usage_encoder | Encoding Usage | Encoding capability usage on the GPU. Unit: percent | 0-100 |  | 1 minute |
gpu_usage_decoder | Decoding Usage | Decoding capability usage on the GPU. Unit: percent | 0-100 |  | 1 minute |
gpu_volatile_correctable | Volatile Correctable ECC Errors | Number of correctable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset. Unit: count | ≥ 0 |  | 1 minute |
gpu_volatile_uncorrectable | Volatile Uncorrectable ECC Errors | Number of uncorrectable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset. Unit: count | ≥ 0 |  | 1 minute |
gpu_aggregate_correctable | Aggregate Correctable ECC Errors | Aggregate correctable ECC errors on the GPU. Unit: count | ≥ 0 |  | 1 minute |
gpu_aggregate_uncorrectable | Aggregate Uncorrectable ECC Errors | Aggregate uncorrectable ECC errors on the GPU. Unit: count | ≥ 0 |  | 1 minute |
gpu_retired_page_single_bit | Retired Page Single Bit Errors | Number of retired page single bit errors, which indicates the number of single-bit pages blocked by the graphics card. Unit: count | ≥ 0 |  | 1 minute |
gpu_retired_page_double_bit | Retired Page Double Bit Errors | Number of retired page double bit errors, which indicates the number of double-bit pages blocked by the graphics card. Unit: count | ≥ 0 |  | 1 minute |
gpu_performance_state | (Agent) Performance Status | GPU performance of the monitored object. Unit: none | P0-P15, P32 | ECS - GPU | 1 minute |
gpu_usage_mem | (Agent) GPU Memory Usage | GPU memory usage of the monitored object. Unit: percent | 0-100 | ECS - GPU | 1 minute |
gpu_usage_gpu | (Agent) GPU Usage | GPU usage of the monitored object. Unit: percent | 0-100 | ECS - GPU | 1 minute |
Dimensions
Dimension | Key | Value |
---|---|---|
ECS | instance_id | Specifies the ECS ID. |
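To show how the namespace, a metric name, and the instance_id dimension fit together, the sketch below assembles a reference to one metric of one ECS. The dictionary layout and the `build_metric_ref` helper are hypothetical and chosen only for illustration; see the Cloud Eye API reference for the actual request format. The ECS ID passed in is a placeholder supplied by the caller.

```python
NAMESPACE = "AGT.ECS"  # namespace given in this section


def build_metric_ref(metric_name, instance_id):
    """Describe one OS monitoring metric of one ECS (illustrative layout only)."""
    if not metric_name:
        raise ValueError("metric_name must not be empty")
    if not instance_id:
        raise ValueError("instance_id must not be empty")
    return {
        "namespace": NAMESPACE,
        "metric_name": metric_name,
        # The ECS dimension uses the key instance_id and the ECS ID as value.
        "dimensions": [{"name": "instance_id", "value": instance_id}],
    }
```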