Updated on 2025-11-21 GMT+08:00

Monitored Metrics (with Agent Installed)

Description

This section describes monitoring metrics reported by BMS to Cloud Eye as well as their namespaces and dimensions. You can use the management console or APIs provided by Cloud Eye to query the metrics of the monitored objects and alarms generated for BMS.

Cloud Eye can monitor dimensions nested to a maximum depth of four levels (levels 0 to 3). 3 is the deepest level. For example, if the monitored dimension of a metric is instance_id,mount_point, instance_id indicates level 0 and mount_point indicates level 1.
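The level numbering can be illustrated with a minimal sketch. The function name `dimension_levels` is hypothetical and is not part of any Cloud Eye API; the only behavior taken from this section is that levels run from 0 to 3 and follow the order of the comma-separated dimension names.

```python
MAX_LEVELS = 4  # levels 0 to 3, as stated above


def dimension_levels(dimension: str) -> dict:
    """Map each name in a comma-separated dimension string to its level."""
    names = [name.strip() for name in dimension.split(",") if name.strip()]
    if len(names) > MAX_LEVELS:
        raise ValueError("at most four nesting levels (0 to 3) are supported")
    return {name: level for level, name in enumerate(names)}


print(dimension_levels("instance_id,mount_point"))
# {'instance_id': 0, 'mount_point': 1}
```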

Prerequisites

The Agent has been installed. For details, see Installing the Agent.

Namespace

SERVICE.BMS

OS Metrics: CPU

**Table 1** CPU metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
cpu_usage	(Agent) CPU Usage	CPU usage of the monitored object Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) value. Windows: Obtain the metric value using the Windows API GetSystemTimes.	0-100	%	N/A	instance_id	1 minute
cpu_usage_idle	(Agent) Idle CPU Usage	Percentage of time that CPU is idle Linux: Check metric value changes in file /proc/stat in a collection period. Windows: Obtain the metric value using the Windows API GetSystemTimes.	0-100	%	N/A	instance_id	1 minute
cpu_usage_other	(Agent) Other Process CPU Usage	Percentage of time that the CPU is used by other processes Linux and Windows: Other Process CPU Usage = 100% – Idle CPU Usage (%) – Kernel Space CPU Usage (%) – User Space CPU Usage (%)	0-100	%	N/A	instance_id	1 minute
cpu_usage_system	(Agent) Kernel Space CPU Usage	Percentage of time that the CPU is used by kernel space Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) sy value. Windows: Obtain the metric value using the Windows API GetSystemTimes.	0-100	%	N/A	instance_id	1 minute
cpu_usage_user	(Agent) User Space CPU Usage	Percentage of time that the CPU is used by user space Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) us value. Windows: Obtain the metric value using the Windows API GetSystemTimes.	0-100	%	N/A	instance_id	1 minute
cpu_usage_nice	(Agent) Nice Process CPU Usage	Percentage of time that the CPU is used by the Nice process Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) ni value. Windows is not supported currently.	0-100	%	N/A	instance_id	1 minute
cpu_usage_iowait	(Agent) iowait Process CPU Usage	Percentage of time during which the CPU is waiting for I/O operations to complete Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) wa value. Windows is not supported currently.	0-100	%	N/A	instance_id	1 minute
cpu_usage_irq	(Agent) CPU Interrupt Time	Percentage of time that the CPU is servicing interrupts Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) hi value. Windows is not supported currently.	0-100	%	N/A	instance_id	1 minute
cpu_usage_softirq	(Agent) CPU Software Interrupt Time	Percentage of time that the CPU is servicing software interrupts Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) si value. Windows is not supported currently.	0-100	%	N/A	instance_id	1 minute
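The Linux descriptions in Table 1 rely on changes in /proc/stat over a collection period. The following is a minimal sketch of that idea, not the Agent's actual implementation. The function names are hypothetical, and treating overall usage as 100 minus the idle percentage is an assumption.

```python
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Return the cumulative tick counters from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in parts[1:]]


def cpu_percentages(before, after):
    """Compute per-state CPU percentages between two /proc/stat samples."""
    old, new = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [b - a for a, b in zip(old, new)]
    # Sum user..steal only; guest time is already counted in user time.
    total = sum(deltas[:8])
    if total <= 0:
        return {name: 0.0 for name in CPU_FIELDS + ("usage",)}
    result = {name: 100.0 * deltas[i] / total for i, name in enumerate(CPU_FIELDS)}
    result["usage"] = 100.0 - result["idle"]  # assumption: usage = 100 - idle
    return result


sample_before = "cpu 100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu 200 0 100 1600 100 0 0 0 0 0"
print(cpu_percentages(sample_before, sample_after))
```

On a real host, the two samples would be the first line of /proc/stat read at the start and end of the collection period.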

OS Metrics: CPU Load

**Table 2** CPU load metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
load_average1	(Agent) 1-Minute Load Average	CPU load averaged over the last 1 minute Linux: Obtain load1 from file /proc/loadavg and divide it by the number of logical CPUs. Run the top command and check the load1 value.	≥0	N/A	N/A	instance_id	1 minute
load_average5	(Agent) 5-Minute Load Average	CPU load averaged over the last 5 minutes Linux: Obtain load5 from file /proc/loadavg and divide it by the number of logical CPUs. Run the top command and check the load5 value.	≥0	N/A	N/A	instance_id	1 minute
load_average15	(Agent) 15-Minute Load Average	CPU load averaged over the last 15 minutes Linux: Obtain load15 from file /proc/loadavg and divide it by the number of logical CPUs. Run the top command and check the load15 value.	≥0	N/A	N/A	instance_id	1 minute
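A minimal sketch of the load calculation follows. It assumes that each metric is the corresponding /proc/loadavg value divided by the number of logical CPUs, which is one reading of the descriptions in Table 2. The function name is hypothetical.

```python
def load_per_cpu(loadavg_text, logical_cpus):
    """Divide the 1-, 5-, and 15-minute load averages by the logical CPU count."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    load1, load5, load15 = (float(v) for v in loadavg_text.split()[:3])
    return (load1 / logical_cpus, load5 / logical_cpus, load15 / logical_cpus)


# Sample content in the format of /proc/loadavg.
print(load_per_cpu("4.00 2.00 1.00 2/345 6789", logical_cpus=4))
# (1.0, 0.5, 0.25)
```

On a real host, `os.cpu_count()` returns the number of logical CPUs.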

OS Metrics: Memory

**Table 3** Memory metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
mem_available	(Agent) Available Memory	Available memory of the monitored object Linux: Obtain the metric value from /proc/meminfo. If MemAvailable is displayed in /proc/meminfo, obtain the value. If MemAvailable is not displayed in /proc/meminfo, calculate the value with the formula MemAvailable = MemFree + Buffers + Cached. Windows: Available memory = Total memory – Used memory. Obtain the metric value using the Windows API GlobalMemoryStatusEx.	≥0	GB	N/A	instance_id	1 minute
mem_usedPercent	(Agent) Memory Usage	Memory usage of the monitored object Linux: Obtain the metric value from the /proc/meminfo file. Memory Usage = (MemTotal – MemAvailable)/MemTotal If MemAvailable is displayed in /proc/meminfo, calculate the value with the formula MemUsedPercent = (MemTotal – MemAvailable)/MemTotal. If MemAvailable is not displayed in /proc/meminfo, calculate the value with the formula MemUsedPercent = (MemTotal – MemFree – Buffers – Cached)/MemTotal. Windows: Memory Usage = Used memory/Total memory x 100%	0-100	%	N/A	instance_id	1 minute
mem_free	(Agent) Idle Memory	Memory that is not being used Linux: Obtain the metric value from /proc/meminfo. Windows is not supported currently.	≥0	GB	N/A	instance_id	1 minute
mem_buffers	(Agent) Buffer	Memory that is being used for buffers Linux: Obtain the metric value from /proc/meminfo. Run the top command and check the KiB Mem:buffers value. Windows is not supported currently.	≥0	GB	N/A	instance_id	1 minute
mem_cached	(Agent) Cache	Memory that is being used for caches Linux: Obtain the metric value from /proc/meminfo. Run the top command and check the KiB Swap:cached Mem value. Windows is not supported currently.	≥0	GB	N/A	instance_id	1 minute
total_open_files	(Agent) Total File Handles	Total handles used by all processes Linux: Use the /proc/{pid}/fd directories to sum up the handles used by all processes. Windows is not supported currently.	≥0	Count	N/A	instance_id	1 minute
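The MemAvailable fallback described for mem_available and mem_usedPercent can be sketched as follows. This is an illustration of the formulas in Table 3 only. The function names are hypothetical, and values are kept in kB as they appear in /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def mem_available_kb(info):
    """Use MemAvailable if present, else MemFree + Buffers + Cached."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info["MemFree"] + info["Buffers"] + info["Cached"]


def mem_used_percent(info):
    """(MemTotal - MemAvailable) / MemTotal, expressed as a percentage."""
    return 100.0 * (info["MemTotal"] - mem_available_kb(info)) / info["MemTotal"]


sample = "MemTotal: 1000 kB\nMemFree: 200 kB\nBuffers: 100 kB\nCached: 300 kB\n"
info = parse_meminfo(sample)
print(mem_available_kb(info), mem_used_percent(info))
```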

OS Metrics: Disk

Currently, Cloud Eye Agent only monitors physical disks. NFS-mounted disks cannot be monitored.
By default, Cloud Eye Agent excludes Docker-related mount points. The mount point prefixes are as follows:
```
/var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos
```
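As a rough sketch, the exclusion can be thought of as a prefix check against the semicolon-separated list above. Plain string prefix matching is an assumption; the Agent's actual matching logic is not documented here, and the function name is hypothetical.

```python
DEFAULT_EXCLUDED_PREFIXES = "/var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos"


def is_excluded(mount_point, prefixes=DEFAULT_EXCLUDED_PREFIXES):
    """Return True if the mount point starts with any excluded prefix."""
    return any(mount_point.startswith(p) for p in prefixes.split(";") if p)


print(is_excluded("/var/lib/docker/overlay2/abc/merged"))  # True
print(is_excluded("/data"))  # False
```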

**Table 4** Disk metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
disk_free	(Agent) Available Disk Space	Available disk space of the monitored object Linux: Run the df -h command and check the value in the Avail column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows: Obtain the metric value using the Windows API GetDiskFreeSpaceExW. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).	≥0	GB	N/A	instance_id,mount_point	1 minute
disk_total	(Agent) Disk Storage Capacity	Total disk capacity of the monitored object Linux: Run the df -h command and check the value in the Size column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows: Obtain the metric value using the Windows API GetDiskFreeSpaceExW. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).	≥0	GB	N/A	instance_id,mount_point	1 minute
disk_used	(Agent) Used Disk Space	Used disk space of the monitored object Linux: Run the df -h command and check the value in the Used column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows: Obtain the metric value using the Windows API GetDiskFreeSpaceExW. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).	≥0	GB	N/A	instance_id,mount_point	1 minute
disk_usedPercent	(Agent) Disk Usage	Disk usage of the monitored object Formula: Disk Usage = Used Disk Space/Disk Storage Capacity Linux: Obtain the metric value using the preceding formula. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows: Obtain the metric value using the Windows API GetDiskFreeSpaceExW. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).	0-100	%	N/A	instance_id,mount_point	1 minute

OS Metrics: Disk I/O

**Table 5** Disk I/O metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
disk_agt_read_bytes_rate	(Agent) Disks Read Rate	Number of bytes read from the monitored disk per second Linux: In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the sixth column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows: Obtain the metric value using the WMI object Win32_PerfFormattedData_PerfDisk_LogicalDisk. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). High CPU usage may lead to timeouts in monitoring data collection.	≥ 0	byte/s	1024(IEC)	instance_id,mount_point instance_id,disk	1 minute
disk_agt_read_requests_rate	(Agent) Disks Read Requests	Number of requests to read data from the monitored disk per second Linux: In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the fourth column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows: Obtain the metric value using the WMI object Win32_PerfFormattedData_PerfDisk_LogicalDisk. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). High CPU usage may lead to timeouts in monitoring data collection.	≥ 0	request/s	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_agt_write_bytes_rate	(Agent) Disks Write Rate	Number of bytes written into the monitored disk per second Linux: In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the tenth column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows: Obtain the metric value using the WMI object Win32_PerfFormattedData_PerfDisk_LogicalDisk. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). High CPU usage may lead to timeouts in monitoring data collection.	≥ 0	byte/s	1024(IEC)	instance_id,mount_point instance_id,disk	1 minute
disk_agt_write_requests_rate	(Agent) Disks Write Requests	Number of requests to write data into the monitored disk per second Linux: In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the eighth column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows: Obtain the metric value using the WMI object Win32_PerfFormattedData_PerfDisk_LogicalDisk. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). High CPU usage may lead to timeouts in monitoring data collection.	≥ 0	request/s	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_readTime	(Agent) Average Read Request Time	Average amount of time that read requests have waited on the disks Linux: In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the seventh column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	≥ 0	ms/count	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_writeTime	(Agent) Average Write Request Time	Average amount of time that write requests have waited on the disks Linux: In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the eleventh column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	≥ 0	ms/count	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_ioUtils	(Agent) Disk I/O Usage	Disk I/O usage of the monitored object Linux: In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the thirteenth column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	0-100	%	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_queue_length	(Agent) Disk Queue Length	Average number of read or write requests waiting to be processed by the monitored disk in a monitoring period Linux: In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the fourteenth column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	≥ 0	Count	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_write_bytes_per_operation	(Agent) Average Disk Write Size	Average number of bytes written into the monitored disk per write I/O in a monitoring period Linux: In file /proc/diskstats, locate the device and calculate the metric value by dividing data changes in the tenth column by that in the eighth column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	≥ 0	Byte/op	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_read_bytes_per_operation	(Agent) Average Disk Read Size	Average number of bytes read from the monitored disk per read I/O in a monitoring period Linux: In file /proc/diskstats, locate the device and calculate the metric value by dividing data changes in the sixth column by that in the fourth column in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	≥ 0	Byte/op	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_io_svctm	(Agent) Disk I/O Service Time	Average time the monitored disk takes to complete an I/O request (read or write) in a monitoring period Linux: In file /proc/diskstats, locate the device and calculate the metric value by dividing the data changes in the thirteenth column by the sum of data changes in the fourth and eighth columns in a collection period. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	≥ 0	ms/op	N/A	instance_id,mount_point instance_id,disk	1 minute
disk_device_used_percent	(Agent) Block Device Usage	Physical disk usage of the monitored object Formula: Block device usage = Storage space used by all mounted disk partitions/Total disk storage space Linux: Calculate the sum of storage space used by all mount points. Calculate the total disk storage space based on the disk sector size and the number of sectors. Then, calculate the block device usage based on the formula mentioned above. Windows is not supported currently.	0-100	%	N/A	instance_id,mount_point instance_id,disk	1 minute
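The column-based calculations in Table 5 can be sketched from two /proc/diskstats samples of the same device. This is an illustration under stated assumptions, not the Agent's code: the function and key names are hypothetical, sectors are taken as 512 bytes (the unit /proc/diskstats uses), and I/O usage is taken as busy time divided by the length of the collection period.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats_line(line):
    """Pick the columns referenced in Table 5 (1-based column numbers)."""
    cols = line.split()
    return {
        "reads": int(cols[3]),             # 4th column: reads completed
        "sectors_read": int(cols[5]),      # 6th column
        "read_ms": int(cols[6]),           # 7th column
        "writes": int(cols[7]),            # 8th column: writes completed
        "sectors_written": int(cols[9]),   # 10th column
        "write_ms": int(cols[10]),         # 11th column
        "io_ms": int(cols[12]),            # 13th column: time spent doing I/O
    }


def disk_io_metrics(before_line, after_line, interval_s):
    """Derive rates from the counter deltas over one collection period."""
    old, new = parse_diskstats_line(before_line), parse_diskstats_line(after_line)
    d = {key: new[key] - old[key] for key in new}
    ops = d["reads"] + d["writes"]
    return {
        "read_bytes_rate": d["sectors_read"] * SECTOR_BYTES / interval_s,
        "write_bytes_rate": d["sectors_written"] * SECTOR_BYTES / interval_s,
        "read_requests_rate": d["reads"] / interval_s,
        "write_requests_rate": d["writes"] / interval_s,
        "io_util_percent": 100.0 * d["io_ms"] / (interval_s * 1000),
        "svctm_ms_per_op": d["io_ms"] / ops if ops else 0.0,
    }


before = "8 0 sda 1000 0 20000 500 2000 0 40000 1500 0 1200 2000"
after = "8 0 sda 1600 0 32000 800 2600 0 64000 2100 0 7200 3000"
print(disk_io_metrics(before, after, interval_s=60))
```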

OS Metrics: File System

**Table 6** File system metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
disk_fs_rwstate	(Agent) File System Read/Write Status	File system read/write status of the monitored object Possible values are 0 (read and write) and 1 (read-only). Linux: Obtain the file system status from the fourth column in the /proc/mounts file. Windows is not supported currently.	0: read and write 1: read-only	N/A	N/A	instance_id,mount_point	1 minute
disk_inodesTotal	(Agent) Disk inode Total	Total number of index nodes on the disk Linux: Run the df -i command and check the value in the Inodes column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	≥ 0	N/A	N/A	instance_id,mount_point	1 minute
disk_inodesUsed	(Agent) Total inode Used	Number of used index nodes on the disk Linux: Run the df -i command and check the value in the IUsed column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	≥ 0	N/A	N/A	instance_id,mount_point	1 minute
disk_inodesUsedPercent	(Agent) Percentage of Total inode Used	Percentage of used index nodes on the disk Linux: Run the df -i command and check the value in the IUse% column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~). Windows is not supported currently.	0-100	%	N/A	instance_id,mount_point	1 minute
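For disk_fs_rwstate, the fourth column of /proc/mounts holds the mount options. A minimal sketch of deriving the 0/1 value from it follows. The function name is hypothetical, and treating the presence of the ro option as read-only is an assumption about how the status is derived.

```python
def fs_rwstate(mounts_text):
    """Map each mount point to 0 (read and write) or 1 (read-only)."""
    states = {}
    for line in mounts_text.splitlines():
        cols = line.split()
        if len(cols) < 4:
            continue
        options = cols[3].split(",")  # fourth column: mount options
        states[cols[1]] = 1 if "ro" in options else 0
    return states


sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /data ext4 ro,relatime 0 0\n"
)
print(fs_rwstate(sample))
# {'/': 0, '/data': 1}
```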

OS Metrics: TCP

**Table 7** TCP metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
net_tcp_total	(Agent) Total TCP Connections	Total number of TCP connections in all states Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_established	(Agent) TCP ESTABLISHED Connections	Number of TCP connections in ESTABLISHED state Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_sys_sent	(Agent) TCP SYS_SENT Connections	Number of TCP connections in SYN_SENT state, that is, connections being requested by the client Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_sys_recv	(Agent) TCP SYS_RECV Connections	Number of TCP connections in SYN_RECV state, that is, pending connections received by the server Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_fin_wait1	(Agent) TCP FIN_WAIT1 Connections	Number of TCP connections waiting for ACK packets when the connections are being actively closed by the client Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_fin_wait2	(Agent) TCP FIN_WAIT2 Connections	Number of TCP connections in FIN_WAIT2 state Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_time_wait	(Agent) TCP TIME_WAIT Connections	Number of TCP connections in TIME_WAIT state Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_close	(Agent) TCP CLOSE Connections	Number of closed TCP connections Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_close_wait	(Agent) TCP CLOSE_WAIT Connections	Number of TCP connections in CLOSE_WAIT state Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_last_ack	(Agent) TCP LAST_ACK Connections	Number of TCP connections waiting for ACK packets when the connections are being passively closed by the client Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_listen	(Agent) TCP LISTEN Connections	Number of TCP connections in LISTEN state Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_closing	(Agent) TCP CLOSING Connections	Number of TCP connections to be actively closed by the server and the client at the same time Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state. Windows: Obtain the metric value using the Windows API GetTcpTable2.	≥ 0	Count	N/A	instance_id	1 minute
net_tcp_retrans	(Agent) TCP Retransmission Rate	Percentage of packets that are resent Linux: Obtain the metric value from the /proc/net/snmp file. The value is the ratio of the number of retransmitted packets to the number of total packets sent in a collection period. Windows: Obtain the metric value using the Windows API GetTcpStatistics.	0-100	%	N/A	instance_id	1 minute
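The Linux collection described in Table 7 can be sketched as follows: count the hexadecimal state codes in the fourth field of /proc/net/tcp, and compute the retransmission rate from the RetransSegs and OutSegs counters in /proc/net/snmp. The function names are hypothetical, and the choice of these two counters for the rate is an assumption based on the description of net_tcp_retrans.

```python
from collections import Counter

# Linux kernel TCP state codes as they appear in the "st" field.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count connections per state from /proc/net/tcp content."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 4:
            counts[TCP_STATES.get(cols[3].upper(), "UNKNOWN")] += 1
    return counts


def tcp_counters(snmp_text):
    """Pair the 'Tcp:' header row with the 'Tcp:' value row."""
    rows = [l.split() for l in snmp_text.splitlines() if l.startswith("Tcp:")]
    header, values = rows[0], rows[1]
    return dict(zip(header[1:], (int(v) for v in values[1:])))


def retrans_percent(before_text, after_text):
    """Retransmitted segments as a percentage of segments sent in the period."""
    old, new = tcp_counters(before_text), tcp_counters(after_text)
    sent = new["OutSegs"] - old["OutSegs"]
    retrans = new["RetransSegs"] - old["RetransSegs"]
    return 100.0 * retrans / sent if sent > 0 else 0.0
```

The total connection count is the sum of all per-state counts, for example `sum(count_tcp_states(text).values())`.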

OS Metrics: NIC

**Table 8** NIC metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
net_bitRecv	(Agent) Outbound Bandwidth	Number of bits sent by the monitored object per second Linux: Check metric value changes in file /proc/net/dev in a collection period. Windows: Obtain the metric value using the WMI object MibIfRow.	≥ 0	bit/s	1024(IEC)	instance_id instance_id,network_interface_card	1 minute
net_bitSent	(Agent) Inbound Bandwidth	Number of bits received by the monitored object per second Linux: Check metric value changes in file /proc/net/dev in a collection period. Windows: Obtain the metric value using the WMI object MibIfRow.	≥ 0	bit/s	1024(IEC)	instance_id instance_id,network_interface_card	1 minute
net_packetRecv	(Agent) NIC Packet Receive Rate	Number of packets received by the monitored object per second Linux: Check metric value changes in file /proc/net/dev in a collection period. Windows: Obtain the metric value using the WMI object MibIfRow.	≥ 0	Count/s	N/A	instance_id instance_id,network_interface_card	1 minute
net_packetSent	(Agent) NIC Packet Send Rate	Number of packets sent by the monitored object per second Linux: Check metric value changes in file /proc/net/dev in a collection period. Windows: Obtain the metric value using the WMI object MibIfRow.	≥ 0	Count/s	N/A	instance_id instance_id,network_interface_card	1 minute
net_errin	(Agent) Receive Error Rate	Percentage of error packets relative to the total packets received by the monitored object per second Linux: Check metric value changes in file /proc/net/dev in a collection period. Windows is not supported currently.	0-100	%	N/A	instance_id instance_id,network_interface_card	1 minute
net_errout	(Agent) Transmit Error Rate	Percentage of error packets relative to the total packets sent by the monitored object per second Linux: Check metric value changes in file /proc/net/dev in a collection period. Windows is not supported currently.	0-100	%	N/A	instance_id instance_id,network_interface_card	1 minute
net_dropin	(Agent) Received Packet Drop Rate	Percentage of received but dropped packets relative to the total packets received by the monitored object per second Linux: Check metric value changes in file /proc/net/dev in a collection period. Windows is not supported currently.	0-100	%	N/A	instance_id instance_id,network_interface_card	1 minute
net_dropout	(Agent) Transmitted Packet Drop Rate	Percentage of sent but dropped packets relative to the total packets sent by the monitored object per second Linux: Check metric value changes in file /proc/net/dev in a collection period. Windows is not supported currently.	0-100	%	N/A	instance_id instance_id,network_interface_card	1 minute
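The NIC metrics in Table 8 are derived on Linux from changes in /proc/net/dev. A minimal sketch follows, using neutral rx/tx names for the receive and transmit counters. The function and key names are hypothetical, and computing the error and drop percentages against the packet count of the same direction is an assumption.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into per-interface counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header rows
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        v = [int(x) for x in data.split()]
        stats[name.strip()] = {
            "rx_bytes": v[0], "rx_packets": v[1], "rx_errs": v[2], "rx_drop": v[3],
            "tx_bytes": v[8], "tx_packets": v[9], "tx_errs": v[10], "tx_drop": v[11],
        }
    return stats


def percent(part, whole):
    return 100.0 * part / whole if whole > 0 else 0.0


def nic_rates(before_text, after_text, interval_s):
    """Compute per-interface rates from two samples taken interval_s apart."""
    before, after = parse_net_dev(before_text), parse_net_dev(after_text)
    rates = {}
    for nic, now in after.items():
        if nic not in before:
            continue
        d = {key: now[key] - before[nic][key] for key in now}
        rates[nic] = {
            "rx_bits_per_s": d["rx_bytes"] * 8 / interval_s,
            "tx_bits_per_s": d["tx_bytes"] * 8 / interval_s,
            "rx_packets_per_s": d["rx_packets"] / interval_s,
            "tx_packets_per_s": d["tx_packets"] / interval_s,
            "rx_error_percent": percent(d["rx_errs"], d["rx_packets"]),
            "tx_error_percent": percent(d["tx_errs"], d["tx_packets"]),
            "rx_drop_percent": percent(d["rx_drop"], d["rx_packets"]),
            "tx_drop_percent": percent(d["tx_drop"], d["tx_packets"]),
        }
    return rates
```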

Process Monitoring Metrics

**Table 9** Process Metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
proc_pHashId_cpu	(Agent) CPU Usage	CPU consumed by a process. pHashId is the MD5 value of the process name plus process ID. Linux: Check metric value changes in file /proc/pid/stat. Windows: Obtain the metric value using the Windows API GetProcessTimes.	0–1 x Number of CPU cores	%	N/A	instance_id	1 minute
proc_pHashId_mem	(Agent) Memory Usage	Memory consumed by a process. pHashId is the MD5 value of the process name plus process ID. Linux: Memory Usage = RSS x PAGESIZE/MemTotal. Obtain the RSS value by checking the second column of the file /proc/pid/statm. Obtain the PAGESIZE value by running the getconf PAGESIZE command. Obtain the MemTotal value by checking the file /proc/meminfo. Windows: Call the Windows API GlobalMemoryStatusEx to obtain the total memory size. Call GetProcessMemoryInfo to obtain the used memory size. Divide the used size by the total size to get the memory usage.	0-100	%	N/A	instance_id	1 minute
proc_pHashId_file	(Agent) Opened Files	Number of files opened by a process. pHashId is the MD5 value of the process name plus process ID. Linux: Run the ls -l /proc/pid/fd command to check the number of opened files. Windows is not supported currently.	≥0	Count	N/A	instance_id	1 minute
proc_running_count	(Agent) Running Processes	Number of running processes of the monitored object Linux: Obtain the state of each process by checking the State value in the /proc/pid/status file, and then collect the total number of processes in each state. Windows is not supported currently.	≥0	Count	N/A	instance_id	1 minute
proc_idle_count	(Agent) Idle Processes	Number of idle processes of the monitored object Linux: Obtain the state of each process by checking the State value in the /proc/pid/status file, and then collect the total number of processes in each state. Windows is not supported currently.	≥0	Count	N/A	instance_id	1 minute
proc_zombie_count	(Agent) Zombie Processes	Number of zombie processes of the monitored object Linux: Obtain the state of each process by checking the State value in the /proc/pid/status file, and then collect the total number of processes in each state. Windows is not supported currently.	≥0	Count	N/A	instance_id	1 minute
proc_blocked_count	(Agent) Blocked Processes	Number of blocked processes of the monitored object Linux: Obtain the state of each process by checking the State value in the /proc/pid/status file, and then collect the total number of processes in each state. Windows is not supported currently.	≥0	Count	N/A	instance_id	1 minute
proc_sleeping_count	(Agent) Sleeping Processes	Number of sleeping processes of the monitored object Linux: Obtain the state of each process by checking the State value in the /proc/pid/status file, and then collect the total number of processes in each state. Windows is not supported currently.	≥0	Count	N/A	instance_id	1 minute
proc_total_count	(Agent) Total Processes	Total number of processes of the monitored object Linux: Obtain the state of each process by checking the State value in the /proc/pid/status file, and then collect the total number of processes in each state. Windows: Obtain the metric value using psapi.dll, the Windows process status API library.	≥0	Count	N/A	instance_id	1 minute
proc_specified_count	(Agent) Specified Processes	Number of specified processes Linux: Obtain the state of each process by checking the State value in the /proc/pid/status file, and then collect the total number of processes in each state. Windows: Obtain the metric value using psapi.dll, the Windows process status API library.	≥0	N/A	N/A	instance_id,proc	1 minute
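The Linux calculation for proc_pHashId_mem can be sketched as follows, reading the formula as RSS multiplied by PAGESIZE and divided by MemTotal. The function name is hypothetical. RSS is the second column of /proc/pid/statm in pages, and MemTotal is taken in kB as reported by /proc/meminfo.

```python
def process_mem_percent(statm_text, page_size_bytes, mem_total_kb):
    """RSS x PAGESIZE / MemTotal, expressed as a percentage."""
    rss_pages = int(statm_text.split()[1])  # second column: resident pages
    rss_bytes = rss_pages * page_size_bytes
    return 100.0 * rss_bytes / (mem_total_kb * 1024)


# Sample /proc/<pid>/statm content, a 4 KiB page size, and 1 GiB of total memory.
print(process_mem_percent("5000 2560 300 10 0 400 0", 4096, 1048576))
```

On a real host, `os.sysconf("SC_PAGE_SIZE")` returns the same value as the getconf PAGESIZE command.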

OS Metrics: GPU

If a server has eight GPUs and persistence mode (PM) is disabled, data collection may fail. To fix this, enable persistence mode and restart the monitoring process.

**Table 10** GPU metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
gpu_status	(Agent) GPU Health Status	GPU health status. It is a composite metric. Possible causes of a subhealthy or faulty status: 1. The ECC exceeds the threshold. 2. The GPU memory address failed to be remapped. 3. GPU shows rev ff error. 4. infoROM error occurs. 5. There are pages to be isolated. 6. remapped rows error occurs. For details, see the detailed metrics below. Linux: Obtain the metric value by calling the API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the API provided by the GPU driver library nvml.dll.	0: healthy 1: subhealthy 2: faulty	N/A	N/A	instance_id,gpu	1 minute
gpu_performance_state	(Agent) Performance Status	GPU performance status Linux: Obtain the metric value by calling the NvmlDeviceGetPerformanceState API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetPerformanceState API provided by the GPU driver library nvml.dll.	P0–P15, P32 P0: the maximum performance P15: the minimum performance P32: unknown performance status	N/A	N/A	instance_id,gpu	1 minute
gpu_power_draw	(Agent) GPU Draw Power	Draw power on the GPU. If the power exceeds the maximum or is an incorrect value, the GPU hardware may be faulty. Linux: Obtain the metric value by calling the NvmlDeviceGetPowerUsage API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetPowerUsage API provided by the GPU driver library nvml.dll.	≥ 0	W	N/A	instance_id,gpu	1 minute
gpu_temperature	(Agent) GPU Temperature	Temperature of the GPU. If the temperature exceeds the threshold or is an incorrect value, the GPU hardware may be faulty. Linux: Obtain the metric value by calling the NvmlDeviceGetTemperature API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetTemperature API provided by the GPU driver library nvml.dll.	≥ 0	°C	N/A	instance_id,gpu	1 minute
gpu_usage_gpu	(Agent) GPU Usage	GPU compute usage. It is an instantaneous value at a sampling point. Linux: Obtain the metric value by calling the NvmlDeviceGetUtilizationRates API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetUtilizationRates API provided by the GPU driver library nvml.dll.	0-100	%	N/A	instance_id,gpu	1 minute
gpu_usage_mem	(Agent) GPU Memory Usage	GPU memory usage. It is an instantaneous value at a sampling point. Linux: Obtain the metric value by calling the NvmlDeviceGetUtilizationRates API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetUtilizationRates API provided by the GPU driver library nvml.dll.	0-100	%	N/A	instance_id,gpu instance_id,gpu_slot,pid_for_gpu	1 minute
gpu_used_mem	(Agent) GPU Used Memory	Memory used on the GPU Linux: Obtain the metric value by calling the NvmlDeviceGetMemoryInfo API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetMemoryInfo API provided by the GPU driver library nvml.dll.	≥ 0	MB	N/A	instance_id,gpu instance_id,gpu_slot,pid_for_gpu	1 minute
gpu_free_mem	(Agent) Remaining GPU Memory	Idle GPU memory Linux: Obtain the metric value by calling the NvmlDeviceGetMemoryInfo API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetMemoryInfo API provided by the GPU driver library nvml.dll.	≥ 0	MB	N/A	instance_id,gpu	1 minute
gpu_usage_encoder	(Agent) Encoding Usage	Encoder usage of the GPU. It is an instantaneous value at a sampling point. Linux: Obtain the metric value by calling the NvmlDeviceGetEncoderUtilization API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetEncoderUtilization API provided by the GPU driver library nvml.dll.	0-100	%	N/A	instance_id,gpu instance_id,gpu_slot,pid_for_gpu	1 minute
gpu_usage_decoder	(Agent) Decoding Usage	Decoder usage of the GPU. It is an instantaneous value at a sampling point. Linux: Obtain the metric value by calling the NvmlDeviceGetDecoderUtilization API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetDecoderUtilization API provided by the GPU driver library nvml.dll.	0-100	%	N/A	instance_id,gpu instance_id,gpu_slot,pid_for_gpu	1 minute
gpu_graphics_clocks	(Agent) GPU Graphics Clocks	GPU graphics (shader) clock frequency. The value is the GPU clock frequency related to graphics performance. If graphics capabilities are not used, you can ignore this metric. Linux: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library nvml.dll.	≥ 0	MHz	N/A	instance_id,gpu	1 minute
gpu_sm_clocks	(Agent) GPU SM Clocks	SM clocks on the GPU. The value is the clock frequency closely related to CUDA core computing of the GPU. Linux: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library nvml.dll.	≥ 0	MHz	N/A	instance_id,gpu	1 minute
gpu_mem_clock	(Agent) GPU Memory Clocks	Memory clocks on the GPU. The value is the clock frequency for controlling the GPU memory speed. Linux: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library nvml.dll.	≥ 0	MHz	N/A	instance_id,gpu	1 minute
gpu_video_clocks	(Agent) GPU Video Clocks	Video clocks on the GPU. The value is the codec clock frequency of the GPU. Linux: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library nvml.dll.	≥ 0	MHz	N/A	instance_id,gpu	1 minute
gpu_tx_throughput_pci	(Agent) GPU PCI Tx Throughput	PCI Tx throughput on the GPU. The value is the amount of data sent by the GPU to the host via PCIe. Linux: Obtain the metric value by calling the NvmlDeviceGetPcieThroughput API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetPcieThroughput API provided by the GPU driver library nvml.dll.	≥ 0	MByte/s	N/A	instance_id,gpu	1 minute
gpu_rx_throughput_pci	(Agent) GPU PCI Rx Throughput	PCI Rx throughput on the GPU. The value is the amount of data sent by the host to the GPU via PCIe. Linux: Obtain the metric value by calling the NvmlDeviceGetPcieThroughput API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetPcieThroughput API provided by the GPU driver library nvml.dll.	≥ 0	MByte/s	N/A	instance_id,gpu	1 minute
gpu_volatile_correctable	(Agent) Volatile Correctable ECC Errors	Number of correctable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset. Linux: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library nvml.dll.	≥ 0	Count	N/A	instance_id,gpu	1 minute
gpu_volatile_uncorrectable	(Agent) Volatile Uncorrectable ECC Errors	Number of uncorrectable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset. Linux: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library nvml.dll.	≥ 0	Count	N/A	instance_id,gpu	1 minute
gpu_aggregate_correctable	(Agent) Aggregate Correctable ECC Errors	Aggregate correctable ECC errors on the GPU Linux: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library nvml.dll.	≥ 0	Count	N/A	instance_id,gpu	1 minute
gpu_aggregate_uncorrectable	(Agent) Aggregate Uncorrectable ECC Errors	Aggregate uncorrectable ECC errors on the GPU Linux: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library nvml.dll.	≥ 0	Count	N/A	instance_id,gpu	1 minute
gpu_retired_page_single_bit	(Agent) Retired Page Single Bit Errors	Number of retired page single bit errors, which indicates the number of single-bit error pages blocked by the GPU Linux: Obtain the metric value by calling the NvmlDeviceGetRetiredPages API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetRetiredPages API provided by the GPU driver library nvml.dll.	≥ 0	Count	N/A	instance_id,gpu	1 minute
gpu_retired_page_double_bit	(Agent) Retired Page Double Bit Errors	Number of retired page double bit errors, which indicates the number of double-bit error pages blocked by the GPU Linux: Obtain the metric value by calling the NvmlDeviceGetRetiredPages API provided by the GPU driver library libnvidia-ml.so.1. Windows: Obtain the metric value by calling the NvmlDeviceGetRetiredPages API provided by the GPU driver library nvml.dll.	≥ 0	Count	N/A	instance_id,gpu	1 minute
gpu_lnkcap_speed	(Agent) Max. GPU Link Speed	Maximum PCIe link speed of the GPU, which means the maximum data throughput of the GPU on the PCIe bus Linux: Obtain the metric value by running lspci -d 10de: -vv | grep -i lnkcap. Windows: Obtain the metric value by running (gwmi Win32_Bus -Filter 'DeviceID like "PCI%"').GetRelated('Win32_PnPEntity').	≥ 0	GT/s	N/A	instance_id,gpu	1 minute
gpu_lnkcap_width	(Agent) Max. GPU Link Width	Maximum PCIe link width of the GPU, which means the maximum number of PCIe lanes supported by the GPU Linux: Obtain the metric value by running lspci -d 10de: -vv | grep -i lnkcap. Windows: Obtain the metric value by running (gwmi Win32_Bus -Filter 'DeviceID like "PCI%"').GetRelated('Win32_PnPEntity').	≥ 0	count	N/A	instance_id,gpu	1 minute
gpu_lnksta_speed	(Agent) GPU Link Speed	PCIe link speed of the GPU Linux: Obtain the metric value by running lspci -d 10de: -vv | grep -i lnksta. Windows is not supported currently.	≥ 0	GT/s	N/A	instance_id,gpu	1 minute
gpu_lnksta_width	(Agent) GPU Link Width	PCIe link width of the GPU, which means the number of PCIe lanes of the GPU Linux: Obtain the metric value by running lspci -d 10de: -vv | grep -i lnksta. Windows is not supported currently.	≥ 0	count	N/A	instance_id,gpu	1 minute
gpu_nvlink_number	(Agent) GPU NVLinks	Number of NVLinks of the GPU. For example, A100 supports 12 NVLinks. Linux: Obtain the metric value by calling the nvmlDeviceGetFieldValues API provided by the GPU driver library libnvidia-ml.so.1. Windows is not supported currently.	≥ 0	count	N/A	instance_id,gpu	1 minute
gpu_nvlink_bandwidth	(Agent) Average GPU NVLink Bandwidth	Average NVLink bandwidth of the GPU. The value is the total bandwidth for GPU data transmission. Linux: Obtain the metric value by calling the nvmlDeviceGetFieldValues API provided by the GPU driver library libnvidia-ml.so.1. Windows is not supported currently.	≥ 0	GB/s	N/A	instance_id,gpu	1 minute
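
The PCIe link metrics above are read on Linux from the LnkCap and LnkSta lines printed by lspci -d 10de: -vv. The following Python sketch shows one way such lines could be parsed into a speed (GT/s) and a lane count. It is an illustration under stated assumptions, not the Agent's code: the function names are hypothetical, the sample text is made up to resemble typical lspci output for a single device, and comparing the current link against the maximum is a suggested use rather than documented Cloud Eye behavior.

```python
import re

# Matches lines such as "LnkCap: Port #0, Speed 16GT/s, Width x16, ...".
# The colon right after the name keeps LnkCap2/LnkSta2 lines from matching.
_LINK_RE = re.compile(
    r"^\s*(LnkCap|LnkSta):.*?Speed\s+([\d.]+)\s*GT/s.*?Width\s+x(\d+)",
    re.MULTILINE,
)


def parse_pcie_links(lspci_text):
    """Map 'LnkCap'/'LnkSta' to (speed in GT/s, lane count) for one device."""
    links = {}
    for kind, speed, width in _LINK_RE.findall(lspci_text):
        # Keep only the first match of each kind.
        links.setdefault(kind, (float(speed), int(width)))
    return links


def is_link_degraded(links):
    """Return True if the current link is slower or narrower than the maximum."""
    cap_speed, cap_width = links["LnkCap"]
    sta_speed, sta_width = links["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width


# Illustrative sample (assumption), not captured from a real server.
SAMPLE = """\
\t\tLnkCap:\tPort #0, Speed 16GT/s, Width x16, ASPM not supported
\t\tLnkSta:\tSpeed 8GT/s (downgraded), Width x16 (ok)
"""

links = parse_pcie_links(SAMPLE)
degraded = is_link_degraded(links)
```

In this sample, the LnkCap values correspond to gpu_lnkcap_speed and gpu_lnkcap_width, and the LnkSta values correspond to gpu_lnksta_speed and gpu_lnksta_width.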

OS Metrics: NPU

**Table 11** NPU metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Dimension	Monitoring Interval (Raw Data)
npu_device_health	(Agent) NPU Device Health	NPU health status	0: healthy 1: minor alarm 2: major alarm 3: critical alarm	N/A	N/A	instance_id,npu	1 minute
npu_driver_health	(Agent) NPU Driver Health	Health status of the NPU driver	0: healthy 1: minor alarm 2: major alarm 3: critical alarm	N/A	N/A	instance_id,npu	1 minute
npu_power	(Agent) NPU Power	NPU power	>0	W	N/A	instance_id,npu	1 minute
npu_temperature	(Agent) NPU Temperature	NPU temperature	Natural numbers	°C	N/A	instance_id,npu	1 minute
npu_voltage	(Agent) NPU Voltage	NPU voltage	Natural numbers	V	N/A	instance_id,npu	1 minute
npu_util_rate_hbm	(Agent) NPU HBM Usage	NPU HBM usage	0-100	%	N/A	instance_id,npu	1 minute
npu_hbm_freq	(Agent) NPU HBM Frequency	NPU HBM frequency	>0	MHz	N/A	instance_id,npu	1 minute
npu_freq_hbm	(Agent) NPU HBM Frequency	NPU HBM frequency	>0	MHz	N/A	instance_id,npu	1 minute
npu_hbm_usage	(Agent) Used HBM	Used NPU HBM	≥0	MB	N/A	instance_id,npu	1 minute
npu_hbm_temperature	(Agent) HBM Temperature	NPU HBM temperature	Natural numbers	°C	N/A	instance_id,npu	1 minute
npu_hbm_bandwidth_util	(Agent) HBM Bandwidth Usage	NPU HBM bandwidth usage	0-100	%	N/A	instance_id,npu	1 minute
npu_hbm_mem_capacity	(Agent) HBM Memory Capacity	NPU HBM memory capacity	≥0	MB	N/A	instance_id,npu	1 minute
npu_hbm_ecc_enable	(Agent) HBM ECC Check Status	Whether HBM ECC check is enabled for the NPU	0: disabled 1: enabled	N/A	N/A	instance_id,npu	1 minute
npu_hbm_single_bit_error_cnt	(Agent) HBM Single-Bit Errors	Number of HBM single-bit errors of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_hbm_double_bit_error_cnt	(Agent) HBM Double-Bit Errors	Number of HBM double-bit errors of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_hbm_total_single_bit_error_cnt	(Agent) Single-Bit Errors in HBM Lifecycle	Number of single-bit errors in an NPU HBM lifecycle	≥0	count	N/A	instance_id,npu	1 minute
npu_hbm_total_double_bit_error_cnt	(Agent) Double-Bit Errors in HBM Lifecycle	Number of double-bit errors in an NPU HBM lifecycle	≥0	count	N/A	instance_id,npu	1 minute
npu_hbm_single_bit_isolated_pages_cnt	(Agent) Isolated Memory Pages with HBM Single-Bit Errors	Number of isolated memory pages with single-bit HBM errors of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_hbm_double_bit_isolated_pages_cnt	(Agent) Isolated Memory Pages with HBM Double-Bit Errors	Number of isolated memory pages with double-bit HBM errors of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_usage_mem	(Agent) Used NPU Memory	Memory used on the NPU	≥0	MB	N/A	instance_id,npu	1 minute
npu_util_rate_mem	(Agent) NPU Memory Usage	NPU memory usage	0-100	%	N/A	instance_id,npu	1 minute
npu_util_rate_hbm_bw	(Agent) NPU HBM Bandwidth Usage	NPU HBM bandwidth usage	0-100	%	N/A	instance_id,npu	1 minute
npu_freq_mem	(Agent) NPU Memory Frequency	NPU memory frequency	>0	MHz	N/A	instance_id,npu	1 minute
npu_util_rate_mem_bandwidth	(Agent) NPU Memory Bandwidth Usage	NPU memory bandwidth usage	0-100	%	N/A	instance_id,npu	1 minute
npu_util_rate_vector_core	(Agent) NPU Vector Core Usage	Vector core usage of the NPU	0-100	%	N/A	instance_id,npu	1 minute
npu_sbe	(Agent) NPU Single-Bit Errors	Number of single-bit errors on the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_dbe	(Agent) NPU Double-Bit Errors	Number of double-bit errors on the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_freq_ai_core	(Agent) NPU AI Core Frequency	AI core clock frequency of the NPU	>0	MHz	N/A	instance_id,npu	1 minute
npu_freq_ai_core_rated	(Agent) Rated NPU AI Core Frequency	Rated AI core frequency of the NPU	>0	MHz	N/A	instance_id,npu	1 minute
npu_util_rate_ai_core	(Agent) NPU AI Core Usage	AI core usage of the NPU	0-100	%	N/A	instance_id,npu	1 minute
npu_aicpu_num	(Agent) NPU AI CPUs	Number of AI CPUs on the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_util_rate_ai_cpu	(Agent) NPU AI CPU Usage	AI CPU usage of the NPU	0-100	%	N/A	instance_id,npu	1 minute
npu_aicpu_avg_util_rate	(Agent) Average AI CPU Usage of NPU	Average AI CPU usage of the NPU	0-100	%	N/A	instance_id,npu	1 minute
npu_aicpu_max_freq	(Agent) Max. AI CPU Frequency of NPU	Maximum AI CPU frequency of the NPU	>0	MHz	N/A	instance_id,npu	1 minute
npu_aicpu_cur_freq	(Agent) AI CPU Frequency of NPU	AI CPU frequency of the NPU	>0	MHz	N/A	instance_id,npu	1 minute
npu_util_rate_ctrl_cpu	(Agent) NPU Control CPU Usage	Control CPU usage of the NPU	0-100	%	N/A	instance_id,npu	1 minute
npu_freq_ctrl_cpu	(Agent) NPU Control CPU Frequency	Control CPU frequency of the NPU	>0	MHz	N/A	instance_id,npu	1 minute
npu_link_cap_speed	(Agent) Max. NPU Link Speed	Maximum link speed of the NPU	≥0	GT/s	N/A	instance_id,npu	1 minute
npu_link_cap_width	(Agent) Max. NPU Link Width	Maximum link width of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_link_status_speed	(Agent) NPU Link Speed	Link speed of the NPU	≥0	GT/s	N/A	instance_id,npu	1 minute
npu_link_status_width	(Agent) NPU Link Width	Link width of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_device_network_health	(Agent) NPU Network Health	RoCE IP address connectivity of the NPU	0: The network is healthy. Other values: The network status is unhealthy.	N/A	N/A	instance_id,npu	1 minute
npu_network_port_link_status	(Agent) NPU Network Port Link Status	Link status of the network port on the NPU	0: up 1: down	N/A	N/A	instance_id,npu	1 minute
npu_roce_tx_rate	(Agent) NPU NIC Uplink Rate	NIC uplink rate of the NPU	≥0	MB/s	N/A	instance_id,npu	1 minute
npu_roce_rx_rate	(Agent) NPU NIC Downlink Rate	NIC downlink rate of the NPU	≥0	MB/s	N/A	instance_id,npu	1 minute
npu_mac_tx_mac_pause_num	(Agent) Pause Frames Sent by MAC	Total number of pause frames sent by the MAC address of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_mac_rx_mac_pause_num	(Agent) Pause Frames Received by MAC	Total number of pause frames received by the MAC address of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_mac_tx_pfc_pkt_num	(Agent) PFC Frames Sent by MAC	Total number of PFC frames sent by the MAC address of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_mac_rx_pfc_pkt_num	(Agent) PFC Frames Received by MAC	Total number of PFC frames received by the MAC address of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_mac_tx_bad_pkt_num	(Agent) Bad Packets Sent by MAC	Total number of bad packets sent by the MAC address of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_mac_rx_bad_pkt_num	(Agent) Bad Packets Received by MAC	Total number of bad packets received by the MAC address of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_roce_tx_err_pkt_num	(Agent) Bad Packets Sent by RoCE	Total number of bad packets sent by the RoCE NIC of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_roce_rx_err_pkt_num	(Agent) Bad Packets Received by RoCE	Total number of bad packets received by the RoCE NIC of the NPU	≥0	count	N/A	instance_id,npu	1 minute
npu_opt_temperature	(Agent) NPU Optical Module Case Temperature	Case temperature of the NPU optical module	Natural numbers	°C	N/A	instance_id,npu	1 minute
npu_opt_temperature_high_thres	(Agent) Max. NPU Optical Module Case Temperature	Upper limit for the case temperature of the NPU optical module	Natural numbers	°C	N/A	instance_id,npu	1 minute
npu_opt_temperature_low_thres	(Agent) Min. NPU Optical Module Case Temperature	Lower limit for the case temperature of the NPU optical module	Natural numbers	°C	N/A	instance_id,npu	1 minute
npu_opt_voltage	(Agent) NPU Optical Module Voltage	Voltage of the NPU optical module	Natural numbers	mV	N/A	instance_id,npu	1 minute
npu_opt_voltage_high_thres	(Agent) Max. NPU Optical Module Voltage	Upper limit for the voltage of the NPU optical module	Natural numbers	mV	N/A	instance_id,npu	1 minute
npu_opt_voltage_low_thres	(Agent) Min. NPU Optical Module Voltage	Lower limit for the voltage of the NPU optical module	Natural numbers	mV	N/A	instance_id,npu	1 minute
npu_opt_tx_power_lane0	(Agent) NPU Optical Module Lane 0 TX Power	Transmit power of NPU optical module lane 0	≥0	mW	N/A	instance_id,npu	1 minute
npu_opt_tx_power_lane1	(Agent) NPU Optical Module Lane 1 TX Power	Transmit power of NPU optical module lane 1	≥0	mW	N/A	instance_id,npu	1 minute
npu_opt_tx_power_lane2	(Agent) NPU Optical Module Lane 2 TX Power	Transmit power of NPU optical module lane 2	≥0	mW	N/A	instance_id,npu	1 minute
npu_opt_tx_power_lane3	(Agent) NPU Optical Module Lane 3 TX Power	Transmit power of NPU optical module lane 3	≥0	mW	N/A	instance_id,npu	1 minute
npu_opt_rx_power_lane0	(Agent) NPU Optical Module Lane 0 RX Power	Receive power of NPU optical module lane 0	≥0	mW	N/A	instance_id,npu	1 minute
npu_opt_rx_power_lane1	(Agent) NPU Optical Module Lane 1 RX Power	Receive power of NPU optical module lane 1	≥0	mW	N/A	instance_id,npu	1 minute
npu_opt_rx_power_lane2	(Agent) NPU Optical Module Lane 2 RX Power	Receive power of NPU optical module lane 2	≥0	mW	N/A	instance_id,npu	1 minute
npu_opt_rx_power_lane3	(Agent) NPU Optical Module Lane 3 RX Power	Receive power of NPU optical module lane 3	≥0	mW	N/A	instance_id,npu	1 minute
npu_opt_tx_bias_lane0	(Agent) NPU Optical Module Lane 0 TX Bias Current	Transmit bias current of NPU optical module lane 0	≥0	mA	N/A	instance_id,npu	1 minute
npu_opt_tx_bias_lane1	(Agent) NPU Optical Module Lane 1 TX Bias Current	Transmit bias current of NPU optical module lane 1	≥0	mA	N/A	instance_id,npu	1 minute
npu_opt_tx_bias_lane2	(Agent) NPU Optical Module Lane 2 TX Bias Current	Transmit bias current of NPU optical module lane 2	≥0	mA	N/A	instance_id,npu	1 minute
npu_opt_tx_bias_lane3	(Agent) NPU Optical Module Lane 3 TX Bias Current	Transmit bias current of NPU optical module lane 3	≥0	mA	N/A	instance_id,npu	1 minute
npu_opt_tx_los	(Agent) NPU Optical Module TX LOS	Statistics on Transmit LOS Flag of the NPU optical module	≥0	count	N/A	instance_id,npu	1 minute
npu_opt_rx_los	(Agent) NPU Optical Module RX LOS	Statistics on Receive LOS Flag of the NPU optical module	≥0	count	N/A	instance_id,npu	1 minute
npu_macro1_0lane_max_consec_sec	(Agent) Max. Duration of NPU Macro1 0lane	Maximum duration of NPU Macro1 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro1_0lane_total_sec	(Agent) Total Duration of NPU Macro1 0lane	Total duration of NPU Macro1 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro1_crc_error_cnt	(Agent) Error Packets Received by NPU Macro1	Number of CRC error packets received by NPU Macro1 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro1_crc_error_rate	(Agent) NPU Macro1 BER	Percentage of CRC error packets received by NPU Macro1 in a monitoring period	0-100	%	N/A	instance_id,npu	1 minute
npu_macro1_retry_cnt	(Agent) Packets Retransmitted by NPU Macro1	Number of packets retransmitted by NPU Macro1 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro1_rx_cnt	(Agent) Packets Received by NPU Macro1	Number of packets received by NPU Macro1 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro1_serdes_lane0_snr	(Agent) NPU Macro1 SerDes Lane0 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane0	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro1_serdes_lane1_snr	(Agent) NPU Macro1 SerDes Lane1 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane1	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro1_serdes_lane2_snr	(Agent) NPU Macro1 SerDes Lane2 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane2	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro1_serdes_lane3_snr	(Agent) NPU Macro1 SerDes Lane3 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane3	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro1_tx_cnt	(Agent) Packets Sent by NPU Macro1	Number of packets sent by NPU Macro1 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro2_0lane_max_consec_sec	(Agent) Max. Duration of NPU Macro2 0lane	Maximum duration of NPU Macro2 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro2_0lane_total_sec	(Agent) Total Duration of NPU Macro2 0lane	Total duration of NPU Macro2 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro2_crc_error_cnt	(Agent) Error Packets Received by NPU Macro2	Number of CRC error packets received by NPU Macro2 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro2_crc_error_rate	(Agent) NPU Macro2 BER	Percentage of CRC error packets received by NPU Macro2 in a monitoring period	0-100	%	N/A	instance_id,npu	1 minute
npu_macro2_retry_cnt	(Agent) Packets Retransmitted by NPU Macro2	Number of packets retransmitted by NPU Macro2 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro2_rx_cnt	(Agent) Packets Received by NPU Macro2	Number of packets received by NPU Macro2 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro2_serdes_lane0_snr	(Agent) NPU Macro2 SerDes Lane0 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane0	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro2_serdes_lane1_snr	(Agent) NPU Macro2 SerDes Lane1 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane1	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro2_serdes_lane2_snr	(Agent) NPU Macro2 SerDes Lane2 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane2	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro2_serdes_lane3_snr	(Agent) NPU Macro2 SerDes Lane3 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane3	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro2_tx_cnt	(Agent) Packets Sent by NPU Macro2	Number of packets sent by NPU Macro2 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro3_0lane_max_consec_sec	(Agent) Max. Duration of NPU Macro3 0lane	Maximum duration of NPU Macro3 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro3_0lane_total_sec	(Agent) Total Duration of NPU Macro3 0lane	Total duration of NPU Macro3 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro3_crc_error_cnt	(Agent) Error Packets Received by NPU Macro3	Number of CRC error packets received by NPU Macro3 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro3_crc_error_rate	(Agent) NPU Macro3 BER	Percentage of CRC error packets received by NPU Macro3 in a monitoring period	0-100	%	N/A	instance_id,npu	1 minute
npu_macro3_retry_cnt	(Agent) Packets Retransmitted by NPU Macro3	Number of packets retransmitted by NPU Macro3 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro3_rx_cnt	(Agent) Packets Received by NPU Macro3	Number of packets received by NPU Macro3 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro3_serdes_lane0_snr	(Agent) NPU Macro3 SerDes Lane0 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane0	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro3_serdes_lane1_snr	(Agent) NPU Macro3 SerDes Lane1 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane1	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro3_serdes_lane2_snr	(Agent) NPU Macro3 SerDes Lane2 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane2	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro3_serdes_lane3_snr	(Agent) NPU Macro3 SerDes Lane3 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane3	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro3_tx_cnt	(Agent) Packets Sent by NPU Macro3	Number of packets sent by NPU Macro3 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro4_0lane_max_consec_sec	(Agent) Max. Duration of NPU Macro4 0lane	Maximum duration of NPU Macro4 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro4_0lane_total_sec	(Agent) Total Duration of NPU Macro4 0lane	Total duration of NPU Macro4 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro4_crc_error_cnt	(Agent) Error Packets Received by NPU Macro4	Number of CRC error packets received by NPU Macro4 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro4_crc_error_rate	(Agent) NPU Macro4 BER	Percentage of CRC error packets received by NPU Macro4 in a monitoring period	0-100	%	N/A	instance_id,npu	1 minute
npu_macro4_retry_cnt	(Agent) Packets Retransmitted by NPU Macro4	Number of packets retransmitted by NPU Macro4 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro4_rx_cnt	(Agent) Packets Received by NPU Macro4	Number of packets received by NPU Macro4 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro4_serdes_lane0_snr	(Agent) NPU Macro4 SerDes Lane0 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane0	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro4_serdes_lane1_snr	(Agent) NPU Macro4 SerDes Lane1 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane1	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro4_serdes_lane2_snr	(Agent) NPU Macro4 SerDes Lane2 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane2	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro4_serdes_lane3_snr	(Agent) NPU Macro4 SerDes Lane3 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane3	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro4_tx_cnt	(Agent) Packets Sent by NPU Macro4	Number of packets sent by NPU Macro4 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro5_0lane_max_consec_sec	(Agent) Max. Duration of NPU Macro5 0lane	Maximum duration of NPU Macro5 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro5_0lane_total_sec	(Agent) Total Duration of NPU Macro5 0lane	Total duration of NPU Macro5 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro5_crc_error_cnt	(Agent) Error Packets Received by NPU Macro5	Number of CRC error packets received by NPU Macro5 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro5_crc_error_rate	(Agent) NPU Macro5 BER	Percentage of CRC error packets received by NPU Macro5 in a monitoring period	0-100	%	N/A	instance_id,npu	1 minute
npu_macro5_retry_cnt	(Agent) Packets Retransmitted by NPU Macro5	Number of packets retransmitted by NPU Macro5 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro5_rx_cnt	(Agent) Packets Received by NPU Macro5	Number of packets received by NPU Macro5 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro5_serdes_lane0_snr	(Agent) NPU Macro5 SerDes Lane0 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane0	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro5_serdes_lane1_snr	(Agent) NPU Macro5 SerDes Lane1 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane1	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro5_serdes_lane2_snr	(Agent) NPU Macro5 SerDes Lane2 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane2	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro5_serdes_lane3_snr	(Agent) NPU Macro5 SerDes Lane3 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane3	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro5_tx_cnt	(Agent) Packets Sent by NPU Macro5	Number of packets sent by NPU Macro5 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro6_0lane_max_consec_sec	(Agent) Max. Duration of NPU Macro6 0lane	Maximum duration of NPU Macro6 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro6_0lane_total_sec	(Agent) Total Duration of NPU Macro6 0lane	Total duration of NPU Macro6 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro6_crc_error_cnt	(Agent) Error Packets Received by NPU Macro6	Number of CRC error packets received by NPU Macro6 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro6_crc_error_rate	(Agent) NPU Macro6 BER	Percentage of CRC error packets received by NPU Macro6 in a monitoring period	0-100	%	N/A	instance_id,npu	1 minute
npu_macro6_retry_cnt	(Agent) Packets Retransmitted by NPU Macro6	Number of packets retransmitted by NPU Macro6 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro6_rx_cnt	(Agent) Packets Received by NPU Macro6	Number of packets received by NPU Macro6 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro6_serdes_lane0_snr	(Agent) NPU Macro6 SerDes Lane0 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane0	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro6_serdes_lane1_snr	(Agent) NPU Macro6 SerDes Lane1 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane1	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro6_serdes_lane2_snr	(Agent) NPU Macro6 SerDes Lane2 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane2	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro6_serdes_lane3_snr	(Agent) NPU Macro6 SerDes Lane3 SNR	Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane3	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro6_tx_cnt	(Agent) Packets Sent by NPU Macro6	Number of packets sent by NPU Macro6 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro7_0lane_max_consec_sec	(Agent) Max. Duration of NPU Macro7 0lane	Maximum duration of NPU Macro7 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro7_0lane_total_sec	(Agent) Total Duration of NPU Macro7 0lane	Total duration of NPU Macro7 0lane in a monitoring period	≥0	s	N/A	instance_id,npu	1 minute
npu_macro7_crc_error_cnt	(Agent) Error Packets Received by NPU Macro7	Number of CRC error packets received by NPU Macro7 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro7_crc_error_rate	(Agent) NPU Macro7 BER	Percentage of CRC error packets received by NPU Macro7 in a monitoring period	0-100	%	N/A	instance_id,npu	1 minute
npu_macro7_retry_cnt	(Agent) Packets Retransmitted by NPU Macro7	Number of packets retransmitted by NPU Macro7 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro7_rx_cnt	(Agent) Packets Received by NPU Macro7	Number of packets received by NPU Macro7 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_macro7_serdes_lane0_snr	(Agent) NPU Macro7 SerDes Lane0 SNR	Signal-to-noise ratio (SNR) of NPU Macro7 SerDes Lane0	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro7_serdes_lane1_snr	(Agent) NPU Macro7 SerDes Lane1 SNR	Signal-to-noise ratio (SNR) of NPU Macro7 SerDes Lane1	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro7_serdes_lane2_snr	(Agent) NPU Macro7 SerDes Lane2 SNR	Signal-to-noise ratio (SNR) of NPU Macro7 SerDes Lane2	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro7_serdes_lane3_snr	(Agent) NPU Macro7 SerDes Lane3 SNR	Signal-to-noise ratio (SNR) of NPU Macro7 SerDes Lane3	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_macro7_tx_cnt	(Agent) Packets Sent by NPU Macro7	Number of packets sent by NPU Macro7 in a monitoring period	≥0	count	N/A	instance_id,npu	1 minute
npu_opt_media_snr_lane0	(Agent) NPU Optical Module Lane 0 Optical SNR	Signal-to-noise ratio (SNR) on the media (optical) side of lane 0 in the NPU optical module	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_opt_media_snr_lane1	(Agent) NPU Optical Module Lane 1 Optical SNR	Signal-to-noise ratio (SNR) on the media (optical) side of lane 1 in the NPU optical module	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_opt_media_snr_lane2	(Agent) NPU Optical Module Lane 2 Optical SNR	Signal-to-noise ratio (SNR) on the media (optical) side of lane 2 in the NPU optical module	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_opt_media_snr_lane3	(Agent) NPU Optical Module Lane 3 Optical SNR	Signal-to-noise ratio (SNR) on the media (optical) side of lane 3 in the NPU optical module	Natural numbers	dB	N/A	instance_id,npu	1 minute
npu_roce_new_pkt_rty_num	(Agent) Packets Retransmitted by NPU RoCE	Number of packets retransmitted by NPU RoCE	≥0	count	N/A	instance_id,npu	1 minute
npu_roce_out_of_order_num	(Agent) PSN Error Packets Received by NPU RoCE	Number of packets received by NPU RoCE whose PSN is greater than the expected one or duplicates an existing one. Out-of-order or lost packets trigger retransmission.	≥0	count	N/A	instance_id,npu	1 minute
npu_roce_rx_all_pkt_num	(Agent) Packets Received by NPU RoCE	Total number of packets received by NPU RoCE	≥0	count	N/A	instance_id,npu	1 minute
npu_roce_rx_cnp_pkt_num	(Agent) CNP Packets Received by NPU RoCE	Total number of CNP packets received by NPU RoCE	≥0	count	N/A	instance_id,npu	1 minute
npu_roce_tx_all_pkt_num	(Agent) Packets Sent by NPU RoCE	Total number of packets sent by NPU RoCE	≥0	count	N/A	instance_id,npu	1 minute
npu_roce_tx_cnp_pkt_num	(Agent) CNP Packets Sent by NPU RoCE	Total number of CNP packets sent by NPU RoCE	≥0	count	N/A	instance_id,npu	1 minute
npu_roce_tx_err_pkt_num	(Agent) Bad Packets Sent by RoCE	Total number of bad packets sent by the RoCE NIC of the NPU (for reference only)	≥0	count	N/A	instance_id,npu	1 minute
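
The table gives each macro a CRC error count, a received-packet count, and a CRC error percentage. The following minimal sketch shows one way the percentage could be derived from the two counters for a single monitoring period. It assumes the rate is simply error packets divided by received packets; this document does not state the exact formula. The function name crc_error_percent and the sample numbers are hypothetical.

```python
def crc_error_percent(crc_error_cnt, rx_cnt):
    """Estimate the share of received packets that had CRC errors, in percent.

    Assumption: rate = crc_error_cnt / rx_cnt * 100. The Agent's actual
    formula is not given in this document.
    """
    if crc_error_cnt < 0 or rx_cnt < 0:
        raise ValueError("packet counters must be >= 0")
    if rx_cnt == 0:
        # No packets received in this period, so report 0 instead of dividing by zero.
        return 0.0
    # Clamp to the documented 0-100 value range.
    return min(100.0, 100.0 * crc_error_cnt / rx_cnt)


# Made-up counters for one monitoring period (illustration only).
sample_rate = crc_error_percent(crc_error_cnt=5, rx_cnt=2000)
print(sample_rate)
```

Such a calculation can serve as a rough cross-check of the reported npu_macro6_crc_error_rate or npu_macro7_crc_error_rate value against the corresponding counters.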

If an object is in a hierarchical system, specify the monitored dimension in hierarchical form when you use APIs to query metrics of this object.

For example, to query the available space (metric: disk_free) of a disk mount point on a BMS, the dimension of the metric is instance_id,mount_point, where instance_id indicates level 0 and mount_point indicates level 1.

When you call an API to query a single metric, specify the mount_point dimension as follows:
```
dim.0=instance_id,3d65c1ac-9a9f-4c5f-a054-35184a087bb2&dim.1=mount_point,6666cd76f96956469e7be39d750cc7d9
```
3d65c1ac-9a9f-4c5f-a054-35184a087bb2 and 6666cd76f96956469e7be39d750cc7d9 are the values of instance_id and mount_point, respectively. For details about how to obtain the values, see Dimensions.
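
The dim.N query parameters can also be assembled programmatically. The sketch below builds the string from an ordered list of key-value pairs and rejects more than four levels, which is the maximum depth this page describes. The helper name build_dim_params is hypothetical and not part of any Cloud Eye SDK. The sketch only produces the dimension part of the query string, not a complete request.

```python
from urllib.parse import quote

MAX_LEVELS = 4  # Cloud Eye supports dimension levels 0 to 3


def build_dim_params(dimensions):
    """Return 'dim.0=key,value&dim.1=key,value' for ordered (key, value) pairs.

    The position in the list is the dimension level: index 0 is level 0.
    """
    if not 1 <= len(dimensions) <= MAX_LEVELS:
        raise ValueError(
            f"expected 1 to {MAX_LEVELS} dimension levels, got {len(dimensions)}"
        )
    parts = []
    for level, (key, value) in enumerate(dimensions):
        # The comma between key and value stays literal, as in the example above;
        # the key and value themselves are percent-encoded if they need it.
        parts.append(f"dim.{level}={quote(key, safe='')},{quote(value, safe='')}")
    return "&".join(parts)


query = build_dim_params([
    ("instance_id", "3d65c1ac-9a9f-4c5f-a054-35184a087bb2"),
    ("mount_point", "6666cd76f96956469e7be39d750cc7d9"),
])
print(query)
```

Passing the pairs as an ordered list keeps the hierarchy explicit: the caller decides which dimension is level 0 and which is level 1.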

When you call an API to query multiple metrics, specify the mount_point dimension as follows:

"dimensions": [
    {
        "name": "instance_id",
        "value": "3d65c1ac-9a9f-4c5f-a054-35184a087bb2"
    },
    {
        "name": "mount_point",
        "value": "6666cd76f96956469e7be39d750cc7d9"
    }
]

3d65c1ac-9a9f-4c5f-a054-35184a087bb2 and 6666cd76f96956469e7be39d750cc7d9 are the values of instance_id and mount_point, respectively. For details about how to obtain the values, see Dimensions.
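
For the multi-metric form, the same ordered pairs map onto a list of name/value objects. The following sketch generates that "dimensions" fragment as JSON. The helper name build_dimensions is hypothetical, and the fragment is only the part of the request body shown above; any other fields the API requires are outside the scope of this example.

```python
import json


def build_dimensions(pairs):
    """Convert ordered (name, value) pairs into the "dimensions" list.

    The list order is kept, so the first pair is level 0, the second level 1.
    """
    return [{"name": name, "value": value} for name, value in pairs]


body_fragment = {
    "dimensions": build_dimensions([
        ("instance_id", "3d65c1ac-9a9f-4c5f-a054-35184a087bb2"),
        ("mount_point", "6666cd76f96956469e7be39d750cc7d9"),
    ])
}

# Serialize the fragment for inspection or for embedding in a larger request body.
print(json.dumps(body_fragment, indent=4))
```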

Dimensions

Dimension	Key	Value
Cloud server	instance_id	Cloud server
Server process	proc	Process
Cloud server disk	disk	Disk
Cloud server mount point	mount_point	Mount point
Cloud server GPU	gpu	GPU
Cloud server NPU	npu	NPU
Cloud server NIC	network_interface_card	NIC
Cloud server GPU	gpu_slot	GPU
GPU process ID of cloud server	pid_for_gpu	GPU process ID
