Updated on 2025-11-21 GMT+08:00

Monitored Metrics (with Agent Installed)

Description

This section describes monitoring metrics reported by BMS to Cloud Eye as well as their namespaces and dimensions. You can use the management console or APIs provided by Cloud Eye to query the metrics of the monitored objects and alarms generated for BMS.

Cloud Eye can monitor dimensions nested up to four levels deep. The levels are numbered 0 to 3, and level 3 is the deepest. For example, if the monitored dimension of a metric is instance_id,mount_point, instance_id indicates level 0 and mount_point indicates level 1.
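
The level numbering can be illustrated with a short Python sketch. The function name dimension_levels is hypothetical and is not part of any Cloud Eye API. It only shows how the position of a name in a comma-separated dimension string corresponds to its level.

```python
def dimension_levels(dimension):
    """Map each name in a comma-separated dimension string to its nesting level."""
    names = [name.strip() for name in dimension.split(",")]
    # The text above allows at most four levels (0 to 3).
    if not 1 <= len(names) <= 4 or "" in names:
        raise ValueError("expected one to four non-empty dimension names")
    return dict(enumerate(names))  # level 0 is the first name


levels = dimension_levels("instance_id,mount_point")
# levels[0] is "instance_id" and levels[1] is "mount_point"
```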

Prerequisites

The Agent has been installed. For details, see Installing the Agent.

Namespace

SERVICE.BMS

OS Metrics: CPU

Table 1 CPU metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

cpu_usage

(Agent) CPU Usage

CPU usage of the monitored object

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) value.
  • Windows: Obtain the metric value using the Windows API GetSystemTimes.

0-100

%

N/A

instance_id

1 minute

cpu_usage_idle

(Agent) Idle CPU Usage

Percentage of time that CPU is idle

  • Linux: Check metric value changes in file /proc/stat in a collection period.
  • Windows: Obtain the metric value using the Windows API GetSystemTimes.

0-100

%

N/A

instance_id

1 minute

cpu_usage_other

(Agent) Other Process CPU Usage

Percentage of time that the CPU is used by other processes

  • Linux: Other Process CPU Usage = 100% – Idle CPU Usage (%) – Kernel Space CPU Usage (%) – User Space CPU Usage (%)
  • Windows: Other Process CPU Usage = 100% – Idle CPU Usage (%) – Kernel Space CPU Usage (%) – User Space CPU Usage (%)

0-100

%

N/A

instance_id

1 minute

cpu_usage_system

(Agent) Kernel Space CPU Usage

Percentage of time that the CPU is used by kernel space

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) sy value.
  • Windows: Obtain the metric value using the Windows API GetSystemTimes.

0-100

%

N/A

instance_id

1 minute

cpu_usage_user

(Agent) User Space CPU Usage

Percentage of time that the CPU is used by user space

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) us value.
  • Windows: Obtain the metric value using the Windows API GetSystemTimes.

0-100

%

N/A

instance_id

1 minute

cpu_usage_nice

(Agent) Nice Process CPU Usage

Percentage of time that the CPU is used by the Nice process

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) ni value.
  • Windows is not supported currently.

0-100

%

N/A

instance_id

1 minute

cpu_usage_iowait

(Agent) iowait Process CPU Usage

Percentage of time during which the CPU is waiting for I/O operations to complete

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) wa value.
  • Windows is not supported currently.

0-100

%

N/A

instance_id

1 minute

cpu_usage_irq

(Agent) CPU Interrupt Time

Percentage of time that the CPU is servicing interrupts

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) hi value.
  • Windows is not supported currently.

0-100

%

N/A

instance_id

1 minute

cpu_usage_softirq

(Agent) CPU Software Interrupt Time

Percentage of time that the CPU is servicing software interrupts

  • Linux: Check metric value changes in file /proc/stat in a collection period. Run the top command and check the %Cpu(s) si value.
  • Windows is not supported currently.

0-100

%

N/A

instance_id

1 minute
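
The Linux rows in Table 1 describe the CPU metrics as changes in /proc/stat over a collection period. The following Python sketch shows one way such deltas can be turned into percentages. It is an illustration only, not the Agent's implementation. The function names are hypothetical, and two points are assumptions: the total is taken as the sum of the first eight counters of the aggregate cpu line, and overall usage is taken as 100 minus the idle percentage.

```python
# Counter order of the aggregate "cpu" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before_line, after_line):
    """Convert two samples of the 'cpu' line into percentages of elapsed CPU time."""
    before, after = parse_cpu_line(before_line), parse_cpu_line(after_line)
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())  # assumption: total = sum of the eight counters
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    pct = {name: 100.0 * delta / total for name, delta in deltas.items()}
    # Formula from the cpu_usage_other row of Table 1.
    pct["other"] = 100.0 - pct["idle"] - pct["system"] - pct["user"]
    # Assumption: overall usage is everything that is not idle time.
    pct["usage"] = 100.0 - pct["idle"]
    return pct
```

On a Linux host, the two input lines could be read from the first line of /proc/stat at the start and at the end of a collection period.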

OS Metrics: CPU Load

Table 2 CPU load metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

load_average1

(Agent) 1-Minute Load Average

CPU load averaged from the last 1 minute

Linux: Obtain the metric value by dividing the load1 value in file /proc/loadavg by the number of logical CPUs. To view the load1 value, run the top command.

≥0

N/A

N/A

instance_id

1 minute

load_average5

(Agent) 5-Minute Load Average

CPU load averaged from the last 5 minutes

Linux: Obtain the metric value by dividing the load5 value in file /proc/loadavg by the number of logical CPUs. To view the load5 value, run the top command.

≥0

N/A

N/A

instance_id

1 minute

load_average15

(Agent) 15-Minute Load Average

CPU load averaged from the last 15 minutes

Linux: Obtain the metric value by dividing the load15 value in file /proc/loadavg by the number of logical CPUs. To view the load15 value, run the top command.

≥0

N/A

N/A

instance_id

1 minute
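
The Linux rows in Table 2 appear to describe each load metric as the corresponding /proc/loadavg value divided by the number of logical CPUs. Under that reading, the calculation can be sketched in Python as follows. The function name normalized_load is hypothetical.

```python
def normalized_load(loadavg_text, logical_cpus):
    """Divide the 1-, 5-, and 15-minute load averages by the logical CPU count."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    # The first three fields of /proc/loadavg are load1, load5, and load15.
    load1, load5, load15 = (float(value) for value in loadavg_text.split()[:3])
    return (load1 / logical_cpus, load5 / logical_cpus, load15 / logical_cpus)


# Example with a hand-written sample of /proc/loadavg on a 4-CPU server.
per_cpu = normalized_load("4.00 2.00 1.00 1/389 12345", 4)
```

On a Linux host, the text could be read from /proc/loadavg and the CPU count taken from os.cpu_count().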

OS Metrics: Memory

Table 3 Memory metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

mem_available

(Agent) Available Memory

Available memory of the monitored object

  • Linux: Obtain the metric value from /proc/meminfo.
    • If MemAvailable is displayed in /proc/meminfo, obtain the value.
    • If MemAvailable is not displayed in /proc/meminfo, calculate the value with the formula MemAvailable = MemFree + Buffers + Cached.
  • Windows: Available memory = Total memory – Used memory. Obtain the metric value using the Windows API GlobalMemoryStatusEx.

≥0

GB

N/A

instance_id

1 minute

mem_usedPercent

(Agent) Memory Usage

Memory usage of the monitored object

  • Linux: Obtain the metric value from the /proc/meminfo file. Memory Usage = (MemTotal – MemAvailable)/MemTotal
    • If MemAvailable is displayed in /proc/meminfo, calculate the value with the formula MemUsedPercent = (MemTotal – MemAvailable)/MemTotal.
    • If MemAvailable is not displayed in /proc/meminfo, calculate the value with the formula MemUsedPercent = (MemTotal – MemFree – Buffers – Cached)/MemTotal.
  • Windows: Memory Usage = Used memory/Total memory x 100%

0-100

%

N/A

instance_id

1 minute

mem_free

(Agent) Idle Memory

Memory that is not being used

  • Linux: Obtain the metric value from /proc/meminfo.
  • Windows is not supported currently.

≥0

GB

N/A

instance_id

1 minute

mem_buffers

(Agent) Buffer

Memory that is being used for buffers

  • Linux: Obtain the metric value from /proc/meminfo. Run the top command and check the KiB Mem:buffers value.
  • Windows is not supported currently.

≥0

GB

N/A

instance_id

1 minute

mem_cached

(Agent) Cache

Memory that is being used for caches

  • Linux: Obtain the metric value from /proc/meminfo. Run the top command and check the KiB Swap:cached Mem value.
  • Windows is not supported currently.

≥0

GB

N/A

instance_id

1 minute

total_open_files

(Agent) Total File Handles

Total handles used by all processes

  • Linux: Use the /proc/{pid}/fd file to summarize the handles used by all processes.
  • Windows is not supported currently.

≥0

Count

N/A

instance_id

1 minute
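
The Linux formulas for mem_available and mem_usedPercent in Table 3 can be sketched in Python as follows. This is an illustration under the formulas stated in the table, not the Agent's code. The function names are hypothetical, and values are kept in kB as they appear in /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (kB for sizes)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def mem_available_kb(info):
    """Use MemAvailable when present, otherwise MemFree + Buffers + Cached."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info["MemFree"] + info["Buffers"] + info["Cached"]


def mem_used_percent(info):
    """(MemTotal - MemAvailable) / MemTotal, expressed as a percentage."""
    total = info["MemTotal"]
    return 100.0 * (total - mem_available_kb(info)) / total
```

On a Linux host, the input text could be read from /proc/meminfo.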

OS Metrics: Disk

  • Currently, Cloud Eye Agent only monitors physical disks. NFS-mounted disks cannot be monitored.
  • By default, Cloud Eye Agent excludes Docker-related mount points. The mount point prefixes are as follows:
    /var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos
Table 4 Disk metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

disk_free

(Agent) Available Disk Space

Available disk space of the monitored object

  • Linux: Run the df -h command and check the value in the Avail column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows: Obtain the metric value using the Windows API GetDiskFreeSpaceExW. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥0

GB

N/A

instance_id,mount_point

1 minute

disk_total

(Agent) Disk Storage Capacity

Total disk capacity of the monitored object

  • Linux: Run the df -h command and check the value in the Size column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows: Obtain the metric value using the Windows API GetDiskFreeSpaceExW. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥0

GB

N/A

instance_id,mount_point

1 minute

disk_used

(Agent) Used Disk Space

Used disk space of the monitored object

  • Linux: Run the df -h command and check the value in the Used column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows: Obtain the metric value using the Windows API GetDiskFreeSpaceExW. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

≥0

GB

N/A

instance_id,mount_point

1 minute

disk_usedPercent

(Agent) Disk Usage

Disk usage of the monitored object

Formula: Disk Usage = Used Disk Space/Disk Storage Capacity

  • Linux: Obtain the metric value using the following formula: Disk Usage = Used Disk Space/Disk Storage Capacity. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows: Obtain the metric value using the Windows API GetDiskFreeSpaceExW. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).

0-100

%

N/A

instance_id,mount_point

1 minute
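
The default mount point exclusion and the Disk Usage formula described in this section can be sketched in Python as follows. The function names are hypothetical, and treating the exclusion as a plain string-prefix match is an assumption about how the Agent applies the prefixes.

```python
# Default excluded prefixes, as listed before Table 4.
EXCLUDED_PREFIXES = "/var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos".split(";")


def is_excluded(mount_point):
    """Return True if the mount point starts with one of the default prefixes."""
    # Assumption: a simple string-prefix comparison.
    return any(mount_point.startswith(prefix) for prefix in EXCLUDED_PREFIXES)


def disk_used_percent(used_bytes, total_bytes):
    """Disk Usage = Used Disk Space / Disk Storage Capacity, as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes
```

For a real mount point, the used and total values could be taken from shutil.disk_usage(path) in the Python standard library.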

OS Metrics: Disk I/O

Table 5 Disk I/O metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

disk_agt_read_bytes_rate

(Agent) Disks Read Rate

Number of bytes read from the monitored disk per second

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the sixth column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows:
    • Obtain the metric value using the WMI object Win32_PerfFormattedData_PerfDisk_LogicalDisk.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
    • High CPU usage may lead to timeouts in monitoring data collection.

≥ 0

byte/s

1024(IEC)

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_agt_read_requests_rate

(Agent) Disks Read Requests

Number of requests to read data from the monitored disk per second

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the fourth column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows:
    • Obtain the metric value using the WMI object Win32_PerfFormattedData_PerfDisk_LogicalDisk.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
    • High CPU usage may lead to timeouts in monitoring data collection.

≥ 0

request/s

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_agt_write_bytes_rate

(Agent) Disks Write Rate

Number of bytes written into the monitored disk per second

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the tenth column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows:
    • Obtain the metric value using the WMI object Win32_PerfFormattedData_PerfDisk_LogicalDisk.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
    • High CPU usage may lead to timeouts in monitoring data collection.

≥ 0

byte/s

1024(IEC)

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_agt_write_requests_rate

(Agent) Disks Write Requests

Number of requests to write data into the monitored disk per second

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the eighth column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows:
    • Obtain the metric value using the WMI object Win32_PerfFormattedData_PerfDisk_LogicalDisk.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
    • High CPU usage may lead to timeouts in monitoring data collection.

≥ 0

request/s

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_readTime

(Agent) Average Read Request Time

Average amount of time that read requests have waited on the disks

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the seventh column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

≥ 0

ms/count

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_writeTime

(Agent) Average Write Request Time

Average amount of time that write requests have waited on the disks

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the eleventh column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

≥ 0

ms/count

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_ioUtils

(Agent) Disk I/O Usage

Disk I/O usage of the monitored object

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the thirteenth column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

0-100

%

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_queue_length

(Agent) Disk Queue Length

Average number of read or write requests waiting to be processed by the monitored disk in a monitoring period

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value based on data changes in the fourteenth column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

≥ 0

Count

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_write_bytes_per_operation

(Agent) Average Disk Write Size

Average number of bytes written into the monitored disk per write I/O in a monitoring period

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value by dividing data changes in the tenth column by that in the eighth column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

≥ 0

Byte/op

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_read_bytes_per_operation

(Agent) Average Disk Read Size

Average number of bytes read from the monitored disk per read I/O in a monitoring period

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value by dividing data changes in the sixth column by that in the fourth column in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

≥ 0

Byte/op

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_io_svctm

(Agent) Disk I/O Service Time

Average time the monitored disk takes to complete an I/O request (read or write) in a monitoring period

  • Linux:
    • In file /proc/diskstats, locate the device and calculate the metric value by dividing the data changes in the thirteenth column by the sum of data changes in the fourth and eighth columns in a collection period.
    • The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

≥ 0

ms/op

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute

disk_device_used_percent

(Agent) Block Device Usage

Physical disk usage of the monitored object

Formula: Block device usage = Storage space used by all mounted disk partitions/Total disk storage space

  • Linux: Calculate the sum of storage space used by all mount points. Calculate the total disk storage space based on the disk sector size and the number of sectors. Then, calculate the block device usage based on the formula mentioned above.
  • Windows is not supported currently.

0-100

%

N/A

  • instance_id,mount_point
  • instance_id,disk

1 minute
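
The Linux rows in Table 5 refer to columns of /proc/diskstats by position. The following Python sketch shows how two samples of one device line could be combined into several of these metrics. It is an illustration, not the Agent's code. The function names are hypothetical, and the exact formulas are assumptions: sector counts are converted with 512 bytes per sector, time-based values are divided by the collection interval, and per-request averages are divided by the number of completed requests.

```python
SECTOR_BYTES = 512  # assumption: /proc/diskstats counts 512-byte sectors


def parse_diskstats_line(line):
    """Return the device name and columns 4 to 14 of one /proc/diskstats line."""
    cols = line.split()
    return cols[2], [int(c) for c in cols[3:14]]


def disk_io_metrics(before_line, after_line, interval_s):
    """Derive rate and average metrics from two samples of the same device."""
    name_a, a = parse_diskstats_line(before_line)
    name_b, b = parse_diskstats_line(after_line)
    if name_a != name_b:
        raise ValueError("samples must describe the same device")

    def delta(column):
        # Columns are 1-based in the table; the list starts at column 4.
        return b[column - 4] - a[column - 4]

    reads, writes = delta(4), delta(8)
    interval_ms = interval_s * 1000.0
    return {
        "read_bytes_rate": delta(6) * SECTOR_BYTES / interval_s,
        "write_bytes_rate": delta(10) * SECTOR_BYTES / interval_s,
        "read_requests_rate": reads / interval_s,
        "write_requests_rate": writes / interval_s,
        "read_time_ms": delta(7) / reads if reads else 0.0,
        "write_time_ms": delta(11) / writes if writes else 0.0,
        "io_utils_percent": 100.0 * delta(13) / interval_ms,
        "queue_length": delta(14) / interval_ms,
        "read_bytes_per_op": delta(6) * SECTOR_BYTES / reads if reads else 0.0,
        "svctm_ms": delta(13) / (reads + writes) if reads + writes else 0.0,
    }
```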

OS Metrics: File System

Table 6 File system metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

disk_fs_rwstate

(Agent) File System Read/Write Status

File system read/write status of the monitored object. Possible values are 0 (read and write) and 1 (read-only).

  • Linux: Obtain the file system status from the fourth column in the /proc/mounts file.
  • Windows is not supported currently.

  • 0: read and write
  • 1: read-only

N/A

N/A

instance_id,mount_point

1 minute

disk_inodesTotal

(Agent) Disk inode Total

Total number of index nodes on the disk

  • Linux: Run the df -i command and check the value in the Inodes column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

≥ 0

N/A

N/A

instance_id,mount_point

1 minute

disk_inodesUsed

(Agent) Total inode Used

Number of used index nodes on the disk

  • Linux: Run the df -i command and check the value in the IUsed column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

≥ 0

N/A

N/A

instance_id,mount_point

1 minute

disk_inodesUsedPercent

(Agent) Percentage of Total inode Used

Percentage of used index nodes on the disk

  • Linux: Run the df -i command and check the value in the IUse% column. The mount point prefix cannot exceed 64 characters. It must start with a letter and contain only digits, letters, hyphens (-), periods (.), and swung dashes (~).
  • Windows is not supported currently.

0-100

%

N/A

instance_id,mount_point

1 minute
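
The disk_fs_rwstate row in Table 6 reads the file system status from the fourth column of /proc/mounts. A minimal Python sketch of that lookup follows. The function name fs_rwstate is hypothetical, and mapping the ro mount option to 1 and everything else to 0 is an assumption based on the value descriptions in the table.

```python
def fs_rwstate(proc_mounts_text):
    """Map each mount point to 0 (read and write) or 1 (read-only)."""
    states = {}
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")  # fourth column: mount options
        states[mount_point] = 1 if "ro" in options else 0
    return states


# Hand-written sample in /proc/mounts format.
sample = "/dev/sda1 / ext4 rw,relatime 0 0\n/dev/sdb1 /data ext4 ro,relatime 0 0\n"
states = fs_rwstate(sample)
```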

OS Metrics: TCP

Table 7 TCP metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

net_tcp_total

(Agent) Total TCP Connections

Total number of TCP connections in all states

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_established

(Agent) TCP ESTABLISHED Connections

Number of TCP connections in ESTABLISHED state

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_sys_sent

(Agent) TCP SYS_SENT Connections

Number of TCP connections in SYN_SENT state, that is, connections that are being requested by the client

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_sys_recv

(Agent) TCP SYS_RECV Connections

Number of TCP connections in SYN_RECV state, that is, pending connections received by the server

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_fin_wait1

(Agent) TCP FIN_WAIT1 Connections

Number of TCP connections waiting for ACK packets when the connections are being actively closed by the client

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_fin_wait2

(Agent) TCP FIN_WAIT2 Connections

Number of TCP connections in FIN_WAIT2 state

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_time_wait

(Agent) TCP TIME_WAIT Connections

Number of TCP connections in TIME_WAIT state

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_close

(Agent) TCP CLOSE Connections

Number of closed TCP connections

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_close_wait

(Agent) TCP CLOSE_WAIT Connections

Number of TCP connections in CLOSE_WAIT state

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_last_ack

(Agent) TCP LAST_ACK Connections

Number of TCP connections waiting for ACK packets when the connections are being passively closed by the client

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_listen

(Agent) TCP LISTEN Connections

Number of TCP connections in LISTEN state

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_closing

(Agent) TCP CLOSING Connections

Number of TCP connections to be actively closed by the server and the client at the same time

  • Linux: Obtain TCP connections in all states from the /proc/net/tcp file, and then collect the number of connections in each state.
  • Windows: Obtain the metric value using the Windows API GetTcpTable2.

≥ 0

Count

N/A

instance_id

1 minute

net_tcp_retrans

(Agent) TCP Retransmission Rate

Percentage of packets that are resent

  • Linux: Obtain the metric value from the /proc/net/snmp file. The value is the ratio of the number of retransmitted packets to the number of total packets sent in a collection period.
  • Windows: Obtain the metric value using the Windows API GetTcpStatistics.

0-100

%

N/A

instance_id

1 minute
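
The Linux rows in Table 7 rely on two files: /proc/net/tcp for per-state connection counts and /proc/net/snmp for the retransmission rate. The following Python sketch illustrates both calculations. The function names are hypothetical. The state codes are the hexadecimal values that the kernel writes in the st column, and using the RetransSegs and OutSegs counters for the retransmission rate is an assumption about which counters the Agent reads. Note that /proc/net/tcp lists IPv4 sockets only.

```python
from collections import Counter

# Hexadecimal state codes used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count connections per state; the sum of all counts is the total."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 4:
            counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def tcp_counters(proc_net_snmp_text):
    """Pair the 'Tcp:' header line with the 'Tcp:' value line."""
    rows = [l.split() for l in proc_net_snmp_text.splitlines() if l.startswith("Tcp:")]
    header, values = rows[0], rows[1]
    return dict(zip(header[1:], (int(v) for v in values[1:])))


def tcp_retrans_percent(before_text, after_text):
    """Retransmitted segments as a percentage of segments sent in the period."""
    before, after = tcp_counters(before_text), tcp_counters(after_text)
    sent = after["OutSegs"] - before["OutSegs"]
    resent = after["RetransSegs"] - before["RetransSegs"]
    return 100.0 * resent / sent if sent > 0 else 0.0
```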

OS Metrics: NIC

Table 8 NIC metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

net_bitRecv

(Agent) Outbound Bandwidth

Number of bits sent by the monitored object per second

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows: Obtain the metric value using the WMI object MibIfRow.

≥ 0

bit/s

1024(IEC)

  • instance_id
  • instance_id,network_interface_card

1 minute

net_bitSent

(Agent) Inbound Bandwidth

Number of bits received by the monitored object per second

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows: Obtain the metric value using the WMI object MibIfRow.

≥ 0

bit/s

1024(IEC)

  • instance_id
  • instance_id,network_interface_card

1 minute

net_packetRecv

(Agent) NIC Packet Receive Rate

Number of packets received by the monitored object per second

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows: Obtain the metric value using the WMI object MibIfRow.

≥ 0

Count/s

N/A

  • instance_id
  • instance_id,network_interface_card

1 minute

net_packetSent

(Agent) NIC Packet Send Rate

Number of packets sent by the monitored object per second

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows: Obtain the metric value using the WMI object MibIfRow.

≥ 0

Count/s

N/A

  • instance_id
  • instance_id,network_interface_card

1 minute

net_errin

(Agent) Receive Error Rate

Percentage of error packets relative to the total packets received by the monitored object per second

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows is not supported currently.

0-100

%

N/A

  • instance_id
  • instance_id,network_interface_card

1 minute

net_errout

(Agent) Transmit Error Rate

Percentage of error packets relative to the total packets sent by the monitored object per second

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows is not supported currently.

0-100

%

N/A

  • instance_id
  • instance_id,network_interface_card

1 minute

net_dropin

(Agent) Received Packet Drop Rate

Percentage of received but dropped packets relative to the total packets received by the monitored object per second

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows is not supported currently.

0-100

%

N/A

  • instance_id
  • instance_id,network_interface_card

1 minute

net_dropout

(Agent) Transmitted Packet Drop Rate

Percentage of sent but dropped packets relative to the total packets sent by the monitored object per second

  • Linux: Check metric value changes in file /proc/net/dev in a collection period.
  • Windows is not supported currently.

0-100

%

N/A

  • instance_id
  • instance_id,network_interface_card

1 minute
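
The Linux rows in Table 8 describe the NIC metrics as changes in /proc/net/dev over a collection period. The following Python sketch shows one way to derive rates from two samples of a single interface line. The function names and the neutral rx_/tx_ result keys are hypothetical. Computing error and drop rates as a share of the packets counted in the same period is an assumption. The mapping of results to metric IDs should follow Table 8.

```python
# Positions of the counters after the "name:" prefix in /proc/net/dev.
RX_BYTES, RX_PACKETS, RX_ERRS, RX_DROP = 0, 1, 2, 3
TX_BYTES, TX_PACKETS, TX_ERRS, TX_DROP = 8, 9, 10, 11


def parse_net_dev_line(line):
    """Split one interface line into its name and integer counters."""
    name, _, rest = line.partition(":")
    return name.strip(), [int(value) for value in rest.split()]


def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole > 0 else 0.0


def nic_metrics(before_line, after_line, interval_s):
    """Derive bit, packet, error, and drop rates from two samples."""
    name_a, a = parse_net_dev_line(before_line)
    name_b, b = parse_net_dev_line(after_line)
    if name_a != name_b:
        raise ValueError("samples must describe the same interface")
    d = [after - before for before, after in zip(a, b)]
    return {
        "rx_bits_per_s": d[RX_BYTES] * 8 / interval_s,
        "tx_bits_per_s": d[TX_BYTES] * 8 / interval_s,
        "rx_packets_per_s": d[RX_PACKETS] / interval_s,
        "tx_packets_per_s": d[TX_PACKETS] / interval_s,
        "rx_error_percent": percent(d[RX_ERRS], d[RX_PACKETS]),
        "tx_error_percent": percent(d[TX_ERRS], d[TX_PACKETS]),
        "rx_drop_percent": percent(d[RX_DROP], d[RX_PACKETS]),
        "tx_drop_percent": percent(d[TX_DROP], d[TX_PACKETS]),
    }
```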

Process Monitoring Metrics

Table 9 Process metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

proc_pHashId_cpu

(Agent) CPU Usage

CPU consumed by a process. pHashId is the MD5 value of the process name plus process ID.

  • Linux: Check metric value changes in file /proc/pid/stat.
  • Windows: Obtain the metric value using the Windows API GetProcessTimes.

0–1 x Number of CPU cores

%

N/A

instance_id

1 minute

proc_pHashId_mem

(Agent) Memory Usage

Memory consumed by a process. pHashId is the MD5 value of the process name plus process ID.

  • Linux:

    RSS*PAGESIZE/MemTotal

    Obtain the RSS value by checking the second column of the file /proc/pid/statm.

    Obtain the PAGESIZE value by running the getconf PAGESIZE command.

    Obtain the MemTotal value by checking the file /proc/meminfo.

  • Windows: Call the Windows API GlobalMemoryStatusEx to obtain the total memory size. Call GetProcessMemoryInfo to obtain the memory used by the process. Divide the used size by the total size to get the memory usage.

0-100

%

N/A

instance_id

1 minute

proc_pHashId_file

(Agent) Opened Files

Number of files opened by a process. pHashId is the MD5 value of the process name plus process ID.

  • Linux: Run the ls -l /proc/pid/fd command to check the number of opened files.
  • Windows is not supported currently.

≥0

Count

N/A

instance_id

1 minute

proc_running_count

(Agent) Running Processes

Number of running processes of the monitored object

  • Linux: Obtain the state of each process by checking the Status value in the /proc/pid/status file, and then collect the total number of processes in each state.
  • Windows is not supported currently.

≥0

Count

N/A

instance_id

1 minute

proc_idle_count

(Agent) Idle Processes

Number of idle processes of the monitored object

  • Linux: Obtain the state of each process by checking the Status value in the /proc/pid/status file, and then collect the total number of processes in each state.
  • Windows is not supported currently.

≥0

Count

N/A

instance_id

1 minute

proc_zombie_count

(Agent) Zombie Processes

Number of zombie processes of the monitored object

  • Linux: Obtain the state of each process by checking the Status value in the /proc/pid/status file, and then collect the total number of processes in each state.
  • Windows is not supported currently.

≥0

Count

N/A

instance_id

1 minute

proc_blocked_count

(Agent) Blocked Processes

Number of blocked processes of the monitored object

  • Linux: Obtain the state of each process by checking the Status value in the /proc/pid/status file, and then collect the total number of processes in each state.
  • Windows is not supported currently.

≥0

Count

N/A

instance_id

1 minute

proc_sleeping_count

(Agent) Sleeping Processes

Number of sleeping processes of the monitored object

  • Linux: Obtain the state of each process by checking the Status value in the /proc/pid/status file, and then collect the total number of processes in each state.
  • Windows is not supported currently.

≥0

Count

N/A

instance_id

1 minute

proc_total_count

(Agent) Total Processes

Total number of processes of the monitored object

  • Linux: Obtain the state of each process by checking the Status value in the /proc/pid/status file, and then collect the total number of processes in each state.
  • Windows: Obtain the metric value using psapi.dll, the Windows process status API library.

≥0

Count

N/A

instance_id

1 minute

proc_specified_count

(Agent) Specified Processes

Number of specified processes

  • Linux: Obtain the state of each process by checking the Status value in the /proc/pid/status file, and then collect the total number of processes in each state.
  • Windows: Obtain the metric value using psapi.dll, the Windows process status API library.

≥0

N/A

N/A

instance_id,proc

1 minute
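
The Linux collection method described for the process metrics can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Agent's actual implementation: the function name count_process_states and its proc_root parameter are hypothetical, and the sketch reads the State field that Linux exposes in each /proc/<pid>/status file.

```python
import os
from collections import Counter


def count_process_states(proc_root="/proc"):
    """Count processes per state letter by reading <proc_root>/<pid>/status.

    Hypothetical helper for illustration; not part of the Agent.
    """
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric directory names are processes
            continue
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    if line.startswith("State:"):
                        # Example line: "State:\tS (sleeping)"
                        parts = line.split()
                        if len(parts) > 1:
                            counts[parts[1]] += 1
                        break
        except OSError:
            # The process exited (or has no status file); skip it.
            continue
    return counts


# Example usage on a Linux host:
#   states = count_process_states()
#   print("total:", sum(states.values()), "zombie:", states["Z"])
```

On Linux, S denotes interruptible sleep, D uninterruptible sleep, Z a zombie process, and I an idle kernel thread. Which state letters the Agent counts toward each metric is not stated in this section, so any mapping from letters to metric IDs is an assumption.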

OS Metrics: GPU

If a server has eight GPUs and the PM mode is disabled, data collection may fail. To fix this, enable the PM mode and restart the monitoring process.
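
The following Python sketch shows one way to check the setting before restarting the monitoring process. It assumes that "PM mode" refers to the NVIDIA driver's persistence mode as reported by nvidia-smi, which this page does not state explicitly. The function names are hypothetical and are not part of the Agent.

```python
import subprocess


def parse_persistence_modes(csv_text):
    """Parse 'index, persistence_mode' CSV rows into {gpu_index: enabled}."""
    modes = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = (mode == "Enabled")
    return modes


def query_persistence_modes():
    """Ask nvidia-smi for the persistence mode of every GPU.

    Assumption: nvidia-smi is installed and on PATH.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,persistence_mode",
         "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    )
    return parse_persistence_modes(result.stdout)


# Example usage on a GPU server:
#   disabled = [i for i, on in query_persistence_modes().items() if not on]
#   if disabled:
#       print("Persistence mode is disabled on GPUs:", disabled)
```

If any GPU is reported as disabled, persistence mode can typically be enabled with nvidia-smi -pm 1 run as root. The command for restarting the monitoring process depends on how the Agent was installed and is not shown here.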

Table 10 GPU metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

gpu_status

(Agent) GPU Health Status

GPU health status. It is a composite metric.

Possible causes of a subhealthy or faulty status: (1) The number of ECC errors exceeds the threshold. (2) The GPU memory address failed to be remapped. (3) The GPU reports a rev ff error. (4) An infoROM error occurs. (5) There are pages pending isolation. (6) A remapped rows error occurs. For details, see the detailed metrics below.

  • Linux: Obtain the metric value by calling the API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the API provided by the GPU driver library nvml.dll.

  • 0: healthy
  • 1: subhealthy
  • 2: faulty

N/A

N/A

instance_id,gpu

1 minute

gpu_performance_state

(Agent) Performance Status

GPU performance status

  • Linux: Obtain the metric value by calling the NvmlDeviceGetPerformanceState API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetPerformanceState API provided by the GPU driver library nvml.dll.

P0–P15, P32

  • P0: the maximum performance
  • P15: the minimum performance
  • P32: unknown performance status

N/A

N/A

instance_id,gpu

1 minute

gpu_power_draw

(Agent) GPU Draw Power

Power drawn by the GPU. If the power exceeds the maximum or an invalid value is reported, the GPU hardware may be faulty.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetPowerUsage API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetPowerUsage API provided by the GPU driver library nvml.dll.

≥ 0

W

N/A

instance_id,gpu

1 minute

gpu_temperature

(Agent) GPU Temperature

Temperature of the GPU. If the temperature exceeds the threshold or an invalid value is reported, the GPU hardware may be faulty.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetTemperature API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetTemperature API provided by the GPU driver library nvml.dll.

≥ 0

°C

N/A

instance_id,gpu

1 minute

gpu_usage_gpu

(Agent) GPU Usage

GPU compute usage. It is an instantaneous value at a sampling point.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetUtilizationRates API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetUtilizationRates API provided by the GPU driver library nvml.dll.

0-100

%

N/A

instance_id,gpu

1 minute

gpu_usage_mem

(Agent) GPU Memory Usage

GPU memory usage. It is an instantaneous value at a sampling point.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetUtilizationRates API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetUtilizationRates API provided by the GPU driver library nvml.dll.

0-100

%

N/A

  • instance_id,gpu
  • instance_id,gpu_slot,pid_for_gpu

1 minute

gpu_used_mem

(Agent) GPU Used Memory

Memory used on the GPU

  • Linux: Obtain the metric value by calling the NvmlDeviceGetMemoryInfo API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetMemoryInfo API provided by the GPU driver library nvml.dll.

≥ 0

MB

N/A

  • instance_id,gpu
  • instance_id,gpu_slot,pid_for_gpu

1 minute

gpu_free_mem

(Agent) Remaining GPU Memory

Free (unused) memory on the GPU

  • Linux: Obtain the metric value by calling the NvmlDeviceGetMemoryInfo API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetMemoryInfo API provided by the GPU driver library nvml.dll.

≥ 0

MB

N/A

instance_id,gpu

1 minute

gpu_usage_encoder

(Agent) Encoding Usage

Encoder usage of the GPU. It is an instantaneous value at a sampling point.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetEncoderUtilization API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetEncoderUtilization API provided by the GPU driver library nvml.dll.

0-100

%

N/A

  • instance_id,gpu
  • instance_id,gpu_slot,pid_for_gpu

1 minute

gpu_usage_decoder

(Agent) Decoding Usage

Decoder usage of the GPU. It is an instantaneous value at a sampling point.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetDecoderUtilization API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetDecoderUtilization API provided by the GPU driver library nvml.dll.

0-100

%

N/A

  • instance_id,gpu
  • instance_id,gpu_slot,pid_for_gpu

1 minute

gpu_graphics_clocks

(Agent) GPU Graphics Clocks

GPU graphics (shader) clock frequency, which determines graphics performance. If graphics capabilities are not used, you can ignore this metric.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library nvml.dll.

≥ 0

MHz

N/A

instance_id,gpu

1 minute

gpu_sm_clocks

(Agent) GPU SM Clocks

SM clocks on the GPU. The value is the clock frequency closely related to CUDA core computing of the GPU.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library nvml.dll.

≥ 0

MHz

N/A

instance_id,gpu

1 minute

gpu_mem_clock

(Agent) GPU Memory Clocks

Memory clocks on the GPU. The value is the clock frequency that controls the GPU memory speed.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library nvml.dll.

≥ 0

MHz

N/A

instance_id,gpu

1 minute

gpu_video_clocks

(Agent) GPU Video Clocks

Video clocks on the GPU. The value is the codec clock frequency of the GPU.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetClockInfo API provided by the GPU driver library nvml.dll.

≥ 0

MHz

N/A

instance_id,gpu

1 minute

gpu_tx_throughput_pci

(Agent) GPU PCI Tx Throughput

PCI Tx throughput on the GPU. The value is the amount of data sent by the GPU to the host via PCIe.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetPcieThroughput API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetPcieThroughput API provided by the GPU driver library nvml.dll.

≥ 0

MByte/s

N/A

instance_id,gpu

1 minute

gpu_rx_throughput_pci

(Agent) GPU PCI Rx Throughput

PCI Rx throughput on the GPU. The value is the amount of data sent by the host to the GPU via PCIe.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetPcieThroughput API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetPcieThroughput API provided by the GPU driver library nvml.dll.

≥ 0

MByte/s

N/A

instance_id,gpu

1 minute

gpu_volatile_correctable

(Agent) Volatile Correctable ECC Errors

Number of correctable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library nvml.dll.

≥ 0

Count

N/A

instance_id,gpu

1 minute

gpu_volatile_uncorrectable

(Agent) Volatile Uncorrectable ECC Errors

Number of uncorrectable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset.

  • Linux: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library nvml.dll.

≥ 0

Count

N/A

instance_id,gpu

1 minute

gpu_aggregate_correctable

(Agent) Aggregate Correctable ECC Errors

Aggregate correctable ECC errors on the GPU

  • Linux: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library nvml.dll.

≥ 0

Count

N/A

instance_id,gpu

1 minute

gpu_aggregate_uncorrectable

(Agent) Aggregate Uncorrectable ECC Errors

Aggregate uncorrectable ECC errors on the GPU

  • Linux: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetTotalEccErrors or NvmlDeviceGetMemoryErrorCounter API provided by the GPU driver library nvml.dll.

≥ 0

Count

N/A

instance_id,gpu

1 minute

gpu_retired_page_single_bit

(Agent) Retired Page Single Bit Errors

Number of memory pages retired (blocked) by the GPU because of single-bit errors

  • Linux: Obtain the metric value by calling the NvmlDeviceGetRetiredPages API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetRetiredPages API provided by the GPU driver library nvml.dll.

≥ 0

Count

N/A

instance_id,gpu

1 minute

gpu_retired_page_double_bit

(Agent) Retired Page Double Bit Errors

Number of memory pages retired (blocked) by the GPU because of double-bit errors

  • Linux: Obtain the metric value by calling the NvmlDeviceGetRetiredPages API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows: Obtain the metric value by calling the NvmlDeviceGetRetiredPages API provided by the GPU driver library nvml.dll.

≥ 0

Count

N/A

instance_id,gpu

1 minute

gpu_lnkcap_speed

(Agent) Max. GPU Link Speed

Maximum PCIe link speed of the GPU, which is the highest per-lane transfer rate that the GPU supports on the PCIe bus

  • Linux: Obtain the metric value by running lspci -d 10de: -vv | grep -i lnkcap.
  • Windows: Obtain the metric value by running (gwmi Win32_Bus -Filter 'DeviceID like "PCI%"').GetRelated('Win32_PnPEntity').

≥ 0

GT/s

N/A

instance_id,gpu

1 minute

gpu_lnkcap_width

(Agent) Max. GPU Link Width

Maximum PCIe link width of the GPU, which means the maximum number of PCIe lanes supported by the GPU

  • Linux: Obtain the metric value by running lspci -d 10de: -vv | grep -i lnkcap.
  • Windows: Obtain the metric value by running (gwmi Win32_Bus -Filter 'DeviceID like "PCI%"').GetRelated('Win32_PnPEntity').

≥ 0

count

N/A

instance_id,gpu

1 minute

gpu_lnksta_speed

(Agent) GPU Link Speed

PCIe link speed of the GPU

  • Linux: Obtain the metric value by running lspci -d 10de: -vv | grep -i lnksta.
  • Windows is not supported currently.

≥ 0

GT/s

N/A

instance_id,gpu

1 minute

gpu_lnksta_width

(Agent) GPU Link Width

PCIe link width of the GPU, which means the number of PCIe lanes of the GPU

  • Linux: Obtain the metric value by running lspci -d 10de: -vv | grep -i lnksta.
  • Windows is not supported currently.

≥ 0

count

N/A

instance_id,gpu

1 minute

gpu_nvlink_number

(Agent) GPU NVLinks

Number of NVLinks of the GPU. For example, A100 supports 12 NVLinks.

  • Linux: Obtain the metric value by calling the nvmlDeviceGetFieldValues API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows is not supported currently.

≥ 0

count

N/A

instance_id,gpu

1 minute

gpu_nvlink_bandwidth

(Agent) Average GPU NVLink Bandwidth

Average NVLink bandwidth of the GPU

The value is the total bandwidth for GPU data transmission.

  • Linux: Obtain the metric value by calling the nvmlDeviceGetFieldValues API provided by the GPU driver library libnvidia-ml.so.1.
  • Windows is not supported currently.

≥ 0

GB/s

N/A

instance_id,gpu

1 minute
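
The link metrics in Table 10 are read from the LnkCap (maximum) and LnkSta (current) lines printed by lspci -vv. The Python sketch below shows how such lines could be parsed and compared. It is only an illustration: the function names are hypothetical, the sample line layout is an assumption based on typical lspci output, and this is not the Agent's actual parser.

```python
import re

# Patterns for fragments such as "Speed 16GT/s" and "Width x16".
_SPEED = re.compile(r"Speed\s+([\d.]+)\s*GT/s")
_WIDTH = re.compile(r"Width\s+x(\d+)")


def parse_link_line(line):
    """Return (speed_in_GT_per_s, lane_count) from one LnkCap or LnkSta line."""
    speed = _SPEED.search(line)
    width = _WIDTH.search(line)
    if not speed or not width:
        raise ValueError(f"no link speed/width found in: {line!r}")
    return float(speed.group(1)), int(width.group(1))


def link_is_below_capability(lnkcap_line, lnksta_line):
    """True if the current link speed or width is below the maximum."""
    cap_speed, cap_width = parse_link_line(lnkcap_line)
    sta_speed, sta_width = parse_link_line(lnksta_line)
    return sta_speed < cap_speed or sta_width < cap_width


# Example usage with sample lines:
#   cap = "LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported"
#   sta = "LnkSta: Speed 8GT/s (downgraded), Width x16 (ok)"
#   link_is_below_capability(cap, sta)
```

A current link speed below the maximum is not necessarily a fault, because a GPU may lower its PCIe link speed while idle to save power. A reduced link width is usually the more meaningful signal.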

OS Metrics: NPU

Table 11 NPU metrics

Metric ID

Metric Name

Description

Value Range

Unit

Conversion Rule

Dimension

Monitoring Interval (Raw Data)

npu_device_health

(Agent) NPU Device Health

NPU health status

  • 0: healthy
  • 1: minor alarm
  • 2: major alarm
  • 3: critical alarm

N/A

N/A

instance_id,npu

1 minute

npu_driver_health

(Agent) NPU Driver Health

Health status of the NPU driver

  • 0: healthy
  • 1: minor alarm
  • 2: major alarm
  • 3: critical alarm

N/A

N/A

instance_id,npu

1 minute

npu_power

(Agent) NPU Power

NPU power

>0

W

N/A

instance_id,npu

1 minute

npu_temperature

(Agent) NPU Temperature

NPU temperature

Natural numbers

°C

N/A

instance_id,npu

1 minute

npu_voltage

(Agent) NPU Voltage

NPU voltage

Natural numbers

V

N/A

instance_id,npu

1 minute

npu_util_rate_hbm

(Agent) NPU HBM Usage

NPU HBM usage

0-100

%

N/A

instance_id,npu

1 minute

npu_hbm_freq

(Agent) NPU HBM Frequency

NPU HBM frequency

>0

MHz

N/A

instance_id,npu

1 minute

npu_freq_hbm

(Agent) NPU HBM Frequency

NPU HBM frequency

>0

MHz

N/A

instance_id,npu

1 minute

npu_hbm_usage

(Agent) Used HBM

Used NPU HBM

≥0

MB

N/A

instance_id,npu

1 minute

npu_hbm_temperature

(Agent) HBM Temperature

NPU HBM temperature

Natural numbers

°C

N/A

instance_id,npu

1 minute

npu_hbm_bandwidth_util

(Agent) HBM Bandwidth Usage

NPU HBM bandwidth usage

0-100

%

N/A

instance_id,npu

1 minute

npu_hbm_mem_capacity

(Agent) HBM Memory Capacity

NPU HBM memory capacity

≥0

MB

N/A

instance_id,npu

1 minute

npu_hbm_ecc_enable

(Agent) HBM ECC Check Status

Whether HBM ECC check is enabled for the NPU

  • 0: disabled
  • 1: enabled

N/A

N/A

instance_id,npu

1 minute

npu_hbm_single_bit_error_cnt

(Agent) HBM Single-Bit Errors

Number of HBM single-bit errors of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_hbm_double_bit_error_cnt

(Agent) HBM Double-Bit Errors

Number of HBM double-bit errors of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_hbm_total_single_bit_error_cnt

(Agent) Single-Bit Errors in HBM Lifecycle

Number of single-bit errors in an NPU HBM lifecycle

≥0

count

N/A

instance_id,npu

1 minute

npu_hbm_total_double_bit_error_cnt

(Agent) Double-Bit Errors in HBM Lifecycle

Number of double-bit errors in an NPU HBM lifecycle

≥0

count

N/A

instance_id,npu

1 minute

npu_hbm_single_bit_isolated_pages_cnt

(Agent) Isolated Memory Pages with HBM Single-Bit Errors

Number of isolated memory pages with single-bit HBM errors of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_hbm_double_bit_isolated_pages_cnt

(Agent) Isolated Memory Pages with HBM Double-Bit Errors

Number of isolated memory pages with double-bit HBM errors of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_usage_mem

(Agent) Used NPU Memory

Memory used on the NPU

≥0

MB

N/A

instance_id,npu

1 minute

npu_util_rate_mem

(Agent) NPU Memory Usage

NPU memory usage

0-100

%

N/A

instance_id,npu

1 minute

npu_util_rate_hbm_bw

(Agent) NPU HBM Bandwidth Usage

NPU HBM bandwidth usage

0-100

%

N/A

instance_id,npu

1 minute

npu_freq_mem

(Agent) NPU Memory Frequency

NPU memory frequency

>0

MHz

N/A

instance_id,npu

1 minute

npu_util_rate_mem_bandwidth

(Agent) NPU Memory Bandwidth Usage

NPU memory bandwidth usage

0-100

%

N/A

instance_id,npu

1 minute

npu_util_rate_vector_core

(Agent) NPU Vector Core Usage

Vector core usage of the NPU

0-100

%

N/A

instance_id,npu

1 minute

npu_sbe

(Agent) NPU Single-Bit Errors

Number of single-bit errors on the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_dbe

(Agent) NPU Double-Bit Errors

Number of double-bit errors on the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_freq_ai_core

(Agent) NPU AI Core Frequency

AI core clock frequency of the NPU

>0

MHz

N/A

instance_id,npu

1 minute

npu_freq_ai_core_rated

(Agent) Rated NPU AI Core Frequency

Rated AI core frequency of the NPU

>0

MHz

N/A

instance_id,npu

1 minute

npu_util_rate_ai_core

(Agent) NPU AI Core Usage

AI core usage of the NPU

0-100

%

N/A

instance_id,npu

1 minute

npu_aicpu_num

(Agent) NPU AI CPUs

Number of AI CPUs on the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_util_rate_ai_cpu

(Agent) NPU AI CPU Usage

AI CPU usage of the NPU

0-100

%

N/A

instance_id,npu

1 minute

npu_aicpu_avg_util_rate

(Agent) Average AI CPU Usage of NPU

Average AI CPU usage of the NPU

0-100

%

N/A

instance_id,npu

1 minute

npu_aicpu_max_freq

(Agent) Max. AI CPU Frequency of NPU

Maximum AI CPU frequency of the NPU

>0

MHz

N/A

instance_id,npu

1 minute

npu_aicpu_cur_freq

(Agent) AI CPU Frequency of NPU

AI CPU frequency of the NPU

>0

MHz

N/A

instance_id,npu

1 minute

npu_util_rate_ctrl_cpu

(Agent) NPU Control CPU Usage

Control CPU usage of the NPU

0-100

%

N/A

instance_id,npu

1 minute

npu_freq_ctrl_cpu

(Agent) NPU Control CPU Frequency

Control CPU frequency of the NPU

>0

MHz

N/A

instance_id,npu

1 minute

npu_link_cap_speed

(Agent) Max. NPU Link Speed

Maximum link speed of the NPU

≥0

GT/s

N/A

instance_id,npu

1 minute

npu_link_cap_width

(Agent) Max. NPU Link Width

Maximum link width of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_link_status_speed

(Agent) NPU Link Speed

Link speed of the NPU

≥0

GT/s

N/A

instance_id,npu

1 minute

npu_link_status_width

(Agent) NPU Link Width

Link width of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_device_network_health

(Agent) NPU Network Health

RoCE IP address connectivity of the NPU

  • 0: The network is healthy.
  • Other values: The network is unhealthy.

N/A

N/A

instance_id,npu

1 minute

npu_network_port_link_status

(Agent) NPU Network Port Link Status

Link status of the network port on the NPU

  • 0: up
  • 1: down

N/A

N/A

instance_id,npu

1 minute

npu_roce_tx_rate

(Agent) NPU NIC Uplink Rate

NIC uplink rate of the NPU

≥0

MB/s

N/A

instance_id,npu

1 minute

npu_roce_rx_rate

(Agent) NPU NIC Downlink Rate

NIC downlink rate of the NPU

≥0

MB/s

N/A

instance_id,npu

1 minute

npu_mac_tx_mac_pause_num

(Agent) Pause Frames Sent by MAC

Total number of pause frames sent by the MAC address of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_mac_rx_mac_pause_num

(Agent) Pause Frames Received by MAC

Total number of pause frames received by the MAC address of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_mac_tx_pfc_pkt_num

(Agent) PFC Frames Sent by MAC

Total number of PFC frames sent by the MAC address of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_mac_rx_pfc_pkt_num

(Agent) PFC Frames Received by MAC

Total number of PFC frames received by the MAC address of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_mac_tx_bad_pkt_num

(Agent) Bad Packets Sent by MAC

Total number of bad packets sent by the MAC address of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_mac_rx_bad_pkt_num

(Agent) Bad Packets Received by MAC

Total number of bad packets received by the MAC address of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_roce_tx_err_pkt_num

(Agent) Bad Packets Sent by RoCE

Total number of bad packets sent by the RoCE NIC of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_roce_rx_err_pkt_num

(Agent) Bad Packets Received by RoCE

Total number of bad packets received by the RoCE NIC of the NPU

≥0

count

N/A

instance_id,npu

1 minute

npu_opt_temperature

(Agent) NPU Optical Module Case Temperature

Case temperature of the NPU optical module

Natural numbers

°C

N/A

instance_id,npu

1 minute

npu_opt_temperature_high_thres

(Agent) Max. NPU Optical Module Case Temperature

Upper limit for the case temperature of the NPU optical module

Natural numbers

°C

N/A

instance_id,npu

1 minute

npu_opt_temperature_low_thres

(Agent) Min. NPU Optical Module Case Temperature

Lower limit for the case temperature of the NPU optical module

Natural numbers

°C

N/A

instance_id,npu

1 minute

npu_opt_voltage

(Agent) NPU Optical Module Voltage

Voltage of the NPU optical module

Natural numbers

mV

N/A

instance_id,npu

1 minute

npu_opt_voltage_high_thres

(Agent) Max. NPU Optical Module Voltage

Upper limit for the voltage of the NPU optical module

Natural numbers

mV

N/A

instance_id,npu

1 minute

npu_opt_voltage_low_thres

(Agent) Min. NPU Optical Module Voltage

Lower limit for the voltage of the NPU optical module

Natural numbers

mV

N/A

instance_id,npu

1 minute

npu_opt_tx_power_lane0

(Agent) NPU Optical Module Lane 0 TX Power

Transmit power of NPU optical module lane 0

≥0

mW

N/A

instance_id,npu

1 minute

npu_opt_tx_power_lane1

(Agent) NPU Optical Module Lane 1 TX Power

Transmit power of NPU optical module lane 1

≥0

mW

N/A

instance_id,npu

1 minute

npu_opt_tx_power_lane2

(Agent) NPU Optical Module Lane 2 TX Power

Transmit power of NPU optical module lane 2

≥0

mW

N/A

instance_id,npu

1 minute

npu_opt_tx_power_lane3

(Agent) NPU Optical Module Lane 3 TX Power

Transmit power of NPU optical module lane 3

≥0

mW

N/A

instance_id,npu

1 minute

npu_opt_rx_power_lane0

(Agent) NPU Optical Module Lane 0 RX Power

Receive power of NPU optical module lane 0

≥0

mW

N/A

instance_id,npu

1 minute

npu_opt_rx_power_lane1

(Agent) NPU Optical Module Lane 1 RX Power

Receive power of NPU optical module lane 1

≥0

mW

N/A

instance_id,npu

1 minute

npu_opt_rx_power_lane2

(Agent) NPU Optical Module Lane 2 RX Power

Receive power of NPU optical module lane 2

≥0

mW

N/A

instance_id,npu

1 minute

npu_opt_rx_power_lane3

(Agent) NPU Optical Module Lane 3 RX Power

Receive power of NPU optical module lane 3

≥0

mW

N/A

instance_id,npu

1 minute

npu_opt_tx_bias_lane0

(Agent) NPU Optical Module Lane 0 TX Bias Current

Transmit bias current of NPU optical module lane 0

≥0

mA

N/A

instance_id,npu

1 minute

npu_opt_tx_bias_lane1

(Agent) NPU Optical Module Lane 1 TX Bias Current

Transmit bias current of NPU optical module lane 1

≥0

mA

N/A

instance_id,npu

1 minute

npu_opt_tx_bias_lane2

(Agent) NPU Optical Module Lane 2 TX Bias Current

Transmit bias current of NPU optical module lane 2

≥0

mA

N/A

instance_id,npu

1 minute

npu_opt_tx_bias_lane3

(Agent) NPU Optical Module Lane 3 TX Bias Current

Transmit bias current of NPU optical module lane 3

≥0

mA

N/A

instance_id,npu

1 minute

npu_opt_tx_los

(Agent) NPU Optical Module TX LOS

Number of transmit loss-of-signal (LOS) flags of the NPU optical module

≥0

count

N/A

instance_id,npu

1 minute

npu_opt_rx_los

(Agent) NPU Optical Module RX LOS

Number of receive loss-of-signal (LOS) flags of the NPU optical module

≥0

count

N/A

instance_id,npu

1 minute

npu_macro1_0lane_max_consec_sec

(Agent) Max. Duration of NPU Macro1 0lane

Maximum duration of NPU Macro1 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro1_0lane_total_sec

(Agent) Total Duration of NPU Macro1 0lane

Total duration of NPU Macro1 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro1_crc_error_cnt

(Agent) Error Packets Received by NPU Macro1

Number of CRC error packets received by NPU Macro1 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro1_crc_error_rate

(Agent) NPU Macro1 BER

Percentage of CRC error packets received by NPU Macro1 in a monitoring period

0-100

%

N/A

instance_id,npu

1 minute

npu_macro1_retry_cnt

(Agent) Packets Retransmitted by NPU Macro1

Number of packets retransmitted by NPU Macro1 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro1_rx_cnt

(Agent) Packets Received by NPU Macro1

Number of packets received by NPU Macro1 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro1_serdes_lane0_snr

(Agent) NPU Macro1 SerDes Lane0 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane0

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro1_serdes_lane1_snr

(Agent) NPU Macro1 SerDes Lane1 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane1

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro1_serdes_lane2_snr

(Agent) NPU Macro1 SerDes Lane2 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane2

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro1_serdes_lane3_snr

(Agent) NPU Macro1 SerDes Lane3 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane3

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro1_tx_cnt

(Agent) Packets Sent by NPU Macro1

Number of packets sent by NPU Macro1 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro2_0lane_max_consec_sec

(Agent) Max. Duration of NPU Macro2 0lane

Maximum duration of NPU Macro2 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro2_0lane_total_sec

(Agent) Total Duration of NPU Macro2 0lane

Total duration of NPU Macro2 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro2_crc_error_cnt

(Agent) Error Packets Received by NPU Macro2

Number of CRC error packets received by NPU Macro2 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro2_crc_error_rate

(Agent) NPU Macro2 BER

Percentage of CRC error packets received by NPU Macro2 in a monitoring period

0-100

%

N/A

instance_id,npu

1 minute

npu_macro2_retry_cnt

(Agent) Packets Retransmitted by NPU Macro2

Number of packets retransmitted by NPU Macro2 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro2_rx_cnt

(Agent) Packets Received by NPU Macro2

Number of packets received by NPU Macro2 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro2_serdes_lane0_snr

(Agent) NPU Macro2 SerDes Lane0 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane0

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro2_serdes_lane1_snr

(Agent) NPU Macro2 SerDes Lane1 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane1

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro2_serdes_lane2_snr

(Agent) NPU Macro2 SerDes Lane2 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane2

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro2_serdes_lane3_snr

(Agent) NPU Macro2 SerDes Lane3 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane3

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro2_tx_cnt

(Agent) Packets Sent by NPU Macro2

Number of packets sent by NPU Macro2 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro3_0lane_max_consec_sec

(Agent) Max. Duration of NPU Macro3 0lane

Maximum duration of NPU Macro3 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro3_0lane_total_sec

(Agent) Total Duration of NPU Macro3 0lane

Total duration of NPU Macro3 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro3_crc_error_cnt

(Agent) Error Packets Received by NPU Macro3

Number of CRC error packets received by NPU Macro3 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro3_crc_error_rate

(Agent) NPU Macro3 BER

Percentage of CRC error packets received by NPU Macro3 in a monitoring period

0-100

%

N/A

instance_id,npu

1 minute

npu_macro3_retry_cnt

(Agent) Packets Retransmitted by NPU Macro3

Number of packets retransmitted by NPU Macro3 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro3_rx_cnt

(Agent) Packets Received by NPU Macro3

Number of packets received by NPU Macro3 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro3_serdes_lane0_snr

(Agent) NPU Macro3 SerDes Lane0 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane0

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro3_serdes_lane1_snr

(Agent) NPU Macro3 SerDes Lane1 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane1

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro3_serdes_lane2_snr

(Agent) NPU Macro3 SerDes Lane2 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane2

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro3_serdes_lane3_snr

(Agent) NPU Macro3 SerDes Lane3 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane3

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro3_tx_cnt

(Agent) Packets Sent by NPU Macro3

Number of packets sent by NPU Macro3 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro4_0lane_max_consec_sec

(Agent) Max. Duration of NPU Macro4 0lane

Maximum duration of NPU Macro4 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro4_0lane_total_sec

(Agent) Total Duration of NPU Macro4 0lane

Total duration of NPU Macro4 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro4_crc_error_cnt

(Agent) Error Packets Received by NPU Macro4

Number of CRC error packets received by NPU Macro4 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro4_crc_error_rate

(Agent) NPU Macro4 BER

Percentage of CRC error packets received by NPU Macro4 in a monitoring period

0-100

%

N/A

instance_id,npu

1 minute

npu_macro4_retry_cnt

(Agent) Packets Retransmitted by NPU Macro4

Number of packets retransmitted by NPU Macro4 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro4_rx_cnt

(Agent) Packets Received by NPU Macro4

Number of packets received by NPU Macro4 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro4_serdes_lane0_snr

(Agent) NPU Macro4 SerDes Lane0 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane0

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro4_serdes_lane1_snr

(Agent) NPU Macro4 SerDes Lane1 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane1

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro4_serdes_lane2_snr

(Agent) NPU Macro4 SerDes Lane2 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane2

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro4_serdes_lane3_snr

(Agent) NPU Macro4 SerDes Lane3 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane3

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro4_tx_cnt

(Agent) Packets Sent by NPU Macro4

Number of packets sent by NPU Macro4 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro5_0lane_max_consec_sec

(Agent) Max. Duration of NPU Macro5 0lane

Maximum duration of NPU Macro5 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro5_0lane_total_sec

(Agent) Total Duration of NPU Macro5 0lane

Total duration of NPU Macro5 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro5_crc_error_cnt

(Agent) Error Packets Received by NPU Macro5

Number of CRC error packets received by NPU Macro5 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro5_crc_error_rate

(Agent) NPU Macro5 BER

Percentage of CRC error packets received by NPU Macro5 in a monitoring period

0-100

%

N/A

instance_id,npu

1 minute

npu_macro5_retry_cnt

(Agent) Packets Retransmitted by NPU Macro5

Number of packets retransmitted by NPU Macro5 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro5_rx_cnt

(Agent) Packets Received by NPU Macro5

Number of packets received by NPU Macro5 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro5_serdes_lane0_snr

(Agent) NPU Macro5 SerDes Lane0 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane0

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro5_serdes_lane1_snr

(Agent) NPU Macro5 SerDes Lane1 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane1

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro5_serdes_lane2_snr

(Agent) NPU Macro5 SerDes Lane2 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane2

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro5_serdes_lane3_snr

(Agent) NPU Macro5 SerDes Lane3 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane3

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro5_tx_cnt

(Agent) Packets Sent by NPU Macro5

Number of packets sent by NPU Macro5 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro6_0lane_max_consec_sec

(Agent) Max. Duration of NPU Macro6 0lane

Maximum duration of NPU Macro6 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro6_0lane_total_sec

(Agent) Total Duration of NPU Macro6 0lane

Total duration of NPU Macro6 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro6_crc_error_cnt

(Agent) Error Packets Received by NPU Macro6

Number of CRC error packets received by NPU Macro6 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro6_crc_error_rate

(Agent) NPU Macro6 BER

Percentage of CRC error packets received by NPU Macro6 in a monitoring period

0-100

%

N/A

instance_id,npu

1 minute

npu_macro6_retry_cnt

(Agent) Packets Retransmitted by NPU Macro6

Number of packets retransmitted by NPU Macro6 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro6_rx_cnt

(Agent) Packets Received by NPU Macro6

Number of packets received by NPU Macro6 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro6_serdes_lane0_snr

(Agent) NPU Macro6 SerDes Lane0 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane0

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro6_serdes_lane1_snr

(Agent) NPU Macro6 SerDes Lane1 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane1

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro6_serdes_lane2_snr

(Agent) NPU Macro6 SerDes Lane2 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane2

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro6_serdes_lane3_snr

(Agent) NPU Macro6 SerDes Lane3 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane3

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro6_tx_cnt

(Agent) Packets Sent by NPU Macro6

Number of packets sent by NPU Macro6 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro7_0lane_max_consec_sec

(Agent) Max. Duration of NPU Macro7 0lane

Maximum duration of NPU Macro7 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro7_0lane_total_sec

(Agent) Total Duration of NPU Macro7 0lane

Total duration of NPU Macro7 0lane in a monitoring period

≥0

s

N/A

instance_id,npu

1 minute

npu_macro7_crc_error_cnt

(Agent) Error Packets Received by NPU Macro7

Number of CRC error packets received by NPU Macro7 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro7_crc_error_rate

(Agent) NPU Macro7 BER

Percentage of CRC error packets received by NPU Macro7 in a monitoring period

0-100

%

N/A

instance_id,npu

1 minute

npu_macro7_retry_cnt

(Agent) Packets Retransmitted by NPU Macro7

Number of packets retransmitted by NPU Macro7 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro7_rx_cnt

(Agent) Packets Received by NPU Macro7

Number of packets received by NPU Macro7 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_macro7_serdes_lane0_snr

(Agent) NPU Macro7 SerDes Lane0 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro7 SerDes Lane0

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro7_serdes_lane1_snr

(Agent) NPU Macro7 SerDes Lane1 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro7 SerDes Lane1

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro7_serdes_lane2_snr

(Agent) NPU Macro7 SerDes Lane2 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro7 SerDes Lane2

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro7_serdes_lane3_snr

(Agent) NPU Macro7 SerDes Lane3 SNR

Signal-to-Noise Ratio (SNR) of NPU Macro7 SerDes Lane3

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_macro7_tx_cnt

(Agent) Packets Sent by NPU Macro7

Number of packets sent by NPU Macro7 in a monitoring period

≥0

count

N/A

instance_id,npu

1 minute

npu_opt_media_snr_lane0

(Agent) NPU Optical Module Lane 0 Optical SNR

Signal-to-Noise Ratio (SNR) on the media (optical) side of lane 0 in the NPU optical module

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_opt_media_snr_lane1

(Agent) NPU Optical Module Lane 1 Optical SNR

Signal-to-Noise Ratio (SNR) on the media (optical) side of lane 1 in the NPU optical module

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_opt_media_snr_lane2

(Agent) NPU Optical Module Lane 2 Optical SNR

Signal-to-Noise Ratio (SNR) on the media (optical) side of lane 2 in the NPU optical module

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_opt_media_snr_lane3

(Agent) NPU Optical Module Lane 3 Optical SNR

Signal-to-Noise Ratio (SNR) on the media (optical) side of lane 3 in the NPU optical module

Natural numbers

dB

N/A

instance_id,npu

1 minute

npu_roce_new_pkt_rty_num

(Agent) Packets Retransmitted by NPU RoCE

Number of packets retransmitted by NPU RoCE

≥0

count

N/A

instance_id,npu

1 minute

npu_roce_out_of_order_num

(Agent) PSN Error Packets Received by NPU RoCE

Number of packets received by NPU RoCE whose PSN is greater than the expected PSN or duplicates the PSN of a packet already received. Out-of-order or lost packets trigger retransmission.

≥0

count

N/A

instance_id,npu

1 minute

npu_roce_rx_all_pkt_num

(Agent) Packets Received by NPU RoCE

Total number of packets received by NPU RoCE

≥0

count

N/A

instance_id,npu

1 minute

npu_roce_rx_cnp_pkt_num

(Agent) CNP Packets Received by NPU RoCE

Total number of CNP packets received by NPU RoCE

≥0

count

N/A

instance_id,npu

1 minute

npu_roce_tx_all_pkt_num

(Agent) Packets Sent by NPU RoCE

Total number of packets sent by NPU RoCE

≥0

count

N/A

instance_id,npu

1 minute

npu_roce_tx_cnp_pkt_num

(Agent) CNP Packets Sent by NPU RoCE

Total number of CNP packets sent by NPU RoCE

≥0

count

N/A

instance_id,npu

1 minute

npu_roce_tx_err_pkt_num

(Agent) Bad Packets Sent by RoCE

Total number of bad packets sent by the RoCE NIC of the NPU (for reference only)

≥0

count

N/A

instance_id,npu

1 minute
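The per-Macro NPU metric IDs listed above follow a regular naming pattern (`npu_macro<N>_<suffix>`), so a client that queries them can generate the IDs instead of hard-coding each one. The following is a minimal sketch under that assumption. The helper name `macro_metric_ids` and the suffix list are illustrative and were transcribed from the table rows above; they are not part of any Cloud Eye SDK, and the set of Macro indices available on a given BMS is not confirmed here.

```python
# Suffixes transcribed from the NPU Macro rows of the table above.
# Assumption: every Macro exposes the same set of metrics.
MACRO_SUFFIXES = [
    "0lane_max_consec_sec",
    "0lane_total_sec",
    "crc_error_cnt",
    "crc_error_rate",
    "retry_cnt",
    "rx_cnt",
    "serdes_lane0_snr",
    "serdes_lane1_snr",
    "serdes_lane2_snr",
    "serdes_lane3_snr",
    "tx_cnt",
]


def macro_metric_ids(macro_index):
    """Return the metric IDs for one NPU Macro (hypothetical helper)."""
    if macro_index < 0:
        raise ValueError("macro_index must be non-negative")
    return [f"npu_macro{macro_index}_{suffix}" for suffix in MACRO_SUFFIXES]


if __name__ == "__main__":
    # Print the metric IDs for the Macros shown in this part of the table.
    for index in (5, 6, 7):
        for metric_id in macro_metric_ids(index):
            print(metric_id)
```

All of these metrics use the instance_id,npu dimension and a 1-minute raw-data interval, as stated in the table.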

If an object is in a hierarchical system, specify the monitored dimension in hierarchical form when you use APIs to query metrics of this object.

For example, to query the available space (metric: disk_free) of a disk mount point on a BMS, the dimension of the metric is instance_id,mount_point, where instance_id indicates level 0 and mount_point indicates level 1.

  • When you call an API to query a single metric, specify the dimensions (here, instance_id and mount_point) as follows:
    dim.0=instance_id,3d65c1ac-9a9f-4c5f-a054-35184a087bb2&dim.1=mount_point,6666cd76f96956469e7be39d750cc7d9

    3d65c1ac-9a9f-4c5f-a054-35184a087bb2 and 6666cd76f96956469e7be39d750cc7d9 are the values of instance_id and mount_point, respectively. For details about how to obtain the values, see Dimensions.

  • When you call an API to query multiple metrics, specify the dimensions (here, instance_id and mount_point) as follows:
    "dimensions": [
        {
            "name": "instance_id",
            "value": "3d65c1ac-9a9f-4c5f-a054-35184a087bb2"
        },
        {
            "name": "mount_point",
            "value": "6666cd76f96956469e7be39d750cc7d9"
        }
    ]

    3d65c1ac-9a9f-4c5f-a054-35184a087bb2 and 6666cd76f96956469e7be39d750cc7d9 are the values of instance_id and mount_point, respectively. For details about how to obtain the values, see Dimensions.
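The two dimension formats above can be produced from one ordered list of key-value pairs. The following is a minimal sketch, not an official client: the function names `to_query_string` and `to_json_dimensions` are hypothetical, the values are the sample IDs from this section, and no request is sent. The four-level limit comes from the Description section (levels 0 to 3).

```python
import json

MAX_LEVELS = 4  # Cloud Eye supports dimension levels 0 to 3


def _check(dimensions):
    """Validate an ordered list of (key, value) pairs, level 0 first."""
    if not 1 <= len(dimensions) <= MAX_LEVELS:
        raise ValueError(f"expected 1 to {MAX_LEVELS} dimension levels")


def to_query_string(dimensions):
    """Build the dim.N=key,value form used when querying a single metric."""
    _check(dimensions)
    return "&".join(
        f"dim.{level}={key},{value}"
        for level, (key, value) in enumerate(dimensions)
    )


def to_json_dimensions(dimensions):
    """Build the "dimensions" array used when querying multiple metrics."""
    _check(dimensions)
    return {"dimensions": [{"name": k, "value": v} for k, v in dimensions]}


if __name__ == "__main__":
    dims = [
        ("instance_id", "3d65c1ac-9a9f-4c5f-a054-35184a087bb2"),  # level 0
        ("mount_point", "6666cd76f96956469e7be39d750cc7d9"),      # level 1
    ]
    print(to_query_string(dims))
    print(json.dumps(to_json_dimensions(dims), indent=4))
```

The list order determines the level, so the parent dimension (instance_id) must come first. The sketch joins strings without URL encoding, which is only safe for values like the IDs shown here.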

Dimensions

Dimension                         Key                       Value

Cloud server                      instance_id               Cloud server
Server process                    proc                      Process
Cloud server disk                 disk                      Disk
Cloud server mount point          mount_point               Mount point
Cloud server GPU                  gpu                       GPU
Cloud server NPU                  npu                       NPU
Cloud server NIC                  network_interface_card    NIC
Cloud server GPU                  gpu_slot                  GPU
GPU process ID of cloud server    pid_for_gpu               GPU process ID