Monitored Metrics (with Agent Installed)
Description
This section describes monitoring metrics reported by BMS to Cloud Eye as well as their namespaces and dimensions. You can use the management console or APIs provided by Cloud Eye to query the metrics of the monitored objects and alarms generated for BMS.
Cloud Eye can monitor dimensions nested up to four levels deep (levels 0 to 3, where 3 is the deepest level). For example, if the monitored dimension of a metric is instance_id,mount_point, instance_id indicates level 0 and mount_point indicates level 1.
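As a rough illustration of the level numbering described above, the following Python sketch splits a comma-separated dimension string into (level, name) pairs and enforces the four-level limit. The function name `dimension_levels` and the constant `MAX_DIMENSION_LEVELS` are hypothetical helpers for this example and are not part of any Cloud Eye API.

```python
MAX_DIMENSION_LEVELS = 4  # levels 0 to 3, as stated in the description


def dimension_levels(dimension):
    """Map each dimension name to its nesting level (0 is the outermost)."""
    names = [name.strip() for name in dimension.split(",") if name.strip()]
    if len(names) > MAX_DIMENSION_LEVELS:
        raise ValueError(
            f"at most {MAX_DIMENSION_LEVELS} dimension levels are supported"
        )
    return list(enumerate(names))


print(dimension_levels("instance_id,mount_point"))
# [(0, 'instance_id'), (1, 'mount_point')]
```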
Prerequisites
The Agent has been installed. For details, see Installing the Agent.
Namespace
SERVICE.BMS
OS Metrics: CPU

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| cpu_usage | (Agent) CPU Usage | CPU usage of the monitored object | 0-100 | % | N/A | instance_id | 1 minute |
| cpu_usage_idle | (Agent) Idle CPU Usage | Percentage of time that the CPU is idle | 0-100 | % | N/A | instance_id | 1 minute |
| cpu_usage_other | (Agent) Other Process CPU Usage | Percentage of time that the CPU is used by other processes | 0-100 | % | N/A | instance_id | 1 minute |
| cpu_usage_system | (Agent) Kernel Space CPU Usage | Percentage of time that the CPU is used by kernel space | 0-100 | % | N/A | instance_id | 1 minute |
| cpu_usage_user | (Agent) User Space CPU Usage | Percentage of time that the CPU is used by user space | 0-100 | % | N/A | instance_id | 1 minute |
| cpu_usage_nice | (Agent) Nice Process CPU Usage | Percentage of time that the CPU is used by the Nice process | 0-100 | % | N/A | instance_id | 1 minute |
| cpu_usage_iowait | (Agent) iowait Process CPU Usage | Percentage of time during which the CPU is waiting for I/O operations to complete | 0-100 | % | N/A | instance_id | 1 minute |
| cpu_usage_irq | (Agent) CPU Interrupt Time | Percentage of time that the CPU is servicing interrupts | 0-100 | % | N/A | instance_id | 1 minute |
| cpu_usage_softirq | (Agent) CPU Software Interrupt Time | Percentage of time that the CPU is servicing software interrupts | 0-100 | % | N/A | instance_id | 1 minute |

OS Metrics: CPU Load

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| load_average1 | (Agent) 1-Minute Load Average | CPU load averaged over the last 1 minute. Linux: The metric value is load1 in the /proc/loadavg file divided by the number of logical CPUs. You can run the top command to check the load1 value. | ≥0 | N/A | N/A | instance_id | 1 minute |
| load_average5 | (Agent) 5-Minute Load Average | CPU load averaged over the last 5 minutes. Linux: The metric value is load5 in the /proc/loadavg file divided by the number of logical CPUs. You can run the top command to check the load5 value. | ≥0 | N/A | N/A | instance_id | 1 minute |
| load_average15 | (Agent) 15-Minute Load Average | CPU load averaged over the last 15 minutes. Linux: The metric value is load15 in the /proc/loadavg file divided by the number of logical CPUs. You can run the top command to check the load15 value. | ≥0 | N/A | N/A | instance_id | 1 minute |

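The load average descriptions above refer to the load values in /proc/loadavg and the number of logical CPUs. The following Python sketch shows one way to read those values and divide them by the CPU count. It assumes the metric is the raw load value divided by the number of logical CPUs; the function names are hypothetical, and the Agent's actual implementation is not shown in this section.

```python
import os


def parse_loadavg(text):
    """Return (load1, load5, load15) from the contents of /proc/loadavg."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def per_cpu_load(load, logical_cpus):
    """Divide a load value by the number of logical CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return load / logical_cpus


# Sample /proc/loadavg content; on Linux, read the real file instead.
sample = "2.00 1.00 0.50 3/512 12345"
load1, load5, load15 = parse_loadavg(sample)
cpus = os.cpu_count() or 1  # fall back to 1 if the count is unavailable
print(per_cpu_load(load1, cpus))
```

With the sample line above and four logical CPUs, `per_cpu_load(2.0, 4)` returns 0.5.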
OS Metrics: Memory

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| mem_available | (Agent) Available Memory | Available memory of the monitored object | ≥0 | GB | N/A | instance_id | 1 minute |
| mem_usedPercent | (Agent) Memory Usage | Memory usage of the monitored object | 0-100 | % | N/A | instance_id | 1 minute |
| mem_free | (Agent) Idle Memory | Memory that is not being used | ≥0 | GB | N/A | instance_id | 1 minute |
| mem_buffers | (Agent) Buffer | Memory that is being used for buffers | ≥0 | GB | N/A | instance_id | 1 minute |
| mem_cached | (Agent) Cache | Memory that is being used for caches | ≥0 | GB | N/A | instance_id | 1 minute |
| total_open_files | (Agent) Total File Handles | Total handles used by all processes | ≥0 | Count | N/A | instance_id | 1 minute |

OS Metrics: Disk
- Currently, the Cloud Eye Agent monitors only physical disks. NFS-mounted disks cannot be monitored.
- By default, Cloud Eye Agent excludes Docker-related mount points. The mount point prefixes are as follows:
/var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos
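To make the exclusion rule above concrete, the following Python sketch filters a list of mount points against the three documented prefixes. It assumes a plain string-prefix match; how the Agent actually matches mount points is not described here, and the names `EXCLUDED_PREFIXES` and `is_excluded` are hypothetical.

```python
# Prefixes copied from the note above, separated by semicolons.
EXCLUDED_PREFIXES = "/var/lib/docker;/mnt/paas/kubernetes;/var/lib/mesos".split(";")


def is_excluded(mount_point):
    """Return True if the mount point starts with an excluded prefix."""
    return any(mount_point.startswith(prefix) for prefix in EXCLUDED_PREFIXES)


mounts = [
    "/",
    "/data",
    "/var/lib/docker/overlay2/abc/merged",
    "/mnt/paas/kubernetes/kubelet",
]
print([m for m in mounts if not is_excluded(m)])
# ['/', '/data']
```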

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| disk_free | (Agent) Available Disk Space | Available disk space of the monitored object | ≥0 | GB | N/A | instance_id,mount_point | 1 minute |
| disk_total | (Agent) Disk Storage Capacity | Total disk capacity of the monitored object | ≥0 | GB | N/A | instance_id,mount_point | 1 minute |
| disk_used | (Agent) Used Disk Space | Used disk space of the monitored object | ≥0 | GB | N/A | instance_id,mount_point | 1 minute |
| disk_usedPercent | (Agent) Disk Usage | Disk usage of the monitored object. Formula: Disk Usage = Used Disk Space/Disk Storage Capacity | 0-100 | % | N/A | instance_id,mount_point | 1 minute |

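The disk usage formula above can be worked through with a short Python sketch. It assumes the ratio is multiplied by 100 to match the 0-100 value range; the function name `disk_used_percent` is hypothetical.

```python
def disk_used_percent(used_gb, total_gb):
    """Used Disk Space / Disk Storage Capacity, as a percentage (0-100)."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return used_gb / total_gb * 100


print(disk_used_percent(40.0, 160.0))  # 25.0
```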
OS Metrics: Disk I/O

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| disk_agt_read_bytes_rate | (Agent) Disks Read Rate | Number of bytes read from the monitored disk per second | ≥ 0 | byte/s | 1024(IEC) | | 1 minute |
| disk_agt_read_requests_rate | (Agent) Disks Read Requests | Number of requests to read data from the monitored disk per second | ≥ 0 | request/s | N/A | | 1 minute |
| disk_agt_write_bytes_rate | (Agent) Disks Write Rate | Number of bytes written into the monitored disk per second | ≥ 0 | byte/s | 1024(IEC) | | 1 minute |
| disk_agt_write_requests_rate | (Agent) Disks Write Requests | Number of requests to write data into the monitored disk per second | ≥ 0 | request/s | N/A | | 1 minute |
| disk_readTime | (Agent) Average Read Request Time | Average amount of time that read requests have waited on the disks | ≥ 0 | ms/count | N/A | | 1 minute |
| disk_writeTime | (Agent) Average Write Request Time | Average amount of time that write requests have waited on the disks | ≥ 0 | ms/count | N/A | | 1 minute |
| disk_ioUtils | (Agent) Disk I/O Usage | Disk I/O usage of the monitored object | 0-100 | % | N/A | | 1 minute |
| disk_queue_length | (Agent) Disk Queue Length | Average number of read or write requests waiting to be processed by the monitored disk in a monitoring period | ≥ 0 | Count | N/A | | 1 minute |
| disk_write_bytes_per_operation | (Agent) Average Disk Write Size | Average number of bytes written into the monitored disk per write I/O in a monitoring period | ≥ 0 | Byte/op | N/A | | 1 minute |
| disk_read_bytes_per_operation | (Agent) Average Disk Read Size | Average number of bytes read from the monitored disk per read I/O in a monitoring period | ≥ 0 | Byte/op | N/A | | 1 minute |
| disk_io_svctm | (Agent) Disk I/O Service Time | Average time the monitored disk takes to complete an I/O request (read or write) in a monitoring period | ≥ 0 | ms/op | N/A | | 1 minute |
| disk_device_used_percent | (Agent) Block Device Usage | Physical disk usage of the monitored object. Formula: Block device usage = Storage space used by all mounted disk partitions/Total disk storage space | 0-100 | % | N/A | | 1 minute |

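Some metrics in this section list 1024(IEC) as their conversion rule. The following Python sketch shows one reading of that rule: a raw value is scaled in steps of 1024 and labeled with an IEC prefix. The function `format_iec` and its output format are assumptions for illustration and do not reflect how Cloud Eye itself renders values.

```python
def format_iec(value, unit="B/s"):
    """Scale a raw value by powers of 1024 and attach an IEC prefix."""
    prefixes = ["", "Ki", "Mi", "Gi", "Ti"]
    index = 0
    while value >= 1024 and index < len(prefixes) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {prefixes[index]}{unit}"


print(format_iec(1536))             # 1.50 KiB/s
print(format_iec(5 * 1024 ** 2))    # 5.00 MiB/s
print(format_iec(512, "bit/s"))     # 512.00 bit/s
```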
OS Metrics: File System

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| disk_fs_rwstate | (Agent) File System Read/Write Status | File system read/write status of the monitored object. Possible values are 0 (read and write) and 1 (read-only). | | N/A | N/A | instance_id,mount_point | 1 minute |
| disk_inodesTotal | (Agent) Disk inode Total | Total number of index nodes on the disk | ≥ 0 | N/A | N/A | instance_id,mount_point | 1 minute |
| disk_inodesUsed | (Agent) Total inode Used | Number of used index nodes on the disk | ≥ 0 | N/A | N/A | instance_id,mount_point | 1 minute |
| disk_inodesUsedPercent | (Agent) Percentage of Total inode Used | Percentage of used index nodes on the disk | 0-100 | % | N/A | instance_id,mount_point | 1 minute |

OS Metrics: TCP

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| net_tcp_total | (Agent) Total TCP Connections | Total number of TCP connections in all states | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_established | (Agent) TCP ESTABLISHED Connections | Number of TCP connections in ESTABLISHED state | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_sys_sent | (Agent) TCP SYS_SENT Connections | Number of TCP connections that are being requested by the client (TCP SYN_SENT state) | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_sys_recv | (Agent) TCP SYS_RECV Connections | Number of pending TCP connections received by the server (TCP SYN_RECV state) | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_fin_wait1 | (Agent) TCP FIN_WAIT1 Connections | Number of TCP connections waiting for ACK packets when the connections are being actively closed by the client | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_fin_wait2 | (Agent) TCP FIN_WAIT2 Connections | Number of TCP connections in FIN_WAIT2 state | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_time_wait | (Agent) TCP TIME_WAIT Connections | Number of TCP connections in TIME_WAIT state | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_close | (Agent) TCP CLOSE Connections | Number of closed TCP connections | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_close_wait | (Agent) TCP CLOSE_WAIT Connections | Number of TCP connections in CLOSE_WAIT state | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_last_ack | (Agent) TCP LAST_ACK Connections | Number of TCP connections waiting for ACK packets when the connections are being passively closed by the client | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_listen | (Agent) TCP LISTEN Connections | Number of TCP connections in LISTEN state | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_closing | (Agent) TCP CLOSING Connections | Number of TCP connections to be actively closed by the server and the client at the same time | ≥ 0 | Count | N/A | instance_id | 1 minute |
| net_tcp_retrans | (Agent) TCP Retransmission Rate | Percentage of packets that are resent | 0-100 | % | N/A | instance_id | 1 minute |

OS Metrics: NIC

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| net_bitRecv | (Agent) Outbound Bandwidth | Number of bits sent by the monitored object per second | ≥ 0 | bit/s | 1024(IEC) | | 1 minute |
| net_bitSent | (Agent) Inbound Bandwidth | Number of bits received by the monitored object per second | ≥ 0 | bit/s | 1024(IEC) | | 1 minute |
| net_packetRecv | (Agent) NIC Packet Receive Rate | Number of packets received by the monitored object per second | ≥ 0 | Count/s | N/A | | 1 minute |
| net_packetSent | (Agent) NIC Packet Send Rate | Number of packets sent by the monitored object per second | ≥ 0 | Count/s | N/A | | 1 minute |
| net_errin | (Agent) Receive Error Rate | Percentage of error packets relative to the total packets received by the monitored object per second | 0-100 | % | N/A | | 1 minute |
| net_errout | (Agent) Transmit Error Rate | Percentage of error packets relative to the total packets sent by the monitored object per second | 0-100 | % | N/A | | 1 minute |
| net_dropin | (Agent) Received Packet Drop Rate | Percentage of received but dropped packets relative to the total packets received by the monitored object per second | 0-100 | % | N/A | | 1 minute |
| net_dropout | (Agent) Transmitted Packet Drop Rate | Percentage of sent but dropped packets relative to the total packets sent by the monitored object per second | 0-100 | % | N/A | | 1 minute |

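The NIC metrics above are per-second rates and percentages. One common way to derive such values is to compare two snapshots of cumulative counters, as in the Python sketch below. The `NicCounters` structure, the function name, and the choice of the received packet count as the denominator are assumptions for illustration; this section does not describe how the Agent computes these values.

```python
from dataclasses import dataclass


@dataclass
class NicCounters:
    """Cumulative receive counters for one NIC (hypothetical structure)."""
    packets: int
    errors: int
    dropped: int


def receive_rates(prev, curr, interval_s):
    """Derive per-second and percentage values from two counter snapshots."""
    packets = curr.packets - prev.packets
    errors = curr.errors - prev.errors
    dropped = curr.dropped - prev.dropped
    return {
        "packets_per_s": packets / interval_s,
        # Guard against division by zero when no packets arrived.
        "error_percent": errors / packets * 100 if packets else 0.0,
        "drop_percent": dropped / packets * 100 if packets else 0.0,
    }


prev = NicCounters(packets=1_000, errors=0, dropped=0)
curr = NicCounters(packets=7_000, errors=3, dropped=6)
print(receive_rates(prev, curr, interval_s=60))
```

In this example, 6,000 packets arrive in 60 seconds, so the packet rate is 100 per second, and 3 errors out of 6,000 packets correspond to an error rate of about 0.05%.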
Process Monitoring Metrics

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| proc_pHashId_cpu | (Agent) CPU Usage | CPU consumed by a process. pHashId is the MD5 value of the process name plus process ID. | 0–1 x Number of CPU cores | % | N/A | instance_id | 1 minute |
| proc_pHashId_mem | (Agent) Memory Usage | Memory consumed by a process. pHashId is the MD5 value of the process name plus process ID. | 0-100 | % | N/A | instance_id | 1 minute |
| proc_pHashId_file | (Agent) Opened Files | Number of files opened by a process. pHashId is the MD5 value of the process name plus process ID. | ≥0 | Count | N/A | instance_id | 1 minute |
| proc_running_count | (Agent) Running Processes | Number of running processes of the monitored object | ≥0 | Count | N/A | instance_id | 1 minute |
| proc_idle_count | (Agent) Idle Processes | Number of idle processes of the monitored object | ≥0 | Count | N/A | instance_id | 1 minute |
| proc_zombie_count | (Agent) Zombie Processes | Number of zombie processes of the monitored object | ≥0 | Count | N/A | instance_id | 1 minute |
| proc_blocked_count | (Agent) Blocked Processes | Number of blocked processes of the monitored object | ≥0 | Count | N/A | instance_id | 1 minute |
| proc_sleeping_count | (Agent) Sleeping Processes | Number of sleeping processes of the monitored object | ≥0 | Count | N/A | instance_id | 1 minute |
| proc_total_count | (Agent) Total Processes | Total number of processes of the monitored object | ≥0 | Count | N/A | instance_id | 1 minute |
| proc_specified_count | (Agent) Specified Processes | Number of specified processes | ≥0 | N/A | N/A | instance_id,proc | 1 minute |

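The process metric IDs above embed pHashId, described as the MD5 value of the process name plus process ID. The following Python sketch builds such an ID under one possible reading of that description. It assumes the name and PID are concatenated as UTF-8 text with no separator, which this section does not confirm, so the resulting IDs may not match those produced by the Agent. The function names are hypothetical.

```python
import hashlib


def p_hash_id(process_name, pid):
    """MD5 hex digest of the process name followed by the process ID.

    Assumption: name and PID are joined as text with no separator.
    """
    return hashlib.md5(f"{process_name}{pid}".encode("utf-8")).hexdigest()


def metric_id(process_name, pid, suffix):
    """Build an ID of the form proc_<pHashId>_<suffix>, for example suffix 'cpu'."""
    return f"proc_{p_hash_id(process_name, pid)}_{suffix}"


print(metric_id("nginx", 1234, "cpu"))
```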
OS Metrics: GPU
If a server has eight GPUs and persistence mode (PM) is disabled, data collection may fail. To fix this, enable persistence mode and restart the monitoring process.

| Metric ID | Metric Name | Description | Value Range | Unit | Conversion Rule | Dimension | Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
| gpu_status | (Agent) GPU Health Status | GPU health status. It is a composite metric. An abnormal status can have the following causes: 1. ECC errors exceed the threshold. 2. The GPU memory address failed to be remapped. 3. The GPU reports a rev ff error. 4. An infoROM error occurs. 5. There are pages to be isolated. 6. A remapped rows error occurs. For details, see the detailed metrics below. | | N/A | N/A | instance_id,gpu | 1 minute |
| gpu_performance_state | (Agent) Performance Status | GPU performance status | P0–P15, P32 | N/A | N/A | instance_id,gpu | 1 minute |
| gpu_power_draw | (Agent) GPU Draw Power | Draw power on the GPU. If the power exceeds the maximum or is an incorrect value, the GPU hardware may be faulty. | ≥ 0 | W | N/A | instance_id,gpu | 1 minute |
| gpu_temperature | (Agent) GPU Temperature | Temperature of the GPU. If the temperature exceeds the threshold or is an incorrect value, the GPU hardware may be faulty. | ≥ 0 | °C | N/A | instance_id,gpu | 1 minute |
| gpu_usage_gpu | (Agent) GPU Usage | GPU compute usage. It is an instantaneous value at a sampling point. | 0-100 | % | N/A | instance_id,gpu | 1 minute |
| gpu_usage_mem | (Agent) GPU Memory Usage | GPU memory usage. It is an instantaneous value at a sampling point. | 0-100 | % | N/A | | 1 minute |
| gpu_used_mem | (Agent) GPU Used Memory | Memory used on the GPU | ≥ 0 | MB | N/A | | 1 minute |
| gpu_free_mem | (Agent) Remaining GPU Memory | Idle GPU memory | ≥ 0 | MB | N/A | instance_id,gpu | 1 minute |
| gpu_usage_encoder | (Agent) Encoding Usage | Encoder usage of the GPU. It is an instantaneous value at a sampling point. | 0-100 | % | N/A | | 1 minute |
| gpu_usage_decoder | (Agent) Decoding Usage | Decoder usage of the GPU. It is an instantaneous value at a sampling point. | 0-100 | % | N/A | | 1 minute |
| gpu_graphics_clocks | (Agent) GPU Graphics Clocks | GPU graphics (shader) clock frequency. The value is the GPU clock frequency related to graphics performance. If graphics capabilities are not used, you can ignore this metric. | ≥ 0 | MHz | N/A | instance_id,gpu | 1 minute |
| gpu_sm_clocks | (Agent) GPU SM Clocks | SM clocks on the GPU. The value is the clock frequency closely related to CUDA core computing of the GPU. | ≥ 0 | MHz | N/A | instance_id,gpu | 1 minute |
| gpu_mem_clock | (Agent) GPU Memory Clocks | Memory clocks on the GPU. The value is the clock frequency for controlling the GPU memory speed. | ≥ 0 | MHz | N/A | instance_id,gpu | 1 minute |
| gpu_video_clocks | (Agent) GPU Video Clocks | Video clocks on the GPU. The value is the codec clock frequency of the GPU. | ≥ 0 | MHz | N/A | instance_id,gpu | 1 minute |
| gpu_tx_throughput_pci | (Agent) GPU PCI Tx Throughput | PCI Tx throughput on the GPU. The value is the amount of data sent by the GPU to the host via PCIe. | ≥ 0 | MByte/s | N/A | instance_id,gpu | 1 minute |
| gpu_rx_throughput_pci | (Agent) GPU PCI Rx Throughput | PCI Rx throughput on the GPU. The value is the amount of data sent by the host to the GPU via PCIe. | ≥ 0 | MByte/s | N/A | instance_id,gpu | 1 minute |
| gpu_volatile_correctable | (Agent) Volatile Correctable ECC Errors | Number of correctable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset. | ≥ 0 | Count | N/A | instance_id,gpu | 1 minute |
| gpu_volatile_uncorrectable | (Agent) Volatile Uncorrectable ECC Errors | Number of uncorrectable ECC errors since the GPU was last reset. The value is reset to 0 each time the GPU is reset. | ≥ 0 | Count | N/A | instance_id,gpu | 1 minute |
| gpu_aggregate_correctable | (Agent) Aggregate Correctable ECC Errors | Aggregate correctable ECC errors on the GPU | ≥ 0 | Count | N/A | instance_id,gpu | 1 minute |
| gpu_aggregate_uncorrectable | (Agent) Aggregate Uncorrectable ECC Errors | Aggregate uncorrectable ECC errors on the GPU | ≥ 0 | Count | N/A | instance_id,gpu | 1 minute |
| gpu_retired_page_single_bit | (Agent) Retired Page Single Bit Errors | Number of retired page single bit errors, which indicates the number of single-bit error pages blocked by the GPU | ≥ 0 | Count | N/A | instance_id,gpu | 1 minute |
| gpu_retired_page_double_bit | (Agent) Retired Page Double Bit Errors | Number of retired page double bit errors, which indicates the number of double-bit error pages blocked by the GPU | ≥ 0 | Count | N/A | instance_id,gpu | 1 minute |
| gpu_lnkcap_speed | (Agent) Max. GPU Link Speed | Maximum PCIe link speed of the GPU, which means the maximum data throughput of the GPU on the PCIe bus | ≥ 0 | GT/s | N/A | instance_id,gpu | 1 minute |
| gpu_lnkcap_width | (Agent) Max. GPU Link Width | Maximum PCIe link width of the GPU, which means the maximum number of PCIe lanes supported by the GPU | ≥ 0 | count | N/A | instance_id,gpu | 1 minute |
| gpu_lnksta_speed | (Agent) GPU Link Speed | PCIe link speed of the GPU | ≥ 0 | GT/s | N/A | instance_id,gpu | 1 minute |
| gpu_lnksta_width | (Agent) GPU Link Width | PCIe link width of the GPU, which means the number of PCIe lanes of the GPU | ≥ 0 | count | N/A | instance_id,gpu | 1 minute |
| gpu_nvlink_number | (Agent) GPU NVLinks | Number of NVLinks of the GPU. For example, A100 supports 12 NVLinks. | ≥ 0 | count | N/A | instance_id,gpu | 1 minute |
| gpu_nvlink_bandwidth | (Agent) Average GPU NVLink Bandwidth | Average NVLink bandwidth of the GPU. The value is the total bandwidth for GPU data transmission. | ≥ 0 | GB/s | N/A | instance_id,gpu | 1 minute |

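The GPU link metrics above come in pairs: gpu_lnkcap_speed and gpu_lnkcap_width report the maximum, while gpu_lnksta_speed and gpu_lnksta_width report the current state. The following Python sketch compares the two to flag a link that is running below its maximum. This check is an illustration and not a documented Cloud Eye alarm rule, and the function name is hypothetical. A lower current speed can also be normal, for example when an idle GPU reduces its link speed to save power.

```python
def pcie_link_degraded(cap_speed, cap_width, sta_speed, sta_width):
    """Return True if the current link runs below the maximum speed or width."""
    return sta_speed < cap_speed or sta_width < cap_width


# Values are in GT/s (speed) and lanes (width), as in the table.
print(pcie_link_degraded(16.0, 16, 16.0, 16))  # False
print(pcie_link_degraded(16.0, 16, 8.0, 16))   # True
print(pcie_link_degraded(16.0, 16, 16.0, 8))   # True
```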
OS Metrics: NPU
|
Metric ID |
Metric Name |
Description |
Value Range |
Unit |
Conversion Rule |
Dimension |
Monitoring Interval (Raw Data) |
|---|---|---|---|---|---|---|---|
|
npu_device_health |
(Agent) NPU Device Health |
NPU health status |
|
N/A |
N/A |
instance_id,npu |
1 minute |
|
npu_driver_health |
(Agent) NPU Driver Health |
Health status of the NPU driver |
|
N/A |
N/A |
instance_id,npu |
1 minute |
|
npu_power |
(Agent) NPU Power |
NPU power |
>0 |
W |
N/A |
instance_id,npu |
1 minute |
|
npu_temperature |
(Agent) NPU Temperature |
NPU temperature |
Natural numbers |
°C |
N/A |
instance_id,npu |
1 minute |
|
npu_voltage |
(Agent) NPU Voltage |
NPU voltage |
Natural numbers |
V |
N/A |
instance_id,npu |
1 minute |
|
npu_util_rate_hbm |
(Agent) NPU HBM Usage |
NPU HBM usage |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_freq |
(Agent) NPU HBM Frequency |
NPU HBM frequency |
>0 |
MHz |
N/A |
instance_id,npu |
1 minute |
|
npu_freq_hbm |
(Agent) NPU HBM Frequency |
NPU HBM frequency |
>0 |
MHz |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_usage |
(Agent) Used HBM |
Used NPU HBM |
≥0 |
MB |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_temperature |
(Agent) HBM Temperature |
NPU HBM temperature |
Natural numbers |
°C |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_bandwidth_util |
(Agent) HBM Bandwidth Usage |
NPU HBM bandwidth usage |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_mem_capacity |
(Agent) HBM Memory Capacity |
NPU HBM memory capacity |
≥0 |
MB |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_ecc_enable |
(Agent) HBM ECC Check Status |
Whether HBM ECC check is enabled for the NPU |
|
N/A |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_single_bit_error_cnt |
(Agent) HBM Single-Bit Errors |
Number of HBM single-bit errors of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_double_bit_error_cnt |
(Agent) HBM Double-Bit Errors |
Number of HBM double-bit errors of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_total_single_bit_error_cnt |
(Agent) Single-Bit Errors in HBM Lifecycle |
Number of single-bit errors in an NPU HBM lifecycle |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_total_double_bit_error_cnt |
(Agent) Double-Bit Errors in HBM Lifecycle |
Number of double-bit errors in an NPU HBM lifecycle |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_single_bit_isolated_pages_cnt |
(Agent) Isolated Memory Pages with HBM Single-Bit Errors |
Number of isolated memory pages with single-bit HBM errors of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_hbm_double_bit_isolated_pages_cnt |
(Agent) Isolated Memory Pages with HBM Double-Bit Errors |
Number of isolated memory pages with double-bit HBM errors of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_usage_mem |
(Agent) Used NPU Memory |
Memory used on the NPU |
≥0 |
MB |
N/A |
instance_id,npu |
1 minute |
|
npu_util_rate_mem |
(Agent) NPU Memory Usage |
NPU memory usage |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_util_rate_hbm_bw |
(Agent) NPU HBM Bandwidth Usage |
NPU HBM bandwidth usage |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_freq_mem |
(Agent) NPU Memory Frequency |
NPU memory frequency |
>0 |
MHz |
N/A |
instance_id,npu |
1 minute |
|
npu_util_rate_mem_bandwidth |
(Agent) NPU Memory Bandwidth Usage |
NPU memory bandwidth usage |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_util_rate_vector_core |
(Agent) NPU Vector Core Usage |
Vector core usage of the NPU |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_sbe |
(Agent) NPU Single-Bit Errors |
Number of single-bit errors on the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_dbe |
(Agent) NPU Double-Bit Errors |
Number of double-bit errors on the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_freq_ai_core |
(Agent) NPU AI Core Frequency |
AI core clock frequency of the NPU |
>0 |
MHz |
N/A |
instance_id,npu |
1 minute |
|
npu_freq_ai_core_rated |
(Agent) Rated NPU AI Core Frequency |
Rated AI core frequency of the NPU |
>0 |
MHz |
N/A |
instance_id,npu |
1 minute |
|
npu_util_rate_ai_core |
(Agent) NPU AI Core Usage |
AI core usage of the NPU |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_aicpu_num |
(Agent) NPU AI CPUs |
Number of AI CPUs on the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_util_rate_ai_cpu |
(Agent) NPU AI CPU Usage |
AI CPU usage of the NPU |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_aicpu_avg_util_rate |
(Agent) Average AI CPU Usage of NPU |
Average AI CPU usage of the NPU |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_aicpu_max_freq |
(Agent) Max. AI CPU Frequency of NPU |
Maximum AI CPU frequency of the NPU |
>0 |
MHz |
N/A |
instance_id,npu |
1 minute |
|
npu_aicpu_cur_freq |
(Agent) AI CPU Frequency of NPU |
AI CPU frequency of the NPU |
>0 |
MHz |
N/A |
instance_id,npu |
1 minute |
|
npu_util_rate_ctrl_cpu |
(Agent) NPU Control CPU Usage |
Control CPU usage of the NPU |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_freq_ctrl_cpu |
(Agent) NPU Control CPU Frequency |
Control CPU frequency of the NPU |
>0 |
MHz |
N/A |
instance_id,npu |
1 minute |
|
npu_link_cap_speed |
(Agent) Max. NPU Link Speed |
Maximum link speed of the NPU |
≥0 |
GT/s |
N/A |
instance_id,npu |
1 minute |
|
npu_link_cap_width |
(Agent) Max. NPU Link Width |
Maximum link width of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_link_status_speed |
(Agent) NPU Link Speed |
Link speed of the NPU |
≥0 |
GT/s |
N/A |
instance_id,npu |
1 minute |
|
npu_link_status_width |
(Agent) NPU Link Width |
Link width of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_device_network_health |
(Agent) NPU Network Health |
RoCE IP address connectivity of the NPU |
|
N/A |
N/A |
instance_id,npu |
1 minute |
|
npu_network_port_link_status |
(Agent) NPU Network Port Link Status |
Link status of the network port on the NPU |
|
N/A |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_tx_rate |
(Agent) NPU NIC Uplink Rate |
NIC uplink rate of the NPU |
≥0 |
MB/s |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_rx_rate |
(Agent) NPU NIC Downlink Rate |
NIC downlink rate of the NPU |
≥0 |
MB/s |
N/A |
instance_id,npu |
1 minute |
|
npu_mac_tx_mac_pause_num |
(Agent) Pause Frames Sent by MAC |
Total number of pause frames sent by the MAC address of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_mac_rx_mac_pause_num |
(Agent) Pause Frames Received by MAC |
Total number of pause frames received by the MAC address of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_mac_tx_pfc_pkt_num |
(Agent) PFC Frames Sent by MAC |
Total number of PFC frames sent by the MAC address of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_mac_rx_pfc_pkt_num |
(Agent) PFC Frames Received by MAC |
Total number of PFC frames received by the MAC address of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_mac_tx_bad_pkt_num |
(Agent) Bad Packets Sent by MAC |
Total number of bad packets sent by the MAC address of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_mac_rx_bad_pkt_num |
(Agent) Bad Packets Received by MAC |
Total number of bad packets received by the MAC address of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_tx_err_pkt_num |
(Agent) Bad Packets Sent by RoCE |
Total number of bad packets sent by the RoCE NIC of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_rx_err_pkt_num |
(Agent) Bad Packets Received by RoCE |
Total number of bad packets received by the RoCE NIC of the NPU |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_temperature |
(Agent) NPU Optical Module Case Temperature |
Case temperature of the NPU optical module |
Natural numbers |
°C |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_temperature_high_thres |
(Agent) Max. NPU Optical Module Case Temperature |
Upper limit for the case temperature of the NPU optical module |
Natural numbers |
°C |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_temperature_low_thres |
(Agent) Min. NPU Optical Module Case Temperature |
Lower limit for the case temperature of the NPU optical module |
Natural numbers |
°C |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_voltage |
(Agent) NPU Optical Module Voltage |
Voltage of the NPU optical module |
Natural numbers |
mV |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_voltage_high_thres |
(Agent) Max. NPU Optical Module Voltage |
Upper limit for the voltage of the NPU optical module |
Natural numbers |
mV |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_voltage_low_thres |
(Agent) Min. NPU Optical Module Voltage |
Lower limit for the voltage of the NPU optical module |
Natural numbers |
mV |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_power_lane0 |
(Agent) NPU Optical Module Lane 0 TX Power |
Transmit power of NPU optical module lane 0 |
≥0 |
mW |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_power_lane1 |
(Agent) NPU Optical Module Lane 1 TX Power |
Transmit power of NPU optical module lane 1 |
≥0 |
mW |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_power_lane2 |
(Agent) NPU Optical Module Lane 2 TX Power |
Transmit power of NPU optical module lane 2 |
≥0 |
mW |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_power_lane3 |
(Agent) NPU Optical Module Lane 3 TX Power |
Transmit power of NPU optical module lane 3 |
≥0 |
mW |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_rx_power_lane0 |
(Agent) NPU Optical Module Lane 0 RX Power |
Receive power of NPU optical module lane 0 |
≥0 |
mW |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_rx_power_lane1 |
(Agent) NPU Optical Module Lane 1 RX Power |
Receive power of NPU optical module lane 1 |
≥0 |
mW |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_rx_power_lane2 |
(Agent) NPU Optical Module Lane 2 RX Power |
Receive power of NPU optical module lane 2 |
≥0 |
mW |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_rx_power_lane3 |
(Agent) NPU Optical Module Lane 3 RX Power |
Receive power of NPU optical module lane 3 |
≥0 |
mW |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_bias_lane0 |
(Agent) NPU Optical Module Lane 0 TX Bias Current |
Transmit bias current of NPU optical module lane 0 |
≥0 |
mA |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_bias_lane1 |
(Agent) NPU Optical Module Lane 1 TX Bias Current |
Transmit bias current of NPU optical module lane 1 |
≥0 |
mA |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_bias_lane2 |
(Agent) NPU Optical Module Lane 2 TX Bias Current |
Transmit bias current of NPU optical module lane 2 |
≥0 |
mA |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_bias_lane3 |
(Agent) NPU Optical Module Lane 3 TX Bias Current |
Transmit bias current of NPU optical module lane 3 |
≥0 |
mA |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_tx_los |
(Agent) NPU Optical Module TX LOS |
Number of transmit loss of signal (LOS) flags of the NPU optical module |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_rx_los |
(Agent) NPU Optical Module RX LOS |
Number of receive loss of signal (LOS) flags of the NPU optical module |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_0lane_max_consec_sec |
(Agent) Max. Duration of NPU Macro1 0lane |
Maximum duration of NPU Macro1 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_0lane_total_sec |
(Agent) Total Duration of NPU Macro1 0lane |
Total duration of NPU Macro1 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_crc_error_cnt |
(Agent) Error Packets Received by NPU Macro1 |
Number of CRC error packets received by NPU Macro1 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_crc_error_rate |
(Agent) NPU Macro1 BER |
Percentage of CRC error packets received by NPU Macro1 in a monitoring period |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_retry_cnt |
(Agent) Packets Retransmitted by NPU Macro1 |
Number of packets retransmitted by NPU Macro1 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_rx_cnt |
(Agent) Packets Received by NPU Macro1 |
Number of packets received by NPU Macro1 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_serdes_lane0_snr |
(Agent) NPU Macro1 SerDes Lane0 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane0 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_serdes_lane1_snr |
(Agent) NPU Macro1 SerDes Lane1 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane1 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_serdes_lane2_snr |
(Agent) NPU Macro1 SerDes Lane2 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane2 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_serdes_lane3_snr |
(Agent) NPU Macro1 SerDes Lane3 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro1 SerDes Lane3 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro1_tx_cnt |
(Agent) Packets Sent by NPU Macro1 |
Number of packets sent by NPU Macro1 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_0lane_max_consec_sec |
(Agent) Max. Duration of NPU Macro2 0lane |
Maximum duration of NPU Macro2 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_0lane_total_sec |
(Agent) Total Duration of NPU Macro2 0lane |
Total duration of NPU Macro2 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_crc_error_cnt |
(Agent) Error Packets Received by NPU Macro2 |
Number of CRC error packets received by NPU Macro2 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_crc_error_rate |
(Agent) NPU Macro2 BER |
Percentage of CRC error packets received by NPU Macro2 in a monitoring period |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_retry_cnt |
(Agent) Packets Retransmitted by NPU Macro2 |
Number of packets retransmitted by NPU Macro2 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_rx_cnt |
(Agent) Packets Received by NPU Macro2 |
Number of packets received by NPU Macro2 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_serdes_lane0_snr |
(Agent) NPU Macro2 SerDes Lane0 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane0 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_serdes_lane1_snr |
(Agent) NPU Macro2 SerDes Lane1 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane1 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_serdes_lane2_snr |
(Agent) NPU Macro2 SerDes Lane2 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane2 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_serdes_lane3_snr |
(Agent) NPU Macro2 SerDes Lane3 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro2 SerDes Lane3 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro2_tx_cnt |
(Agent) Packets Sent by NPU Macro2 |
Number of packets sent by NPU Macro2 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_0lane_max_consec_sec |
(Agent) Max. Duration of NPU Macro3 0lane |
Maximum duration of NPU Macro3 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_0lane_total_sec |
(Agent) Total Duration of NPU Macro3 0lane |
Total duration of NPU Macro3 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_crc_error_cnt |
(Agent) Error Packets Received by NPU Macro3 |
Number of CRC error packets received by NPU Macro3 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_crc_error_rate |
(Agent) NPU Macro3 BER |
Percentage of CRC error packets received by NPU Macro3 in a monitoring period |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_retry_cnt |
(Agent) Packets Retransmitted by NPU Macro3 |
Number of packets retransmitted by NPU Macro3 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_rx_cnt |
(Agent) Packets Received by NPU Macro3 |
Number of packets received by NPU Macro3 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_serdes_lane0_snr |
(Agent) NPU Macro3 SerDes Lane0 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane0 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_serdes_lane1_snr |
(Agent) NPU Macro3 SerDes Lane1 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane1 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_serdes_lane2_snr |
(Agent) NPU Macro3 SerDes Lane2 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane2 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_serdes_lane3_snr |
(Agent) NPU Macro3 SerDes Lane3 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro3 SerDes Lane3 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro3_tx_cnt |
(Agent) Packets Sent by NPU Macro3 |
Number of packets sent by NPU Macro3 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_0lane_max_consec_sec |
(Agent) Max. Duration of NPU Macro4 0lane |
Maximum duration of NPU Macro4 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_0lane_total_sec |
(Agent) Total Duration of NPU Macro4 0lane |
Total duration of NPU Macro4 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_crc_error_cnt |
(Agent) Error Packets Received by NPU Macro4 |
Number of CRC error packets received by NPU Macro4 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_crc_error_rate |
(Agent) NPU Macro4 BER |
Percentage of CRC error packets received by NPU Macro4 in a monitoring period |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_retry_cnt |
(Agent) Packets Retransmitted by NPU Macro4 |
Number of packets retransmitted by NPU Macro4 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_rx_cnt |
(Agent) Packets Received by NPU Macro4 |
Number of packets received by NPU Macro4 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_serdes_lane0_snr |
(Agent) NPU Macro4 SerDes Lane0 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane0 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_serdes_lane1_snr |
(Agent) NPU Macro4 SerDes Lane1 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane1 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_serdes_lane2_snr |
(Agent) NPU Macro4 SerDes Lane2 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane2 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_serdes_lane3_snr |
(Agent) NPU Macro4 SerDes Lane3 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro4 SerDes Lane3 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro4_tx_cnt |
(Agent) Packets Sent by NPU Macro4 |
Number of packets sent by NPU Macro4 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_0lane_max_consec_sec |
(Agent) Max. Duration of NPU Macro5 0lane |
Maximum duration of NPU Macro5 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_0lane_total_sec |
(Agent) Total Duration of NPU Macro5 0lane |
Total duration of NPU Macro5 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_crc_error_cnt |
(Agent) Error Packets Received by NPU Macro5 |
Number of CRC error packets received by NPU Macro5 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_crc_error_rate |
(Agent) NPU Macro5 BER |
Percentage of CRC error packets received by NPU Macro5 in a monitoring period |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_retry_cnt |
(Agent) Packets Retransmitted by NPU Macro5 |
Number of packets retransmitted by NPU Macro5 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_rx_cnt |
(Agent) Packets Received by NPU Macro5 |
Number of packets received by NPU Macro5 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_serdes_lane0_snr |
(Agent) NPU Macro5 SerDes Lane0 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane0 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_serdes_lane1_snr |
(Agent) NPU Macro5 SerDes Lane1 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane1 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_serdes_lane2_snr |
(Agent) NPU Macro5 SerDes Lane2 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane2 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_serdes_lane3_snr |
(Agent) NPU Macro5 SerDes Lane3 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro5 SerDes Lane3 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro5_tx_cnt |
(Agent) Packets Sent by NPU Macro5 |
Number of packets sent by NPU Macro5 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_0lane_max_consec_sec |
(Agent) Max. Duration of NPU Macro6 0lane |
Maximum duration of NPU Macro6 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_0lane_total_sec |
(Agent) Total Duration of NPU Macro6 0lane |
Total duration of NPU Macro6 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_crc_error_cnt |
(Agent) Error Packets Received by NPU Macro6 |
Number of CRC error packets received by NPU Macro6 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_crc_error_rate |
(Agent) NPU Macro6 BER |
Percentage of CRC error packets received by NPU Macro6 in a monitoring period |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_retry_cnt |
(Agent) Packets Retransmitted by NPU Macro6 |
Number of packets retransmitted by NPU Macro6 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_rx_cnt |
(Agent) Packets Received by NPU Macro6 |
Number of packets received by NPU Macro6 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_serdes_lane0_snr |
(Agent) NPU Macro6 SerDes Lane0 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane0 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_serdes_lane1_snr |
(Agent) NPU Macro6 SerDes Lane1 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane1 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_serdes_lane2_snr |
(Agent) NPU Macro6 SerDes Lane2 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane2 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_serdes_lane3_snr |
(Agent) NPU Macro6 SerDes Lane3 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro6 SerDes Lane3 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro6_tx_cnt |
(Agent) Packets Sent by NPU Macro6 |
Number of packets sent by NPU Macro6 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_0lane_max_consec_sec |
(Agent) Max. Duration of NPU Macro7 0lane |
Maximum duration of NPU Macro7 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_0lane_total_sec |
(Agent) Total Duration of NPU Macro7 0lane |
Total duration of NPU Macro7 0lane in a monitoring period |
≥0 |
s |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_crc_error_cnt |
(Agent) Error Packets Received by NPU Macro7 |
Number of CRC error packets received by NPU Macro7 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_crc_error_rate |
(Agent) NPU Macro7 BER |
Percentage of CRC error packets received by NPU Macro7 in a monitoring period |
0-100 |
% |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_retry_cnt |
(Agent) Packets Retransmitted by NPU Macro7 |
Number of packets retransmitted by NPU Macro7 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_rx_cnt |
(Agent) Packets Received by NPU Macro7 |
Number of packets received by NPU Macro7 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_serdes_lane0_snr |
(Agent) NPU Macro7 SerDes Lane0 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro7 SerDes Lane0 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_serdes_lane1_snr |
(Agent) NPU Macro7 SerDes Lane1 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro7 SerDes Lane1 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_serdes_lane2_snr |
(Agent) NPU Macro7 SerDes Lane2 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro7 SerDes Lane2 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_serdes_lane3_snr |
(Agent) NPU Macro7 SerDes Lane3 SNR |
Signal-to-Noise Ratio (SNR) of NPU Macro7 SerDes Lane3 |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_macro7_tx_cnt |
(Agent) Packets Sent by NPU Macro7 |
Number of packets sent by NPU Macro7 in a monitoring period |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_media_snr_lane0 |
(Agent) NPU Optical Module Lane 0 Optical SNR |
Signal-to-Noise Ratio (SNR) on the media (optical) side of lane 0 in the NPU optical module |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_media_snr_lane1 |
(Agent) NPU Optical Module Lane 1 Optical SNR |
Signal-to-Noise Ratio (SNR) on the media (optical) side of lane 1 in the NPU optical module |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_media_snr_lane2 |
(Agent) NPU Optical Module Lane 2 Optical SNR |
Signal-to-Noise Ratio (SNR) on the media (optical) side of lane 2 in the NPU optical module |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_opt_media_snr_lane3 |
(Agent) NPU Optical Module Lane 3 Optical SNR |
Signal-to-Noise Ratio (SNR) on the media (optical) side of lane 3 in the NPU optical module |
Natural numbers |
dB |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_new_pkt_rty_num |
(Agent) Packets Retransmitted by NPU RoCE |
Number of packets retransmitted by NPU RoCE |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_out_of_order_num |
(Agent) PSN Error Packets Received by NPU RoCE |
Number of packets received by NPU RoCE whose PSN is greater than the expected PSN or duplicates an existing PSN. Out-of-order or lost packets trigger retransmission. |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_rx_all_pkt_num |
(Agent) Packets Received by NPU RoCE |
Total number of packets received by NPU RoCE |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_rx_cnp_pkt_num |
(Agent) CNP Packets Received by NPU RoCE |
Total number of CNP packets received by NPU RoCE |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_tx_all_pkt_num |
(Agent) Packets Sent by NPU RoCE |
Total number of packets sent by NPU RoCE |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_tx_cnp_pkt_num |
(Agent) CNP Packets Sent by NPU RoCE |
Total number of CNP packets sent by NPU RoCE |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
|
npu_roce_tx_err_pkt_num |
(Agent) Bad Packets Sent by RoCE |
Total number of bad packets sent by the RoCE NIC of the NPU (for reference only) |
≥0 |
count |
N/A |
instance_id,npu |
1 minute |
If an object is in a hierarchical system, specify the monitored dimension in hierarchical form when you use APIs to query metrics of this object.
For example, to query the available space (metric: disk_free) of a disk mount point on a BMS, the dimension of the metric is instance_id,mount_point, where instance_id indicates level 0 and mount_point indicates level 1.
- When you call an API to query a single metric, specify the mount_point dimension as follows:
dim.0=instance_id,3d65c1ac-9a9f-4c5f-a054-35184a087bb2&dim.1=mount_point,6666cd76f96956469e7be39d750cc7d9
3d65c1ac-9a9f-4c5f-a054-35184a087bb2 and 6666cd76f96956469e7be39d750cc7d9 are the values of instance_id and mount_point, respectively. For details about how to obtain the values, see Dimensions.
- When you call an API to query multiple metrics, specify the mount_point dimension as follows:
"dimensions": [ { "name": "instance_id", "value": "3d65c1ac-9a9f-4c5f-a054-35184a087bb2" }, { "name": "mount_point", "value": "6666cd76f96956469e7be39d750cc7d9" } ]
3d65c1ac-9a9f-4c5f-a054-35184a087bb2 and 6666cd76f96956469e7be39d750cc7d9 are the values of instance_id and mount_point, respectively. For details about how to obtain the values, see Dimensions.
Dimensions
|
Dimension |
Key |
Value |
|---|---|---|
|
Cloud server |
instance_id |
Cloud server |
|
Server process |
proc |
Process |
|
Cloud server disk |
disk |
Disk |
|
Cloud server mount point |
mount_point |
Mount point |
|
Cloud server GPU |
gpu |
GPU |
|
Cloud server NPU |
npu |
NPU |
|
Cloud server NIC |
network_interface_card |
NIC |
|
Cloud server GPU |
gpu_slot |
GPU |
|
GPU process ID of cloud server |
pid_for_gpu |
GPU process ID |
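The two hierarchical dimension formats described in this section (the dim.N query parameters for a single metric and the "dimensions" array for multiple metrics) can be sketched in Python as follows. This is a minimal illustration, not part of any Cloud Eye SDK: the function names build_query_dims and build_body_dims and the constant MAX_DIMENSION_LEVELS are hypothetical, and the IDs are the sample values used on this page. The sketch only builds the dimension fragments and does not call any API.

```python
# Hypothetical helpers: they only format dimensions, they do not call Cloud Eye.

# Dimensions can be nested to at most four levels (levels 0 to 3).
MAX_DIMENSION_LEVELS = 4


def _check_levels(dims):
    """Reject empty or too deeply nested dimension lists."""
    if not 1 <= len(dims) <= MAX_DIMENSION_LEVELS:
        raise ValueError(
            f"expected 1 to {MAX_DIMENSION_LEVELS} dimension levels, got {len(dims)}"
        )


def build_query_dims(dims):
    """Format ordered (key, value) pairs as dim.0=key,value&dim.1=key,value."""
    _check_levels(dims)
    return "&".join(
        f"dim.{level}={key},{value}" for level, (key, value) in enumerate(dims)
    )


def build_body_dims(dims):
    """Format ordered (key, value) pairs as the "dimensions" array."""
    _check_levels(dims)
    return [{"name": key, "value": value} for key, value in dims]


# Level 0 is instance_id and level 1 is mount_point, as in the disk_free example.
sample_dims = [
    ("instance_id", "3d65c1ac-9a9f-4c5f-a054-35184a087bb2"),
    ("mount_point", "6666cd76f96956469e7be39d750cc7d9"),
]

print(build_query_dims(sample_dims))
print(build_body_dims(sample_dims))
```

The order of the pairs determines the level, so the level 0 dimension (instance_id) must come first. The same helpers would apply to other two-level dimensions listed in the metric tables, for example instance_id,npu.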