NPU Metrics
If the CCE AI Suite (Ascend NPU) add-on version is 2.1.55 or later, you can use NPU-Exporter to monitor and collect Ascend AI processor metrics. NPU-Exporter obtains and reports runtime data of Ascend AI chips, such as the number of Ascend AI processors and the real-time receive rate of network ports. These metrics are referred to as NPU metrics. By monitoring them, you gain real-time visibility into NPU performance and can detect and resolve potential issues so that NPUs run stably and efficiently. This section describes the NPU metrics reported by NPU-Exporter.
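The version requirement above can be checked in a script. The following is a minimal sketch, assuming the add-on version is a purely numeric, dot-separated string such as 2.1.55. The function names are hypothetical, and how the version string is obtained is not shown here.

```python
# Minimum CCE AI Suite (Ascend NPU) add-on version stated in this section.
MIN_ADDON_VERSION = (2, 1, 55)


def parse_version(version):
    """Turn a string such as '2.1.55' (or 'v2.1.55') into a tuple of ints."""
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def supports_npu_exporter(addon_version):
    """Return True if the add-on version is 2.1.55 or later."""
    return parse_version(addon_version) >= MIN_ADDON_VERSION


if __name__ == "__main__":
    for candidate in ("2.1.9", "2.1.55", "2.10.0"):
        print(candidate, supports_npu_exporter(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating 2.1.9 as later than 2.1.55.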
Billing
NPU metrics are custom metrics. Uploading such metrics to AOM incurs fees. For details, see Pricing Details.
Applicable NPU Nodes
NPU metrics can be monitored and collected only for the nodes listed in AI-accelerated ECSs.
NPU Metrics
NPU-Exporter can collect 73 NPU metrics. This section focuses on the common metrics supported by nodes listed in AI-accelerated ECSs. For details, see Table 1.
| Category | Metric | Description | Metric Label | Field Type |
|---|---|---|---|---|
| NPU | npu_chip_info_name | Name and ID of an Ascend AI processor | container_name: a container name | String |
| | | | id: an NPU ID | String |
| | | | model_name: name of an Ascend AI processor | String |
| | | | namespace: a namespace name | String |
| | | | pcie_bus_info: PCIe information of an Ascend AI processor | String |
| | | | pod_name: a pod name | String |
| | | | vdie_id: unique ID of an Ascend AI processor, which can be used as the UUID of the NPU | String |
| | npu_chip_info_health_status | Health status of an Ascend AI processor. Options: | container_name: a container name | String |
| | | | id: an NPU ID | String |
| | | | model_name: name of an Ascend AI processor | String |
| | | | namespace: a namespace name | String |
| | | | pcie_bus_info: PCIe information of an Ascend AI processor | String |
| | | | pod_name: a pod name | String |
| | | | vdie_id: unique ID of an Ascend AI processor, which can be used as the UUID of the NPU | String |
| | npu_chip_info_power | Power consumption of an Ascend AI processor, in watts (W). NOTE: If the NPU on the node is Snt3P, this metric specifies the board power consumption. If the NPU is Snt3, this metric specifies the power consumption of the Ascend AI processor. | container_name: a container name | String |
| | | | id: an NPU ID | String |
| | | | model_name: name of an Ascend AI processor | String |
| | | | namespace: a namespace name | String |
| | | | pcie_bus_info: PCIe information of an Ascend AI processor | String |
| | | | pod_name: a pod name | String |
| | | | vdie_id: unique ID of an Ascend AI processor, which can be used as the UUID of the NPU | String |
| | npu_chip_info_temperature | Temperature of an Ascend AI processor, in degrees Celsius (°C) | container_name: a container name | String |
| | | | id: an NPU ID | String |
| | | | model_name: name of an Ascend AI processor | String |
| | | | namespace: a namespace name | String |
| | | | pcie_bus_info: PCIe information of an Ascend AI processor | String |
| | | | pod_name: a pod name | String |
| | | | vdie_id: unique ID of an Ascend AI processor, which can be used as the UUID of the NPU | String |
| | npu_chip_info_utilization | AI Core usage of an Ascend AI processor, in percentage | container_name: a container name | String |
| | | | id: an NPU ID | String |
| | | | model_name: name of an Ascend AI processor | String |
| | | | namespace: a namespace name | String |
| | | | pcie_bus_info: PCIe information of an Ascend AI processor | String |
| | | | pod_name: a pod name | String |
| | | | vdie_id: unique ID of an Ascend AI processor, which can be used as the UUID of the NPU | String |
| | npu_chip_info_vector_utilization | AI Vector usage of an Ascend AI processor | container_name: a container name | String |
| | | | id: an NPU ID | String |
| | | | model_name: name of an Ascend AI processor | String |
| | | | namespace: a namespace name | String |
| | | | pcie_bus_info: PCIe information of an Ascend AI processor | String |
| | | | pod_name: a pod name | String |
| | | | vdie_id: unique ID of an Ascend AI processor, which can be used as the UUID of the NPU | String |
| DDR | npu_chip_info_used_memory | Used DDR memory of an Ascend AI processor, in MB | container_name: a container name | String |
| | | | id: an NPU ID | String |
| | | | model_name: name of an Ascend AI processor | String |
| | | | namespace: a namespace name | String |
| | | | pcie_bus_info: PCIe information of an Ascend AI processor | String |
| | | | pod_name: a pod name | String |
| | | | vdie_id: unique ID of an Ascend AI processor, which can be used as the UUID of the NPU | String |
| | npu_chip_info_total_memory | Total DDR memory of an Ascend AI processor, in MB | container_name: a container name | String |
| | | | id: an NPU ID | String |
| | | | model_name: name of an Ascend AI processor | String |
| | | | namespace: a namespace name | String |
| | | | pcie_bus_info: PCIe information of an Ascend AI processor | String |
| | | | pod_name: a pod name | String |
| | | | vdie_id: unique ID of an Ascend AI processor, which can be used as the UUID of the NPU | String |
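The two DDR metrics in the table can be combined into a usage percentage per NPU. The following is a minimal sketch, assuming NPU-Exporter output is available in the Prometheus text exposition format. The sample lines, their label values, and the function names are hypothetical, and the sample carries only a subset of the labels listed in the table.

```python
import re

# Hypothetical scrape output; values and label contents are made up for illustration.
SAMPLE = """\
# Illustrative samples only
npu_chip_info_used_memory{container_name="train",id="0",namespace="default",pod_name="train-0"} 2048
npu_chip_info_total_memory{container_name="train",id="0",namespace="default",pod_name="train-0"} 16384
npu_chip_info_used_memory{container_name="train",id="1",namespace="default",pod_name="train-0"} 8192
npu_chip_info_total_memory{container_name="train",id="1",namespace="default",pod_name="train-0"} 16384
"""

# metric_name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this simple parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def ddr_usage_percent(samples):
    """Map each NPU id to used DDR memory as a percentage of total DDR memory."""
    used, total = {}, {}
    for name, labels, value in samples:
        npu_id = labels.get("id")
        if name == "npu_chip_info_used_memory":
            used[npu_id] = value
        elif name == "npu_chip_info_total_memory":
            total[npu_id] = value
    # Ignore NPUs whose total is missing or zero to avoid division by zero.
    return {k: 100.0 * used[k] / total[k] for k in used if total.get(k)}


if __name__ == "__main__":
    print(ddr_usage_percent(parse_samples(SAMPLE)))
```

This sketch keys results by the id label alone, which is sufficient for a single node. When samples from several nodes are aggregated, a node identifier or the vdie_id label would likely be needed to keep NPUs distinct.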
Helpful Links
You can use NPU-Exporter to monitor these NPU metrics. For details, see Comprehensive Monitoring of NPU Metrics.