GPU Metrics
The CCE AI Suite (NVIDIA GPU) add-on provides GPU monitoring metrics, giving you additional options for GPU observability. This section describes those metrics.
GPU Metrics Provided by CCE
Category | Metric | Type | Unit | Monitoring Level | Description |
---|---|---|---|---|---|
Utilization | cce_gpu_utilization | Gauge | % | GPU cards | GPU compute usage |
Utilization | cce_gpu_memory_utilization | Gauge | % | GPU cards | GPU memory usage |
Utilization | cce_gpu_encoder_utilization | Gauge | % | GPU cards | GPU encoder usage |
Utilization | cce_gpu_decoder_utilization | Gauge | % | GPU cards | GPU decoder usage |
Utilization | cce_gpu_utilization_process | Gauge | % | GPU processes | GPU compute usage of each process |
Utilization | cce_gpu_memory_utilization_process | Gauge | % | GPU processes | GPU memory usage of each process |
Utilization | cce_gpu_encoder_utilization_process | Gauge | % | GPU processes | GPU encoder usage of each process |
Utilization | cce_gpu_decoder_utilization_process | Gauge | % | GPU processes | GPU decoder usage of each process |
Memory | cce_gpu_memory_used | Gauge | Byte | GPU cards | Used GPU memory |
Memory | cce_gpu_memory_total | Gauge | Byte | GPU cards | Total GPU memory |
Memory | cce_gpu_memory_free | Gauge | Byte | GPU cards | Free GPU memory |
Memory | cce_gpu_bar1_memory_used | Gauge | Byte | GPU cards | Used GPU BAR1 memory |
Memory | cce_gpu_bar1_memory_total | Gauge | Byte | GPU cards | Total GPU BAR1 memory |
Frequency | cce_gpu_clock | Gauge | MHz | GPU cards | GPU clock frequency |
Frequency | cce_gpu_memory_clock | Gauge | MHz | GPU cards | GPU memory clock frequency |
Frequency | cce_gpu_graphics_clock | Gauge | MHz | GPU cards | GPU graphics clock frequency |
Frequency | cce_gpu_video_clock | Gauge | MHz | GPU cards | GPU video processor clock frequency |
Physical status | cce_gpu_temperature | Gauge | °C | GPU cards | GPU temperature |
Physical status | cce_gpu_power_usage | Gauge | Milliwatt | GPU cards | GPU power draw |
Physical status | cce_gpu_total_energy_consumption | Gauge | Millijoule | GPU cards | Total GPU energy consumption |
Bandwidth | cce_gpu_pcie_link_bandwidth | Gauge | bit | GPU cards | GPU PCIe bandwidth |
Bandwidth | cce_gpu_nvlink_bandwidth | Gauge | Gbit/s | GPU cards | GPU NVLink bandwidth |
Bandwidth | cce_gpu_pcie_throughput_rx | Gauge | KB/s | GPU cards | GPU PCIe receive (RX) throughput |
Bandwidth | cce_gpu_pcie_throughput_tx | Gauge | KB/s | GPU cards | GPU PCIe transmit (TX) throughput |
Bandwidth | cce_gpu_nvlink_utilization_counter_rx | Gauge | KB/s | GPU cards | GPU NVLink receive (RX) throughput |
Bandwidth | cce_gpu_nvlink_utilization_counter_tx | Gauge | KB/s | GPU cards | GPU NVLink transmit (TX) throughput |
Retired memory pages | cce_gpu_retired_pages_sbe | Gauge | N/A | GPU cards | Number of GPU memory pages retired because of single-bit errors |
Retired memory pages | cce_gpu_retired_pages_dbe | Gauge | N/A | GPU cards | Number of GPU memory pages retired because of double-bit errors |
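Several metrics in the table above are reported in base units (bytes, milliwatts, millijoules) that are awkward to read on a dashboard. The following Python sketch shows one way to convert raw sample values into friendlier units and to derive a memory usage percentage from cce_gpu_memory_used and cce_gpu_memory_total. The names UNIT_DIVISORS, convert, and memory_used_percent are hypothetical helpers for illustration and are not part of the add-on; the divisors rely only on the Unit column above.

```python
# Hypothetical helpers for post-processing raw metric samples.
# The divisors are derived from the Unit column of the table; nothing here
# is provided by the CCE AI Suite (NVIDIA GPU) add-on itself.

UNIT_DIVISORS = {
    # metric name: (display unit, divisor applied to the raw value)
    "cce_gpu_power_usage": ("W", 1000),               # milliwatt -> watt
    "cce_gpu_total_energy_consumption": ("J", 1000),  # millijoule -> joule
    "cce_gpu_memory_used": ("GiB", 2**30),            # byte -> gibibyte
    "cce_gpu_memory_total": ("GiB", 2**30),
    "cce_gpu_memory_free": ("GiB", 2**30),
}


def convert(metric: str, raw_value: float) -> tuple[float, str]:
    """Return (value, unit) for a raw sample of a known metric."""
    unit, divisor = UNIT_DIVISORS[metric]
    return raw_value / divisor, unit


def memory_used_percent(used_bytes: float, total_bytes: float) -> float:
    """Derive memory usage in percent from the used and total byte gauges."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes
```

For example, convert("cce_gpu_power_usage", 250000) yields a value in watts, and memory_used_percent can be fed the latest samples of the two memory gauges for the same GPU card.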