Updated on 2025-07-30 GMT+08:00
What Do I Do If nvidia-smi Command Output Shows the SRAM ECC Error (V100 GPUs)?
Possible Causes
ECC errors may have occurred in the on-chip memory (SRAM) of the GPU.
Impact
GPU-related applications may be affected.
Solution
Run the nvidia-smi command to view the graphics card information.
- If the Volatile Uncorr. ECC column of the command output reports ECC errors, run the nvidia-smi -q -i ${gpu_id} command to view the details of that graphics card. Replace ${gpu_id} with the ID of the affected graphics card.
- If the Volatile Uncorr. ECC column of the command output reports no ECC errors, run the nvidia-smi -q command to view the details of all graphics cards.
- As shown in the figure, if only the Device Memory value increases under Single Bit in the Volatile section or under Single Bit in the Aggregate section, no action is required.
- If the sum of Register File, L1 Cache, L2 Cache, Texture Memory, Texture Shared, and CBU under Single Bit and Double Bit is greater than 0 in either the Volatile section or the Aggregate section, collect fault information by referring to Fault Information Collection and contact technical support.
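The check in the last two steps can be sketched in Python. This is a minimal illustration, not an official tool. The function name sram_ecc_total and the SAMPLE text are hypothetical, and the indented "name : value" layout of the ECC Errors section is an assumption based on the field names used above. Actual nvidia-smi -q output may differ by driver version.

```python
# Counters that the steps above treat as SRAM locations.
SRAM_FIELDS = {"Register File", "L1 Cache", "L2 Cache",
               "Texture Memory", "Texture Shared", "CBU"}


def sram_ecc_total(query_output: str) -> int:
    """Sum the SRAM counters found inside every "ECC Errors" section.

    Device Memory and Total lines are ignored, as are values that are
    not plain integers (for example "N/A").
    """
    total = 0
    section_indent = None  # indentation of the current "ECC Errors" header
    for line in query_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped == "ECC Errors":
            section_indent = indent
            continue
        if section_indent is None:
            continue  # not inside an ECC Errors section
        if indent <= section_indent:
            section_indent = None  # a sibling section started
            continue
        name, sep, value = stripped.partition(":")
        if sep and name.strip() in SRAM_FIELDS and value.strip().isdigit():
            total += int(value.strip())
    return total


# Hypothetical excerpt in the assumed layout (not captured from a real GPU).
SAMPLE = """\
    ECC Errors
        Volatile
            Single Bit
                Device Memory               : 3
                Register File               : 0
                L1 Cache                    : 0
                L2 Cache                    : 1
                Texture Memory              : 0
                Texture Shared              : 0
                CBU                         : 0
                Total                       : 4
            Double Bit
                Device Memory               : 0
                Register File               : 0
                L1 Cache                    : 0
                L2 Cache                    : 0
                Texture Memory              : 0
                Texture Shared              : 0
                CBU                         : 0
                Total                       : 0
    Retired Pages
        Single Bit ECC                      : 0
"""

if __name__ == "__main__":
    if sram_ecc_total(SAMPLE) > 0:
        print("SRAM ECC errors found: collect fault information")
    else:
        print("No SRAM ECC errors: no action is required")
```

On a real ECS, the text passed to the function could come from nvidia-smi -q -i ${gpu_id}, for example through subprocess.run(..., capture_output=True, text=True).stdout. Passing the output for a single graphics card keeps the sum specific to that card, because the sketch adds up every ECC Errors section it finds.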