Updated on 2025-07-30 GMT+08:00
What Do I Do If the nvidia-smi Command Output Shows Overheated GPUs?
Possible Causes
The graphics card is not dissipating heat properly, or its fan is damaged.
Impact
An excessively high graphics card temperature can degrade or interrupt the services running on the ECS.
Solution
Run the nvidia-smi command and check the fan status in the output.
- If the fan speed is 0, the fan may be damaged. Stop the service and migrate it. After the migration is complete, collect fault information by referring to Fault Information Collection, and contact technical support to check whether the hardware is faulty.
- If ERR! is displayed in the command output, the graphics card may be overheated. Stop the service, wait until the graphics card cools down, and run the nvidia-smi command again to check whether ERR! is still displayed.
- If the command output is normal, adjust the service so that it limits the maximum power of the graphics card.
- If the fault persists after the preceding steps, collect fault information by referring to Fault Information Collection and contact technical support.
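The checks above can be sketched as a small script. This is a minimal illustration, not part of the official procedure: the function names `read_gpu_status` and `diagnose` and the labels they return are hypothetical, and the script assumes that the nvidia-smi query fields `index`, `temperature.gpu`, `fan.speed`, `power.draw`, and `power.limit` are available on the installed driver. It also assumes that a failed reading appears in the CSV output as text containing "ERR" (for example `ERR!` or `[Unknown Error]`), which may differ between driver versions.

```python
import csv
import io
import subprocess

# Fields accepted by `nvidia-smi --query-gpu` (list them with
# `nvidia-smi --help-query-gpu`); availability depends on the driver.
QUERY_FIELDS = "index,temperature.gpu,fan.speed,power.draw,power.limit"


def read_gpu_status():
    """Run nvidia-smi and return its CSV output, one line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def diagnose(csv_text):
    """Map each GPU index to the branch of the procedure that applies.

    The labels are hypothetical names for the three cases in the list:
    an error reading, a fan speed of 0, or normal output.
    """
    findings = {}
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue
        index, temp, fan, draw, limit = (cell.strip() for cell in row)
        # Assumption: any field containing "ERR" marks a failed reading.
        if "ERR" in f"{temp}{fan}{draw}{limit}".upper():
            findings[int(index)] = "error-reading"
        elif fan == "0":
            findings[int(index)] = "fan-stopped"
        else:
            findings[int(index)] = "normal"
    return findings
```

On a GPU-accelerated ECS, `diagnose(read_gpu_status())` returns one label per graphics card. Passively cooled cards may report the fan speed as `[N/A]` rather than 0, in which case this sketch labels them as normal. To cap the power of a card, nvidia-smi provides the `-pl` option (for example `nvidia-smi -i 0 -pl <watts>`, which typically requires root privileges); the appropriate value depends on the card and the workload.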