Training Performance Deteriorated
Updated on 2023-11-10 GMT+08:00
Symptom
A training job that uses a ModelArts algorithm takes longer to complete than expected.
Possible Causes
The possible causes are as follows:
- The job code or training parameters have been modified.
- The GPU hardware used for training is malfunctioning.
Solution
- Check whether the training code and parameters have been modified.
- Check whether the CPU, memory, GPU, snt9, and InfiniBand resources allocated to the job match what you expect.
- Use CloudShell to log in to the Linux environment of the training job and check the GPU status:
  - Run the nvidia-smi command to check whether the GPU is working properly.
  - Run the nvidia-smi -q -d TEMPERATURE command to check the GPU temperature. If the temperature is too high, training performance deteriorates.
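The GPU checks above can also be scripted. The following is a minimal sketch, not an official ModelArts tool. It assumes Python 3 and nvidia-smi are available in the CloudShell session, and it uses nvidia-smi's CSV query mode in place of the -q -d TEMPERATURE report so that the output is easier to parse. The function names and the 85 °C limit are hypothetical; use the limit that applies to your GPU model.

```python
import shutil
import subprocess

# Hypothetical alert threshold in degrees Celsius (an assumption, not a
# ModelArts or NVIDIA default); replace it with the limit for your GPU model.
TEMP_LIMIT_C = 85

# CSV query mode of nvidia-smi: one line per GPU, values without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_status(csv_text):
    """Parse 'index, temperature, utilization' CSV lines into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip lines that do not have the expected shape
        try:
            index, temp_c, util_pct = (int(field) for field in fields)
        except ValueError:
            continue  # skip GPUs that report non-numeric values such as [N/A]
        gpus.append({"index": index, "temp_c": temp_c, "util_pct": util_pct})
    return gpus


def hot_gpus(gpus, limit_c=TEMP_LIMIT_C):
    """Return the indexes of GPUs at or above the temperature limit."""
    return [gpu["index"] for gpu in gpus if gpu["temp_c"] >= limit_c]


def query_gpus():
    """Run nvidia-smi and return parsed GPU status, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_status(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    for gpu in gpus:
        print(gpu)
    print("GPUs at or above the limit:", hot_gpus(gpus))
```

Running the script while the training job is under load shows the temperature and utilization of each GPU. A GPU that appears in the final list, or one with persistently low utilization, is a reasonable place to start investigating.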