Why Is the Job Still Queued When Resources Are Sufficient?
- If a public resource pool is used, the resources may be occupied by other users. Wait for resources to become available, or see Why Is a Training Job Always Queuing? for solutions.
- If a dedicated resource pool is used, perform the following operations:
- Check whether other jobs (including inference jobs, training jobs, and development environment jobs) are running in the dedicated resource pool.
On the Dashboard page, open the details page of each running job or instance to check whether it uses the dedicated resource pool. Stop the jobs or instances you no longer need to release resources.
Figure 1 Dashboard
- Click the dedicated resource pool to go to the details page and view the job list.
If other jobs are waiting in the queue, the new job must also join the queue.
Figure 2 Queuing jobs
- Check whether resources are fragmented.
For example, a cluster has two nodes with four idle cards each, so eight cards are idle in total. However, your job requires eight cards on a single node. In this case, the idle resources cannot be allocated to your job, because no single node has enough idle cards.
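The fragmentation check can be illustrated with a minimal sketch. It assumes you have already collected the number of idle cards on each node, for example from the resource pool details page. The function name `fits_on_single_node` and the input list are hypothetical and are not part of any ModelArts API.

```python
def fits_on_single_node(idle_cards_per_node, cards_required):
    """Return True if at least one node has enough idle cards for the job."""
    return any(idle >= cards_required for idle in idle_cards_per_node)


# Two nodes with four idle cards each (hypothetical values from the example above).
idle_cards = [4, 4]
cards_required = 8

# The pool as a whole has enough idle cards...
total_is_enough = sum(idle_cards) >= cards_required

# ...but no single node can host the job, so it stays queued.
schedulable = fits_on_single_node(idle_cards, cards_required)

print(f"total idle cards sufficient: {total_is_enough}")
print(f"schedulable on one node: {schedulable}")
```

If the total is sufficient but the single-node check fails, the pool is fragmented. Queuing ends only when enough cards become idle on one node, or when the job is changed to request fewer cards per node.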