Why Is the Job Still Queued When Resources Are Sufficient?
- If a public resource pool is used, the resources may be occupied by other users' jobs. Wait for the resources to be released, or see Why Is a Training Job Always Queuing? for solutions.
- If a dedicated resource pool is used, perform the following operations:
  - Check whether other jobs (including inference jobs, training jobs, and development environment jobs) are running in the dedicated resource pool.
    On the Dashboard page, open the details page of each running job or instance to check whether it uses the dedicated resource pool. Stop the ones you do not need to release resources.
  - Go to the details page of the dedicated resource pool to check whether other jobs are queuing.
  - Check whether resources are fragmented.
    For example, the cluster has two nodes with four idle cards each, but your job requires eight cards on a single node. In this case, the idle resources cannot be allocated to your job, even though eight cards are idle in total.
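
The fragmentation case can be illustrated with a minimal Python sketch. It is not part of any product API: the function names and the list of idle cards per node are hypothetical, and it only assumes that a job must get all of its cards from a single node, as in the example above.

```python
from typing import List


def total_idle_cards(idle_per_node: List[int]) -> int:
    """Return the number of idle cards summed over all nodes (hypothetical helper)."""
    return sum(idle_per_node)


def fits_on_single_node(idle_per_node: List[int], cards_required: int) -> bool:
    """Return True if at least one node alone has enough idle cards for the job."""
    return any(idle >= cards_required for idle in idle_per_node)


# Illustrative values from the example: two nodes, four idle cards each,
# and a job that requires eight cards on one node.
idle_per_node = [4, 4]
cards_required = 8

# The pool looks sufficient when idle cards are summed across nodes...
print(total_idle_cards(idle_per_node) >= cards_required)   # True
# ...but no single node can host the job, so it stays queued.
print(fits_on_single_node(idle_per_node, cards_required))  # False
```

If the second check is False while the first is True, the idle resources are fragmented: the job stays queued until enough cards are released on one node.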