Overview
Workloads can use nodes' GPU resources in either of the following modes:
- Static GPU allocation (dedicated/shared): GPU resources are allocated to pods in fixed proportions. A pod can be given one or more dedicated GPUs, or a single GPU can be shared by multiple pods (see the example after this list).
- GPU virtualization: UCS on-premises clusters use xGPU virtualization to dynamically allocate GPU memory and compute. A single GPU can be virtualized into a maximum of 20 virtual GPUs. Dynamic allocation is more flexible than static allocation: you can assign just the amount of GPU a service needs, which keeps services stable while improving GPU utilization.
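For example, a pod requests a statically allocated GPU through an extended resource in its spec. The sketch below assumes the standard NVIDIA device plugin resource name (nvidia.com/gpu); the resource name your cluster actually exposes may differ.

```yaml
# Static (dedicated) allocation: this pod gets one whole GPU.
# nvidia.com/gpu is the standard NVIDIA device plugin resource name;
# check which resource name your cluster's device plugin advertises.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-static
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1   # dedicated: one full GPU for this pod
```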
Highlights of GPU virtualization:
- Flexible: GPU compute and memory can be configured at a fine granularity. Compute is allocated in increments of 5% of a GPU, and memory in increments of 1 MiB (see the example after this list).
- Isolated: Two isolation modes are available: GPU memory isolation only, or isolation of both GPU memory and compute.
- Compatible: Services run unmodified. There is no need to recompile them or replace the CUDA library.
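As a sketch of how such a fine-grained request might look, the pod below asks for 25% of one GPU's compute (five 5% units) and 4096 MiB of GPU memory. The resource names xgpu/compute-percent and xgpu/memory-mib are illustrative placeholders, not confirmed UCS resource names; check the UCS documentation for the extended resources your cluster actually advertises.

```yaml
# GPU virtualization: request a slice of one GPU rather than a whole card.
# The resource names below are illustrative placeholders only.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-xgpu
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        xgpu/compute-percent: "25"   # 25% of one GPU's compute (5 x 5% granularity)
        xgpu/memory-mib: "4096"      # 4096 MiB of GPU memory (MiB granularity)
```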
Parent topic: GPU Scheduling