Overview
Workloads can use nodes' GPU resources in either of the following modes:
- Static GPU allocation (dedicated/shared): GPU resources are allocated to pods in fixed proportions. A pod can be given one or more dedicated GPUs, or a single GPU can be shared by multiple pods (see the example after this list).
- GPU virtualization: UCS on-premises clusters use xGPU virtualization to dynamically allocate GPU memory and compute. A single GPU can be virtualized into a maximum of 20 virtual GPUs. Dynamic allocation is more flexible than static allocation: you can assign just the amount of GPU a service needs, which keeps services stable while improving GPU utilization.
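For example, a pod requests a statically allocated GPU through an extended resource in its spec. The sketch below assumes the standard NVIDIA device plugin resource name (nvidia.com/gpu); the resource name your cluster actually exposes may differ.

```yaml
# Static (dedicated) allocation: this pod gets one whole GPU.
# nvidia.com/gpu is the standard NVIDIA device plugin resource name;
# check which resource name your cluster's device plugin advertises.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-static
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1   # dedicated: one full GPU for this pod
```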
Highlights of GPU virtualization:
- Flexible: GPU compute and memory can be configured at a fine granularity. Compute is allocated in increments of 5% of a GPU, and memory in increments of 1 MiB (see the example after this list).
- Isolated: Two isolation modes are available: GPU memory isolation only, or isolation of both GPU memory and compute.
- Compatible: Services run unmodified. There is no need to recompile them or replace the CUDA library.
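As a sketch of how such a fine-grained request might look, the pod below asks for 25% of one GPU's compute (five 5% units) and 4096 MiB of GPU memory. The resource names xgpu/compute-percent and xgpu/memory-mib are illustrative placeholders, not confirmed UCS resource names; check the UCS documentation for the extended resources your cluster actually advertises.

```yaml
# GPU virtualization: request a slice of one GPU rather than a whole card.
# The resource names below are illustrative placeholders only.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-xgpu
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        xgpu/compute-percent: "25"   # 25% of one GPU's compute (5 x 5% granularity)
        xgpu/memory-mib: "4096"      # 4096 MiB of GPU memory (MiB granularity)
```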
Parent topic: GPU Scheduling