Supporting Kubernetes' Default GPU Scheduling
After GPU virtualization is enabled, you are advised to configure volcano.sh/gpu-mem.128Mi for workloads scheduled to GPUs to isolate GPU memory, or to configure both volcano.sh/gpu-mem.128Mi and volcano.sh/gpu-core.percentage to isolate both compute and GPU memory. Kubernetes' default GPU scheduling is also supported, so workloads can request GPUs through nvidia.com/gpu resources, which are handled as follows:
- If nvidia.com/gpu is set to a decimal value, CCE uses GPU virtualization to isolate GPU memory and allocates GPU memory in proportion to the specified value. The GPU memory allocated to a container must be an integer multiple of 128 MiB. If the calculated value is not, it is rounded down to the nearest multiple. For example, on a 16-GiB GPU, setting nvidia.com/gpu to 0.5 allocates 8 GiB of GPU memory (0.5 × 16 GiB) to the container, which is 8192 MiB (64 × 128 MiB).
- If nvidia.com/gpu is set to an integer, entire GPU cards are allocated to the container. Workloads that were already using nvidia.com/gpu resources before GPU virtualization was enabled are not automatically converted to use virtual GPUs. They continue to use entire GPU cards.
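The allocation rule in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CCE's implementation: the function name `allocated_gpu_memory_mib` is hypothetical, the per-card GPU memory is passed in MiB, and the sketch only handles fractional values between 0 and 1 and whole numbers, because the behavior for other values is not described here. `Fraction` is used so that a decimal such as 0.1 is represented exactly before rounding down.

```python
from fractions import Fraction

MIB_STEP = 128  # GPU memory is allocated in integer multiples of 128 MiB


def allocated_gpu_memory_mib(card_memory_mib: int, gpu_request) -> int:
    """Hypothetical helper: GPU memory (MiB) a container receives for a
    given nvidia.com/gpu value, following the rules described above."""
    # Parse via str() so that "0.1" or 0.1 becomes exactly 1/10.
    request = Fraction(str(gpu_request))
    if request <= 0:
        raise ValueError("nvidia.com/gpu must be positive")
    if request.denominator == 1:
        # Integer value: entire GPU cards are allocated.
        return int(request) * card_memory_mib
    if request < 1:
        # Decimal value: proportional share, rounded down to a multiple of 128 MiB.
        share = card_memory_mib * request
        return (share // MIB_STEP) * MIB_STEP
    # Assumption: values such as 1.5 are outside the scope of this sketch.
    raise ValueError("only values below 1 or whole numbers are handled here")


if __name__ == "__main__":
    print(allocated_gpu_memory_mib(16 * 1024, "0.5"))  # 8192 (16-GiB card, half of it)
    print(allocated_gpu_memory_mib(24258, "0.1"))      # 2304 (A30 node in the configuration example)
```

Rounding with integer floor division on an exact fraction avoids the case where a binary floating-point product such as 2559.9999... would be rounded down one step too far.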

After GPU virtualization is enabled, declaring nvidia.com/gpu in a workload is equivalent to enabling virtual GPU memory isolation. Such workloads can share GPU quota with workloads in GPU memory isolation mode, but not with workloads in compute and GPU memory isolation mode.
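The sharing rule above can be expressed as a small check. This is a hypothetical sketch: resource requests are modeled as a plain dictionary, the mode labels are made up for illustration, and the check covers only what the paragraph states (a workload that declares nvidia.com/gpu versus workloads in the two isolation modes). It is not how CCE or Volcano implement scheduling.

```python
# Hypothetical labels for the two isolation modes described above.
MEMORY_ISOLATION = "gpu-memory"
COMPUTE_AND_MEMORY_ISOLATION = "compute-and-gpu-memory"


def isolation_mode(resources: dict) -> str:
    """Classify a container's GPU resource requests by isolation mode."""
    if "volcano.sh/gpu-core.percentage" in resources:
        # gpu-core.percentage is configured together with gpu-mem.128Mi.
        return COMPUTE_AND_MEMORY_ISOLATION
    if "volcano.sh/gpu-mem.128Mi" in resources or "nvidia.com/gpu" in resources:
        # With GPU virtualization enabled, nvidia.com/gpu is treated
        # as virtual GPU memory isolation.
        return MEMORY_ISOLATION
    raise ValueError("no GPU resources requested")


def can_share_with_nvidia_gpu_workload(other: dict) -> bool:
    """Whether a workload declaring nvidia.com/gpu can share GPU quota with `other`."""
    return isolation_mode(other) == MEMORY_ISOLATION


if __name__ == "__main__":
    print(can_share_with_nvidia_gpu_workload({"volcano.sh/gpu-mem.128Mi": "16"}))
    print(can_share_with_nvidia_gpu_workload(
        {"volcano.sh/gpu-mem.128Mi": "16", "volcano.sh/gpu-core.percentage": "20"}))
```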
Notes and Constraints
To support Kubernetes' default GPU scheduling on GPU nodes, the CCE AI Suite (NVIDIA GPU) add-on must be v2.0.10 or later, and the Volcano Scheduler add-on must be v1.10.5 or later.
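When checking a cluster against these minimum versions in a script, compare versions numerically rather than as strings: as plain strings, "2.0.9" sorts after "2.0.10". The sketch below assumes the installed add-on versions have already been obtained (for example, from the console) and are written like `v2.0.10`; the dictionary keys and helper names are illustrative only.

```python
import re

# Minimum add-on versions stated in the constraints above.
MIN_VERSIONS = {
    "CCE AI Suite (NVIDIA GPU)": (2, 0, 10),
    "Volcano Scheduler": (1, 10, 5),
}


def parse_version(text: str) -> tuple:
    """Turn 'v2.0.10' or '2.0.10' into (2, 0, 10) for numeric comparison."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.groups())


def missing_requirements(installed: dict) -> list:
    """Return the add-ons whose installed version is below the minimum."""
    return [
        name
        for name, minimum in MIN_VERSIONS.items()
        if parse_version(installed[name]) < minimum
    ]


if __name__ == "__main__":
    print(missing_requirements({
        "CCE AI Suite (NVIDIA GPU)": "v2.0.9",
        "Volcano Scheduler": "v1.11.0",
    }))  # ['CCE AI Suite (NVIDIA GPU)']
```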
Configuration Example
- Use kubectl to access the cluster.
- Create a workload that uses nvidia.com/gpu resources.
Create a gpu-app.yaml file. The following shows an example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      schedulerName: volcano
      containers:
      - image: <your_image_address>     # Replace it with your image address.
        name: container-0
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 0.1   # Number of requested GPUs
          limits:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 0.1   # Maximum number of GPUs that can be used
      imagePullSecrets:
      - name: default-secret
- Run the following command to create an application:
kubectl apply -f gpu-app.yaml
- Log in to the pod and check the total GPU memory allocated to the pod.
kubectl exec -it deployment/gpu-app -- nvidia-smi
Expected output:
Thu Jul 27 07:53:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30          Off  | 00000000:00:0D.0 Off |                    0 |
| N/A   47C    P0    34W / 165W |      0MiB /  2304MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The output shows that the total GPU memory that can be used by the pod is 2304 MiB.
In this example, the GPU on the node has 24258 MiB of memory in total. 0.1 of that is 2425.8 MiB (24258 × 0.1), which is not an integer multiple of 128 MiB, so the value is rounded down to 18 × 128 MiB = 2304 MiB.
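To script this check, the total can be read from the Memory-Usage column of the nvidia-smi output and compared with the value calculated from the node's GPU memory. This is a minimal sketch that assumes the default nvidia-smi table layout shown in the expected output and a single GPU row; the helper name `reported_total_mib` is hypothetical. For machine-readable output, nvidia-smi also accepts `--query-gpu=memory.total --format=csv`, which avoids parsing the table.

```python
import re

# Matches the Memory-Usage column of nvidia-smi, e.g. "0MiB /  2304MiB".
MEMORY_USAGE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def reported_total_mib(nvidia_smi_output: str) -> int:
    """Return the total GPU memory (MiB) from the first GPU row of nvidia-smi output."""
    match = MEMORY_USAGE.search(nvidia_smi_output)
    if match is None:
        raise ValueError("no Memory-Usage field found")
    return int(match.group(2))


if __name__ == "__main__":
    row = "| N/A   47C    P0    34W / 165W |      0MiB /  2304MiB |      0%      Default |"
    # 0.1 of 24258 MiB, rounded down to a multiple of 128 MiB.
    expected = (24258 // 10) // 128 * 128
    print(reported_total_mib(row))              # 2304
    print(reported_total_mib(row) == expected)  # True
```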