Preparing xGPU Resources

CCE uses xGPU virtualization to dynamically partition GPU memory and compute power. A single GPU can be virtualized into a maximum of 20 virtual GPU devices. This section describes how to prepare GPU nodes for GPU scheduling and isolation.

Prerequisites

Item	Requirement
Cluster version	v1.23.8-r0, v1.25.3-r0, or later
OS	Huawei Cloud EulerOS 2.0
GPU type	T4 and V100
Driver version	470.57.02, 510.47.03, and 535.54.03
Runtime	containerd
Add-on	The following add-ons must be installed in the cluster: Volcano Scheduler 1.10.5 or later; CCE AI Suite (NVIDIA GPU) 2.0.5 or later
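The requirements in the table above can be turned into a quick pre-flight check. The following Python sketch is illustrative only: the NodeInfo record and the check_node and driver_version_from_url helpers are hypothetical, and the driver version is parsed from the .run file name on the assumption that the link follows the naming of the public NVIDIA example link used later in this section. Cluster and add-on versions are not checked here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Supported values copied from the Prerequisites table.
SUPPORTED_OS = "Huawei Cloud EulerOS 2.0"
SUPPORTED_GPU_TYPES = {"T4", "V100"}
SUPPORTED_DRIVERS = {"470.57.02", "510.47.03", "535.54.03"}
SUPPORTED_RUNTIME = "containerd"

# Assumption: the driver file is named like NVIDIA's public .run packages,
# for example NVIDIA-Linux-x86_64-470.57.02.run.
DRIVER_FILE_RE = re.compile(r"NVIDIA-Linux-x86_64-(\d+\.\d+\.\d+)\.run$")


@dataclass
class NodeInfo:
    """Hypothetical record describing one GPU node; collect it however you like."""
    os_name: str
    gpu_type: str
    runtime: str
    driver_url: str


def driver_version_from_url(url: str) -> Optional[str]:
    """Return the driver version embedded in the .run file name, or None."""
    match = DRIVER_FILE_RE.search(url)
    return match.group(1) if match else None


def check_node(node: NodeInfo) -> List[str]:
    """Return the unmet prerequisites (an empty list means the node looks eligible)."""
    problems = []
    if node.os_name != SUPPORTED_OS:
        problems.append(f"unsupported OS: {node.os_name}")
    if node.gpu_type not in SUPPORTED_GPU_TYPES:
        problems.append(f"unsupported GPU type: {node.gpu_type}")
    if node.runtime != SUPPORTED_RUNTIME:
        problems.append(f"unsupported runtime: {node.runtime}")
    version = driver_version_from_url(node.driver_url)
    if version not in SUPPORTED_DRIVERS:
        problems.append(f"unsupported or unrecognized driver version: {version}")
    return problems


if __name__ == "__main__":
    node = NodeInfo(
        os_name="Huawei Cloud EulerOS 2.0",
        gpu_type="T4",
        runtime="containerd",
        driver_url="https://us.download.nvidia.com/tesla/470.57.02/NVIDIA-Linux-x86_64-470.57.02.run",
    )
    print(check_node(node))  # []
```

An OBS link whose file name does not follow this pattern is reported as "unrecognized" rather than rejected outright, so treat that message as a prompt to verify the version manually.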

Step 1: Install the Add-on

Both CCE AI Suite (NVIDIA GPU) and Volcano Scheduler must be installed in the cluster.

Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Add-ons.
Locate CCE AI Suite (NVIDIA GPU) on the right and click Install.
On the displayed page, configure the add-on.
- Add-on Specifications: Select Default or Custom as required.
- Containers: Configurable only when Add-on Specifications is set to Custom.
- NVIDIA Driver: Enter the download link of the NVIDIA driver. All GPU nodes in the cluster use the same driver.
  - If the download link is a public network address, for example, https://us.download.nvidia.com/tesla/470.57.02/NVIDIA-Linux-x86_64-470.57.02.run, bind an EIP to each GPU node. For details about how to obtain the driver link, see Obtaining the Driver Link from Public Network.
  - If the download link is an OBS URL, you do not need to bind an EIP to GPU nodes.
  - Ensure that the NVIDIA driver version is compatible with the GPU model of the nodes.
  - After the driver version is changed, restart the node for the change to take effect.
- Driver Selection: If you do not want all GPU nodes in a cluster to use the same driver, CCE allows you to install a different GPU driver for each node pool.
  - The add-on installs the driver version specified for the node pool. The driver takes effect only on nodes newly added to the node pool.
  - After the driver version is updated, the new version takes effect on nodes newly added to the node pool. Existing nodes must be restarted to apply the change.
- GPU virtualization (supported in 2.0.5 and later versions): Enable GPU virtualization to partition and isolate the compute power and GPU memory of a single GPU.
  If the Volcano add-on has not been installed in the cluster, GPU virtualization cannot be enabled. Click One-click installation to install it. To configure the Volcano add-on parameters during installation, click Custom Installation. For details, see Volcano Scheduler.
  
  If the Volcano add-on has been installed in the cluster but its version does not support GPU virtualization, click Upgrade to upgrade it. To configure the Volcano add-on parameters during the upgrade, click Custom Upgrade. For details, see Volcano Scheduler.
  After GPU virtualization is enabled, you can select Virtualization nodes are compatible with GPU sharing mode, which makes virtualization nodes compatible with default Kubernetes GPU scheduling (the nvidia.com/gpu resource). This option requires gpu-device-plugin 2.0.10 or later and Volcano 1.10.5 or later.
  - If compatibility is enabled, an nvidia.com/gpu quota specified in a workload as a decimal fraction (for example, 0.5) is provided through GPU virtualization with GPU memory isolation. GPU memory is allocated to the container in proportion to the quota; for example, a quota of 0.5 on a 16 GiB GPU is allocated 8 GiB (0.5 x 16 GiB). The allocated GPU memory must be an integer multiple of 128 MiB; otherwise, it is automatically rounded down to the nearest multiple of 128 MiB. If a workload was already using nvidia.com/gpu resources before compatibility was enabled, those resources are still provided by whole GPUs, not by GPU virtualization.
  - With compatibility enabled, using the nvidia.com/gpu quota is equivalent to using GPU memory isolation. Workloads that use the nvidia.com/gpu quota can share a GPU with workloads in GPU memory isolation mode, but not with workloads in compute and GPU memory isolation mode.
  - If compatibility is disabled, the nvidia.com/gpu quota specified in a workload affects only scheduling and does not enforce GPU memory isolation. For example, even if the nvidia.com/gpu quota is set to 0.5, the container can still see the full GPU memory. In addition, workloads using nvidia.com/gpu resources and workloads using virtualized GPU memory cannot be scheduled to the same node.
  - If you deselect Virtualization nodes are compatible with GPU sharing mode, running workloads are not affected, but new workloads may fail to be scheduled. For example, workloads that were already using nvidia.com/gpu resources remain in GPU memory isolation mode after compatibility is disabled, so workloads in compute and GPU memory isolation mode cannot be scheduled to those GPUs. Delete the workloads that use nvidia.com/gpu resources before rescheduling.
Click Install.
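The GPU memory arithmetic for compatibility mode can be sketched in a few lines of Python. This assumes the allocation is simply the quota multiplied by the GPU's total memory, rounded down to a multiple of 128 MiB, as described in the add-on settings above; xgpu_memory_mib is a hypothetical helper, not part of any CCE API.

```python
from fractions import Fraction

BLOCK_MIB = 128  # GPU memory is allocated in multiples of 128 MiB


def xgpu_memory_mib(gpu_quota: str, gpu_memory_mib: int) -> int:
    """GPU memory (MiB) allocated for a fractional nvidia.com/gpu quota.

    gpu_quota is a decimal string such as "0.5"; Fraction parses it exactly,
    which avoids floating-point rounding surprises.
    """
    raw_mib = Fraction(gpu_quota) * gpu_memory_mib  # e.g. 0.5 x 16384 MiB
    blocks = raw_mib // BLOCK_MIB                   # floor division rounds down
    return int(blocks) * BLOCK_MIB


if __name__ == "__main__":
    gpu_16gib = 16 * 1024  # the 16 GiB GPU from the example, in MiB
    print(xgpu_memory_mib("0.5", gpu_16gib))  # 8192 (8 GiB)
    print(xgpu_memory_mib("0.3", gpu_16gib))  # 4864 (4915.2 MiB rounded down)
```

A quota of 0.3 yields 4915.2 MiB, which is not a multiple of 128 MiB, so the container receives 38 x 128 MiB = 4864 MiB.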

Step 2: Create a GPU Node

To use GPU virtualization, create nodes that support it in the cluster. For details, see Creating a Node or Creating a Node Pool.

If your cluster already has GPU nodes that meet the Prerequisites, skip this step.

Step 3 (Optional): Modify the Volcano Scheduling Policy

By default, Volcano uses the spread policy for GPU nodes: among nodes with identical configurations, it selects the node running the fewest containers, so that containers are distributed evenly across nodes. In contrast, the bin packing policy tries to place containers on as few nodes as possible, filling one node before using the next, to reduce resource fragmentation.
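The difference between the two policies can be illustrated with a toy model in Python. This is a simplified sketch, not Volcano's implementation: the Node record and the pick_spread, binpack_score, and pick_binpack helpers are hypothetical, and the bin packing score is only loosely modeled on weighted utilization (used plus requested, divided by allocatable, averaged by weight). The capacity figures assume one 16 GiB GPU per node.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical node snapshot used only for this illustration."""
    name: str
    running_containers: int
    allocatable: Dict[str, float]
    used: Dict[str, float]


def pick_spread(nodes: List[Node]) -> Node:
    """Spread: prefer the node that runs the fewest containers."""
    return min(nodes, key=lambda n: n.running_containers)


def binpack_score(node: Node, request: Dict[str, float], weights: Dict[str, int]) -> float:
    """Weighted utilization of the requested resources after placement.

    Returns 0.0 if any weighted resource would exceed the node's capacity.
    """
    total, weight_sum = 0.0, 0
    for res, weight in weights.items():
        req = request.get(res, 0)
        if req == 0:
            continue  # only resources the task actually requests are scored
        cap = node.allocatable.get(res, 0)
        after = node.used.get(res, 0) + req
        if cap == 0 or after > cap:
            return 0.0
        total += weight * after / cap
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def pick_binpack(nodes: List[Node], request: Dict[str, float], weights: Dict[str, int]) -> Node:
    """Bin packing: prefer the node that ends up most utilized."""
    return max(nodes, key=lambda n: binpack_score(n, request, weights))


if __name__ == "__main__":
    mem, core = "volcano.sh/gpu-mem.128Mi", "volcano.sh/gpu-core.percentage"
    cap = {mem: 128, core: 100}  # assumed: one 16 GiB GPU per node
    nodes = [
        Node("node-a", 3, cap, {mem: 64, core: 50}),
        Node("node-b", 1, cap, {mem: 16, core: 10}),
    ]
    request = {mem: 32, core: 25}
    weights = {mem: 10, core: 10}
    print(pick_spread(nodes).name)                     # node-b
    print(pick_binpack(nodes, request, weights).name)  # node-a
```

Spread picks the lightly loaded node-b, while bin packing prefers node-a, which would be 75% utilized after placement and leaves node-b's GPU free for larger requests.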

To use the bin packing policy with GPU virtualization, modify the policy in the advanced settings of the Volcano add-on as follows:

Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Add-ons.
Find the Volcano add-on on the right and click Edit.

On the displayed page, modify the advanced settings.

In the nodeorder plugin, add the arguments parameter and set leastrequested.weight to 0. This sets the weight of least-requested scoring, which favors the node with the fewest allocated resources, to 0.
Add the binpack plugin and specify the weights of the xGPU custom resources (volcano.sh/gpu-core.percentage and volcano.sh/gpu-mem.128Mi).

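A complete example is as follows. The // comments only explain the changes; delete them before saving, because standard JSON does not support comments.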

{
    "colocation_enable": "",
    "default_scheduler_conf": {
        "actions": "allocate, backfill, preempt",
        "tiers": [
            {
                "plugins": [
                    {
                        "name": "priority"
                    },
                    {
                        "enablePreemptable": false,
                        "name": "gang"
                    },
                    {
                        "name": "conformance"
                    }
                ]
            },
            {
                "plugins": [
                    {
                        "enablePreemptable": false,
                        "name": "drf"
                    },
                    {
                        "name": "predicates"
                    },
                    {
                        "name": "nodeorder",
                        // Set the priority of the node with the fewest allocated resources to 0.
                        "arguments": {
                            "leastrequested.weight": 0
                        }
                    }
                ]
            },
            {
                "plugins": [
                    {
                        "name": "cce-gpu-topology-predicate"
                    },
                    {
                        "name": "cce-gpu-topology-priority"
                    },
                    {
                        "name": "xgpu"
                    },
                    // Add the binpack plugin and specify the weights of the xGPU resources.
                    {
                        "name": "binpack",
                        "arguments": {
                            "binpack.resources": "volcano.sh/gpu-core.percentage,volcano.sh/gpu-mem.128Mi",
                            "binpack.resources.volcano.sh/gpu-mem.128Mi": 10,
                            "binpack.resources.volcano.sh/gpu-core.percentage": 10
                        }
                    }
                ]
            },
            {
                "plugins": [
                    {
                        "name": "nodelocalvolume"
                    },
                    {
                        "name": "nodeemptydirvolume"
                    },
                    {
                        "name": "nodeCSIscheduling"
                    },
                    {
                        "name": "networkresource"
                    }
                ]
            }
        ]
    },
    "tolerations": [
        {
            "effect": "NoExecute",
            "key": "node.kubernetes.io/not-ready",
            "operator": "Exists",
            "tolerationSeconds": 60
        },
        {
            "effect": "NoExecute",
            "key": "node.kubernetes.io/unreachable",
            "operator": "Exists",
            "tolerationSeconds": 60
        }
    ]
}
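If you maintain these settings outside the console, the two changes can also be applied programmatically. The following Python sketch is a hypothetical helper (enable_xgpu_binpack is not a CCE tool). It assumes you already have the advanced settings as JSON with the structure shown above, and that binpack belongs in the same tier as the xgpu plugin, as in the example. Because json.dumps emits standard JSON, the output contains no comments to strip.

```python
import json


def enable_xgpu_binpack(conf: dict, mem_weight: int = 10, core_weight: int = 10) -> dict:
    """Apply the nodeorder and binpack changes to a parsed advanced-settings dict.

    The dict is modified in place and also returned for convenience.
    """
    for tier in conf["default_scheduler_conf"]["tiers"]:
        plugins = tier["plugins"]
        for plugin in plugins:
            if plugin["name"] == "nodeorder":
                # Stop favoring the node with the fewest allocated resources.
                plugin.setdefault("arguments", {})["leastrequested.weight"] = 0
        if any(p["name"] == "xgpu" for p in plugins):
            # Assumption: binpack sits in the same tier as xgpu, as in the example.
            # Remove any existing binpack entry so repeated runs stay idempotent.
            tier["plugins"] = [p for p in plugins if p["name"] != "binpack"]
            tier["plugins"].append({
                "name": "binpack",
                "arguments": {
                    "binpack.resources": "volcano.sh/gpu-core.percentage,volcano.sh/gpu-mem.128Mi",
                    "binpack.resources.volcano.sh/gpu-mem.128Mi": mem_weight,
                    "binpack.resources.volcano.sh/gpu-core.percentage": core_weight,
                },
            })
    return conf


if __name__ == "__main__":
    # Trimmed-down settings for illustration; replace with your own settings.
    settings = {
        "default_scheduler_conf": {
            "actions": "allocate, backfill, preempt",
            "tiers": [
                {"plugins": [{"name": "drf"}, {"name": "predicates"}, {"name": "nodeorder"}]},
                {"plugins": [{"name": "cce-gpu-topology-predicate"}, {"name": "xgpu"}]},
            ],
        }
    }
    print(json.dumps(enable_xgpu_binpack(settings), indent=4))
```

Replacing any existing binpack entry, rather than appending blindly, means the helper can be rerun safely after you adjust the weights.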