Updated on 2024-12-18 GMT+08:00

Preparing GPU Resources

This section describes how to plan and prepare the software and hardware required before you use GPU capabilities.

Basic Planning

Cluster: v1.25.15-r7 or later
OS: Huawei Cloud EulerOS 2.0
System architecture: x86
GPU: T4 or V100
Driver version: 470.57.02, 510.47.03, or 535.54.03 (GPU virtualization supports only these GPU driver versions)
Container runtime: containerd
Add-ons: The Volcano and gpu-device-plugin add-ons must be installed in the cluster.
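The Basic Planning requirements can be checked mechanically before you add a node. The following is a minimal sketch, not a Huawei Cloud tool: the GpuNode fields are hypothetical, the v<major>.<minor>.<patch>-r<revision> version format is assumed from "v1.25.15-r7", and treating "x86_64" and "amd64" as spellings of x86 is also an assumption.

```python
import re
from dataclasses import dataclass

# Requirements transcribed from the Basic Planning list.
MIN_CLUSTER_VERSION = (1, 25, 15, 7)  # v1.25.15-r7
SUPPORTED_OS = "Huawei Cloud EulerOS 2.0"
SUPPORTED_ARCHES = {"x86", "x86_64", "amd64"}  # assumed spellings of "x86"
SUPPORTED_GPUS = {"T4", "V100"}
SUPPORTED_DRIVERS = {"470.57.02", "510.47.03", "535.54.03"}
SUPPORTED_RUNTIME = "containerd"


def parse_cluster_version(version: str) -> tuple[int, int, int, int]:
    """Turn 'v1.25.15-r7' into (1, 25, 15, 7) so versions compare as tuples.

    Assumes the format v<major>.<minor>.<patch>[-r<revision>]; a missing
    revision suffix is treated as r0.
    """
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(?:-r(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized cluster version: {version!r}")
    major, minor, patch, revision = match.groups()
    return (int(major), int(minor), int(patch), int(revision or 0))


@dataclass
class GpuNode:
    """Hypothetical summary of a node; the field names are illustrative."""
    cluster_version: str
    os: str
    arch: str
    gpu_model: str
    driver_version: str
    runtime: str


def planning_violations(node: GpuNode) -> list[str]:
    """Describe every Basic Planning requirement that the node does not meet."""
    problems = []
    if parse_cluster_version(node.cluster_version) < MIN_CLUSTER_VERSION:
        problems.append(f"cluster {node.cluster_version} is older than v1.25.15-r7")
    if node.os != SUPPORTED_OS:
        problems.append(f"OS {node.os!r} is not {SUPPORTED_OS}")
    if node.arch not in SUPPORTED_ARCHES:
        problems.append(f"architecture {node.arch!r} is not x86")
    if node.gpu_model.upper() not in SUPPORTED_GPUS:
        problems.append(f"GPU {node.gpu_model!r} is not T4 or V100")
    if node.driver_version not in SUPPORTED_DRIVERS:
        problems.append(f"driver {node.driver_version} does not support GPU virtualization")
    if node.runtime != SUPPORTED_RUNTIME:
        problems.append(f"runtime {node.runtime!r} is not containerd")
    return problems


if __name__ == "__main__":
    node = GpuNode("v1.25.15-r7", "Huawei Cloud EulerOS 2.0", "x86", "T4", "535.54.03", "containerd")
    print(planning_violations(node))  # prints []
```

Comparing versions as integer tuples avoids the string-comparison trap where "v1.9" would sort after "v1.25".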

Step 1: Add GPU Nodes to a Cluster and Label the Nodes

If your cluster already has GPU nodes that meet the requirements in Basic Planning, skip this step.

  1. Add GPU nodes to your cluster. For details, see Adding Nodes to On-Premises Clusters.
  2. Add the label accelerator: nvidia-{GPU model} to each GPU node, where {GPU model} is the GPU model of that node. For details, see Adding Labels/Taints to Nodes.

    Figure 1 Labeling nodes that support GPU virtualization
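If you label nodes with kubectl instead of the console, the label value must also be a valid Kubernetes label value. The sketch below builds and validates the accelerator label and the matching kubectl label command. Lowercasing the GPU model (for example, nvidia-t4) is an assumption rather than a documented rule, and the node name gpu-node-1 is hypothetical.

```python
import re

# Kubernetes label values must be at most 63 characters, consist of
# alphanumerics, '-', '_' or '.', and begin and end with an alphanumeric.
LABEL_VALUE_RE = re.compile(r"(?:[A-Za-z0-9](?:[-A-Za-z0-9_.]*[A-Za-z0-9])?)?")


def accelerator_label(gpu_model: str) -> tuple[str, str]:
    """Return the (key, value) pair for accelerator: nvidia-{GPU model}.

    Lowercasing the model name is an assumption, not a documented rule.
    """
    value = f"nvidia-{gpu_model.strip().lower()}"
    if len(value) > 63 or not LABEL_VALUE_RE.fullmatch(value):
        raise ValueError(f"not a valid Kubernetes label value: {value!r}")
    return "accelerator", value


def kubectl_label_command(node_name: str, gpu_model: str) -> list[str]:
    """Build the argument list for labeling one node with kubectl."""
    key, value = accelerator_label(gpu_model)
    return ["kubectl", "label", "nodes", node_name, f"{key}={value}"]


if __name__ == "__main__":
    print(" ".join(kubectl_label_command("gpu-node-1", "T4")))
    # prints: kubectl label nodes gpu-node-1 accelerator=nvidia-t4
```

Returning an argument list rather than a shell string means it can be passed directly to subprocess.run without shell quoting issues.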

Step 2: Install the Add-ons

If the add-ons required in Basic Planning are already installed in your cluster, skip this step.

If you change the GPU driver version, restart the node for the change to take effect.

Before restarting a node, drain it to evict all of its pods. Make sure the other nodes have enough spare GPU resources to host the evicted pods. Otherwise, the pods may fail to be rescheduled while the node is drained, which can affect your services.
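Whether the remaining nodes can absorb the evicted pods depends on how the free GPU is spread across nodes, not only on the total. The following is a rough sketch using a first-fit-decreasing heuristic; the inputs (GPU requests per pod and free GPU per node, in whatever unit you track) are hypothetical, and the check ignores per-card limits, taints, and other scheduling constraints.

```python
def can_reschedule(pod_gpu_requests: list[float], free_gpus_by_node: dict[str, float]) -> bool:
    """First-fit-decreasing check that each evicted pod fits on some other node.

    pod_gpu_requests: GPU requested by each pod on the node being drained.
    free_gpus_by_node: unrequested GPU on every other node (hypothetical input).
    """
    free = dict(free_gpus_by_node)  # work on a copy so the caller's data is untouched
    # Place the largest requests first; smaller ones fill the remaining gaps.
    for request in sorted(pod_gpu_requests, reverse=True):
        for node, capacity in free.items():
            if capacity >= request:
                free[node] = capacity - request
                break
        else:
            # No node can hold this pod, so draining would leave it unscheduled.
            return False
    return True


if __name__ == "__main__":
    # 2 GPUs are free in total, but no single node has room for the second pod.
    print(can_reschedule([1, 1], {"node-b": 1.5, "node-c": 0.5}))  # prints False
    print(can_reschedule([1, 0.5], {"node-b": 1, "node-c": 0.5}))  # prints True
```

First-fit decreasing is a heuristic: a False result means the check could not find a placement, not that none exists, so treat it as a prompt to free up capacity before draining.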

  1. Log in to the UCS console and click the cluster name to access the cluster console. In the navigation pane, choose Add-ons. In the Add-ons Installed area, check whether the Volcano and gpu-device-plugin add-ons have been installed.
  2. If the gpu-device-plugin add-on is not installed, install it by referring to gpu-device-plugin.

    GPU virtualization also depends on the Volcano add-on. If it is not installed, install it. For details, see Volcano.
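The checks in this step can be summarized as a short decision list. The sketch below assumes you have copied the names shown under Add-ons Installed and the old and new driver versions into Python; the function name and action strings are hypothetical, and it only reports what to do rather than calling any UCS API.

```python
from typing import Optional

SUPPORTED_DRIVERS = {"470.57.02", "510.47.03", "535.54.03"}
REQUIRED_ADDONS = ("gpu-device-plugin", "Volcano")


def addon_step_actions(installed: set[str], old_driver: Optional[str], new_driver: str) -> list[str]:
    """List the Step 2 actions still needed.

    installed: add-on names shown under Add-ons Installed (hypothetical input).
    old_driver: driver version configured before, or None if there was none.
    new_driver: driver version you are configuring now.
    """
    actions = []
    if new_driver not in SUPPORTED_DRIVERS:
        actions.append(f"choose a supported driver instead of {new_driver}")
    # Compare names case-insensitively, since console spelling may vary.
    installed_lower = {name.strip().lower() for name in installed}
    for addon in REQUIRED_ADDONS:
        if addon.lower() not in installed_lower:
            actions.append(f"install {addon}")
    if old_driver is not None and old_driver != new_driver:
        actions.append("drain the node, then restart it to apply the driver change")
    return actions


if __name__ == "__main__":
    print(addon_step_actions({"gpu-device-plugin"}, "470.57.02", "535.54.03"))
    # prints ['install Volcano', 'drain the node, then restart it to apply the driver change']
```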