Configuring Auto Scaling for xGPU Nodes
If there are not enough GPU virtualization resources in a cluster, xGPU nodes can be scaled out automatically. This section describes how to create an auto scaling policy for xGPU nodes.
Prerequisites
- A cluster of v1.28 or v1.29 is available.
- CCE AI Suite (NVIDIA GPU) (v2.7.5 or later), Volcano Scheduler, and CCE Cluster Autoscaler (v1.28.78/v1.29.41 or later) have been installed in the cluster.
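The version requirements above can be checked mechanically before you start. The following is a minimal sketch, assuming plain numeric version strings such as "v1.28" or "2.7.5" (suffixes like "-r0" are not handled). It also assumes that the autoscaler minimum depends on the cluster version: v1.28.78 for v1.28 clusters and v1.29.41 for v1.29 clusters. The function names are hypothetical. Volcano Scheduler is not checked because no minimum version is stated here.

```python
# Hypothetical prerequisite checker for xGPU auto scaling (sketch only).

# Assumed mapping: cluster major.minor -> minimum CCE Cluster Autoscaler version.
MIN_AUTOSCALER = {(1, 28): (1, 28, 78), (1, 29): (1, 29, 41)}
# Minimum CCE AI Suite (NVIDIA GPU) version.
MIN_AI_SUITE = (2, 7, 5)


def parse_version(v: str) -> tuple[int, ...]:
    """Turn 'v1.28.78' or '2.7.5' into a tuple of ints for ordered comparison."""
    return tuple(int(part) for part in v.lstrip("v").split("."))


def check_prerequisites(cluster: str, ai_suite: str, autoscaler: str) -> list[str]:
    """Return a list of unmet version requirements (empty if all are met)."""
    problems = []
    cluster_minor = parse_version(cluster)[:2]
    if cluster_minor not in MIN_AUTOSCALER:
        problems.append(f"cluster {cluster} is not v1.28 or v1.29")
    elif parse_version(autoscaler) < MIN_AUTOSCALER[cluster_minor]:
        required = ".".join(map(str, MIN_AUTOSCALER[cluster_minor]))
        problems.append(f"CCE Cluster Autoscaler {autoscaler} is older than v{required}")
    if parse_version(ai_suite) < MIN_AI_SUITE:
        problems.append(f"CCE AI Suite (NVIDIA GPU) {ai_suite} is older than v2.7.5")
    return problems


if __name__ == "__main__":
    print(check_prerequisites("v1.28", "2.7.5", "1.28.78"))
```

Tuples are compared element by element, so (1, 28, 80) correctly ranks above (1, 28, 78), which avoids the pitfalls of comparing version strings lexically.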
Step 1: Configure the Node Pool
- Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Nodes.
- Click Create Node Pool to create an xGPU node pool. For details, see Creating a Node Pool.
For details about the requirements for xGPU nodes, such as the specifications, OS, and runtime, see Preparing xGPU Resources.
- After the node pool is created, click Auto Scaling. In the AS Object area, enable Auto Scaling for the target specification and click OK.
Step 2: Configure Heterogeneous Resources
- In the navigation pane, choose Settings. Then, click the Heterogeneous Resources tab.
- In the GPU Settings area, locate Node Pool Configurations and select the created node pool.
- Select a driver that meets the GPU virtualization requirements and enable GPU virtualization. For details, see Preparing xGPU Resources.
Figure 1 Heterogeneous Resources
- Click Confirm configuration.
Step 3: Create a GPU Virtualization Workload and Trigger Capacity Expansion
Create a Deployment that uses GPU virtualization resources and requests more of them than the cluster currently has available. For details, see Using GPU Virtualization. For example, if the cluster has 16 GiB of GPU memory available in total and each pod requests 1 GiB, configure 17 pods, which require 17 GiB of GPU memory in total. At least one pod cannot be scheduled, and this triggers the scale-out.
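The example Deployment can be generated programmatically. The following is a minimal sketch that builds it as a Python dict and prints it as JSON, which kubectl also accepts. It assumes that GPU memory is requested through the resource name volcano.sh/gpu-mem.128Mi (counted in 128 MiB units) and that pods are scheduled by Volcano via schedulerName "volcano". Verify both against Using GPU Virtualization before use. The function name, container name, and image placeholder are hypothetical.

```python
import json

# Assumed xGPU memory resource name and its unit size (verify in your cluster).
GPU_MEM_RESOURCE = "volcano.sh/gpu-mem.128Mi"
GPU_MEM_UNIT_MIB = 128


def build_xgpu_deployment(name: str, image: str, replicas: int, gpu_mem_mib: int) -> dict:
    """Build a Deployment whose pods each request gpu_mem_mib of virtualized GPU memory."""
    if gpu_mem_mib % GPU_MEM_UNIT_MIB:
        raise ValueError(f"GPU memory must be a multiple of {GPU_MEM_UNIT_MIB} MiB")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "schedulerName": "volcano",  # assumed: xGPU pods are scheduled by Volcano
                    "containers": [{
                        "name": "container-1",
                        "image": image,
                        # Request GPU memory as a count of 128 MiB units (1 GiB -> 8).
                        "resources": {"limits": {GPU_MEM_RESOURCE: gpu_mem_mib // GPU_MEM_UNIT_MIB}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # 17 pods x 1 GiB each, exceeding the 16 GiB available in the example.
    manifest = build_xgpu_deployment("gpu-app", "<your_image_address>", replicas=17, gpu_mem_mib=1024)
    print(json.dumps(manifest, indent=2))
```

For example, redirect the output to gpu-app.json and run kubectl apply -f gpu-app.json. Checking that the memory is a multiple of 128 MiB up front avoids a request that cannot be expressed in whole units.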
After a short period, the newly added GPU nodes are displayed on the node pool details page.
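The arithmetic behind the example can be sketched as follows to estimate how many pods stay Pending and roughly how many nodes a scale-out must add. This is a simplified model, not the Cluster Autoscaler's actual algorithm. It assumes that the available GPU memory behaves as one pool (ignoring per-GPU fragmentation, GPU compute limits, and other resources) and that each new node provides a fixed, hypothetical amount of GPU memory. The function names are illustrative.

```python
import math


def pending_pods(available_gib: int, per_pod_gib: int, replicas: int) -> int:
    """Pods that cannot be scheduled on the existing GPU memory (pooled model)."""
    schedulable = min(replicas, available_gib // per_pod_gib)
    return replicas - schedulable


def nodes_to_add(pending: int, per_pod_gib: int, node_gpu_mem_gib: int) -> int:
    """Rough number of new nodes needed to host the pending pods."""
    pods_per_node = node_gpu_mem_gib // per_pod_gib
    if pods_per_node == 0:
        raise ValueError("a single pod does not fit on one node")
    return math.ceil(pending / pods_per_node)


if __name__ == "__main__":
    # Example from this section: 16 GiB available, 1 GiB per pod, 17 pods.
    pending = pending_pods(available_gib=16, per_pod_gib=1, replicas=17)
    # Assumption: each new node in the pool provides 16 GiB of GPU memory.
    print(pending, nodes_to_add(pending, per_pod_gib=1, node_gpu_mem_gib=16))
```

With the values from the example, one pod remains Pending, and one additional node of the assumed size is enough to host it.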