Updated on 2026-03-10 GMT+08:00

Enabling Auto Scaling for a GPU Node

If there are not enough GPU resources in a cluster, GPU nodes can be scaled out automatically. This section describes how to create an auto scaling policy for a GPU node.

Prerequisites

  • You have installed the CCE AI Suite (NVIDIA GPU) and CCE Cluster Autoscaler in the cluster.
  • The Auto Node Scale-Out function is enabled. To do so, go to Settings, click the Auto Scaling tab, and enable Auto Node Scale-Out under Node Scale-Out Criteria.

Step 1: Configure a Node Pool

  1. Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Nodes.
  2. In the right pane, click the Node Pools tab and click Create Node Pool in the upper right corner. For details, see Creating a Node Pool.
  3. After the node pool is created, click Auto Scaling. In the AS Object area, enable Auto Scaling for the target specification and click OK.

Step 2: Create a GPU Workload and Enable Auto Scale-Out

  1. Use the following YAML to create a GPU workload:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ac-test
      namespace: default
    spec:
      replicas: 1 # Number of replicas
      selector:  
        matchLabels:
          app: ac-test
      template:
        metadata:
          labels:  
            app: ac-test
        spec:
          restartPolicy: Always 
          containers:
            - name: container-1
              image: pytorch/pytorch:2.1.1-cuda12.1-cudnn8-devel
              imagePullPolicy: IfNotPresent
              command: ["/bin/bash", "-c"]
              args:
                - "while true; do nvidia-smi; sleep 10; done"  
              resources:
                requests:
                  cpu: 250m
                  memory: 512Mi
                  nvidia.com/gpu: 1  
                limits:
                  cpu: 250m
                  memory: 512Mi
                  nvidia.com/gpu: 1 
          # Node affinity: Schedule the pod to the target GPU node pool (the one created previously).
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: cce.cloud.com/cce-nodepool
                        operator: In
                        values:
                          - gpu-130-nodepool-67633  # GPU node pool name
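    Assuming the manifest above is saved to a file (the file name below is illustrative), the workload can be created and confirmed with kubectl:

    ```shell
    # Create the GPU workload from the manifest (file name is an example)
    kubectl apply -f ac-test.yaml

    # Confirm that the Deployment exists in the default namespace
    kubectl get deployment ac-test -n default
    ```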

  2. Check the pods and nodes in the node pool. The node pool does not contain any nodes yet, so the pod stays in the Pending state because no node can satisfy its GPU request.
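    This state can also be observed from the CLI (label selectors below match the manifest; adjust if you changed the names):

    ```shell
    # The node pool has no nodes yet, so the pod has nowhere to be scheduled
    kubectl get nodes

    # The pod remains Pending until a GPU node becomes available
    kubectl get pods -n default -l app=ac-test
    ```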

  3. Verify that a node pool scale-out has been triggered.
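    When the Cluster Autoscaler decides to add a node, it typically records a scale-up event (for example, TriggeredScaleUp) on the pending pod; the exact event wording may vary by autoscaler version. One way to check:

    ```shell
    # Look for a scale-out event in the Events section of the pending pod
    kubectl describe pod -n default -l app=ac-test

    # Or inspect recent events directly, most recent last
    kubectl get events -n default --sort-by=.lastTimestamp
    ```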

  4. Check the pods and nodes in the node pool. A new node is created in the node pool, and the pod is in the Running state.
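    Once the new node is ready, the pod should be scheduled onto it. For example, using the node pool label from the affinity rule in the manifest:

    ```shell
    # The new node should carry the node pool label used in the affinity rule
    kubectl get nodes -l cce.cloud.com/cce-nodepool=gpu-130-nodepool-67633

    # The pod should now be Running; the NODE column shows where it was placed
    kubectl get pods -n default -l app=ac-test -o wide
    ```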

Step 3: Delete the GPU Workload and Enable Auto Scale-In

If the GPU resources required by the workload decrease and a node becomes idle, you can enable auto node scale-in to save resources.

  1. Go to Settings, click the Auto Scaling tab, enable Auto Node Scale-In under Auto Scale-In Settings, and configure the scale-in conditions as required. For details, see Auto Scaling.
  2. Delete the GPU workload. After the node becomes idle, verify that a scale-in is triggered.

  3. Check whether the node has been removed.
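The deletion and the final check can also be done from the CLI (a sketch; how long the node stays before removal depends on the scale-in conditions configured in step 1):

```shell
# Delete the GPU workload so that the node becomes idle
kubectl delete deployment ac-test -n default

# After the configured scale-in delay, the node should no longer be listed
kubectl get nodes -l cce.cloud.com/cce-nodepool=gpu-130-nodepool-67633
```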