Updated on 2026-03-10 GMT+08:00

Enabling Auto Scaling for a GPU Node

If there are not enough GPU resources in a cluster, GPU nodes can be scaled out automatically. This section describes how to create an auto scaling policy for a GPU node.

Prerequisites

  • You have installed the CCE AI Suite (NVIDIA GPU) and CCE Cluster Autoscaler in the cluster.
  • The Auto Node Scale-Out function is enabled. To do so, go to Settings, click the Auto Scaling tab, and enable Auto Node Scale-Out under Node Scale-Out Criteria.

Step 1: Configure a Node Pool

  1. Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Nodes.
  2. In the right pane, click the Node Pools tab and click Create Node Pool in the upper right corner. For details, see Creating a Node Pool.
  3. After the node pool is created, click Auto Scaling. In the AS Object area, enable Auto Scaling for the target specification and click OK.

Step 2: Create a GPU Workload and Enable Auto Scale-Out

  1. Use the following YAML to create a GPU workload:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ac-test
      namespace: default
    spec:
      replicas: 1 # Number of replicas
      selector:  
        matchLabels:
          app: ac-test
      template:
        metadata:
          labels:  
            app: ac-test
        spec:
          restartPolicy: Always 
          containers:
            - name: container-1
              image: pytorch/pytorch:2.1.1-cuda12.1-cudnn8-devel
              imagePullPolicy: IfNotPresent
              command: ["/bin/bash", "-c"]
              args:
                - "while true; do nvidia-smi; sleep 10; done"  
              resources:
                requests:
                  cpu: 250m
                  memory: 512Mi
                  nvidia.com/gpu: 1  
                limits:
                  cpu: 250m
                  memory: 512Mi
                  nvidia.com/gpu: 1 
          # Node affinity: Schedule the pod to the target GPU node pool (the one created previously).
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: cce.cloud.com/cce-nodepool
                        operator: In
                        values:
                          - gpu-130-nodepool-67633  # GPU node pool name
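    Assuming the manifest above is saved to a file (the file name below is illustrative), the workload can be created and confirmed with kubectl:

    ```shell
    # Create the GPU workload from the manifest (file name is an example)
    kubectl apply -f ac-test.yaml

    # Confirm that the Deployment exists in the default namespace
    kubectl get deployment ac-test -n default
    ```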

  2. Check the pods and nodes in the node pool. The node pool does not contain any nodes yet, so the pod stays in the Pending state because no node can satisfy its GPU request.
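    This state can also be observed from the CLI (label selectors below match the manifest; adjust if you changed the names):

    ```shell
    # The node pool has no nodes yet, so the pod has nowhere to be scheduled
    kubectl get nodes

    # The pod remains Pending until a GPU node becomes available
    kubectl get pods -n default -l app=ac-test
    ```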

  3. Verify that a node pool scale-out has been triggered.
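    When the Cluster Autoscaler decides to add a node, it typically records a scale-up event (for example, TriggeredScaleUp) on the pending pod; the exact event wording may vary by autoscaler version. One way to check:

    ```shell
    # Look for a scale-out event in the Events section of the pending pod
    kubectl describe pod -n default -l app=ac-test

    # Or inspect recent events directly, most recent last
    kubectl get events -n default --sort-by=.lastTimestamp
    ```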

  4. Check the pods and nodes in the node pool. A new node is created in the node pool, and the pod is in the Running state.
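    Once the new node is ready, the pod should be scheduled onto it. For example, using the node pool label from the affinity rule in the manifest:

    ```shell
    # The new node should carry the node pool label used in the affinity rule
    kubectl get nodes -l cce.cloud.com/cce-nodepool=gpu-130-nodepool-67633

    # The pod should now be Running; the NODE column shows where it was placed
    kubectl get pods -n default -l app=ac-test -o wide
    ```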

Step 3: Delete the GPU Workload and Enable Auto Scale-In

If the GPU resources required by the workload decrease and a node becomes idle, you can enable auto node scale-in to save resources.

  1. Go to Settings, click the Auto Scaling tab, enable Auto Node Scale-In under Auto Scale-In Settings, and configure the scale-in conditions as required. For details, see Auto Scaling.
  2. Delete the GPU workload. After the node becomes idle, verify that a scale-in is triggered.

  3. Check whether the node has been removed.
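The deletion and the final check can also be done from the CLI (a sketch; how long the node stays before removal depends on the scale-in conditions configured in step 1):

```shell
# Delete the GPU workload so that the node becomes idle
kubectl delete deployment ac-test -n default

# After the configured scale-in delay, the node should no longer be listed
kubectl get nodes -l cce.cloud.com/cce-nodepool=gpu-130-nodepool-67633
```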