Updated on 2026-06-17 GMT+08:00

Using Karpenter for Auto Scaling of Nodes

Karpenter is a dynamic, high-performance, open-source node auto scaling solution for Kubernetes. It aims to launch right-sized nodes at the right time, simplifying Kubernetes infrastructure management. Compared with Cluster Autoscaler, Karpenter can reduce node scaling time from minutes to seconds, improving resource efficiency in clusters and lowering costs.

Karpenter:

  • Monitors pods that the scheduler has marked as unschedulable. A pod may become unschedulable for reasons such as insufficient CPU or memory, unsatisfied node selectors, taints that the pod does not tolerate, or host ports that are already in use.
  • Evaluates the scheduling requirements of the unschedulable pods.
  • Provisions new nodes that meet the requirements of those pods.
  • Deletes nodes when they are no longer needed, for example, when they are idle or have reached their expiration time.
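For reference, a pod that the scheduler cannot place reports a PodScheduled condition with status False and reason Unschedulable, which is the state that upstream Karpenter watches for. The excerpt below is illustrative: the pod name is hypothetical, only the relevant fields are shown, and it assumes that the Huawei Cloud provider reuses the upstream detection logic. You can inspect a pending pod in the same way with kubectl get pod <name> -o yaml.

```yaml
# Illustrative excerpt of a pending pod (kubectl get pod <name> -o yaml).
# The pod name is hypothetical; only the fields relevant to Karpenter are shown.
apiVersion: v1
kind: Pod
metadata:
  name: example-pending-pod
status:
  phase: Pending
  conditions:
    - type: PodScheduled
      status: "False"
      # Set by kube-scheduler when no existing node can host the pod.
      reason: Unschedulable
```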

Core Features and Advantages

  • Event-driven, rapid scale-out: The traditional Cluster Autoscaler (CA) relies on periodic polling and cloud auto scaling groups, and typically takes three to five minutes to add new nodes. Karpenter abandons this model. It continuously watches pod scheduling events in the cluster and reacts to pending pods within milliseconds. It evaluates the resource requirements of pending pods directly and calls cloud provider APIs without intermediate layers, so it can quickly provision the most suitable compute instances for traffic spikes and high-concurrency workloads.
  • Node pool-free architecture: Karpenter is not bound to traditional fixed node pools. Declarative NodePool policies let you flexibly configure AZs, instance architectures, and billing modes. Karpenter parses pod requirements for CPU and memory, as well as scheduling constraints such as affinity and tolerations. Combined with FlexusX instances, Karpenter can choose CPU-to-memory ratios on demand so that node specifications closely match actual service requirements, which prevents overprovisioning and resource waste at the source.
  • Continuous intelligent consolidation: Karpenter manages efficiency throughout the cluster lifecycle, not only during scale-out. It continuously evaluates real-time resource utilization. With a consolidation policy such as WhenEmptyOrUnderutilized, Karpenter automatically reclaims idle nodes and evicts pods from underutilized nodes in a controlled manner so that they can be rescheduled. By packing scattered workloads onto fewer or more cost-effective instances, Karpenter minimizes the cost of idle capacity.
  • Capacity and cost awareness:
    • Karpenter integrates with the cloud provider's billing model. During scale-out, it compares real-time instance prices and selects the instance combination with the lowest unit cost that meets the pod requirements. It also natively supports mixing on-demand and spot instances to minimize overall compute costs.
    • Karpenter is also aware of cloud provider capacity in real time. If the preferred instance type is sold out and the request fails with an insufficient-resources error, Karpenter automatically retries and falls back to other available instance types or AZs within milliseconds. This avoids the scaling bottleneck that traditional CA hits when a single node pool runs out of capacity, and ensures service continuity.
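The consolidation and billing behavior described above is configured on the NodePool. The following is a sketch that uses standard Karpenter v1 NodePool fields (the karpenter.sh/capacity-type requirement, disruption.consolidationPolicy, and disruption.budgets). It assumes that the Huawei Cloud provider accepts spot as a capacity type and honors budget schedules in the same way as upstream Karpenter. The NodePool name is hypothetical, and nodeClassRef points to the CCENodeClass created in Performing Verification.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: mixed-capacity-example        # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.huawei
        kind: CCENodeClass
        name: demo-cce-elastic-cpu    # CCENodeClass from the verification example
      requirements:
        # Allow both billing modes so that Karpenter can choose the lower-cost
        # option for pending pods (assumes "spot" is supported by the provider).
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot
            - on-demand
  disruption:
    # Reclaim empty nodes and repack pods from underutilized nodes.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      # Disrupt at most 10% of this NodePool's nodes at a time.
      - nodes: "10%"
      # Block voluntary disruption for 8 hours from 09:00 (UTC) on weekdays.
      - nodes: "0"
        schedule: "0 9 * * mon-fri"
        duration: 8h
```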

Prerequisites

  • A CCE cluster of v1.29 or later is available, with worker nodes on which Karpenter can run.
  • EIPs are bound to the nodes so that images can be pulled from the Internet during chart installation.
  • The Karpenter container can access the Internet. In a standard cluster, bind an EIP to the node where Karpenter runs. In a CCE Turbo cluster, bind an EIP to the Karpenter pod.
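In a standard cluster, Karpenter reaches the Internet through the EIP of the node it runs on, so it can help to keep the controller on that node. A possible value.yaml fragment uses the chart's nodeSelector parameter. The node name below is hypothetical, and the sketch assumes that the node's kubernetes.io/hostname label matches its node name; check the actual labels with kubectl get node --show-labels.

```yaml
# value.yaml fragment (standard cluster): schedule the Karpenter controller
# only on the node that has an EIP bound. The node name is hypothetical.
nodeSelector:
  kubernetes.io/hostname: "192.168.1.15"
```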

Notes and Constraints

  • Karpenter needs cloud service permissions to create and delete nodes. Currently, it authenticates with an Access Key and Secret Key (AK/SK). Karpenter is planned to become available as a system add-on in the CCE add-on marketplace, with support for custom add-on agencies and pod identity authentication.
  • Karpenter and CCE Cluster Autoscaler are both node scaling add-ons. Do not install them in the same cluster, because they may interfere with each other.
  • Karpenter cannot scale in nodes on which CCE system add-on pods are running.
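If you prefer not to write the AK/SK into value.yaml, the comments in the chart values indicate that you can create the credentials Secret yourself: set credentials.create to false, set credentials.existingSecret to the Secret name, and provide the same keys that the chart would otherwise generate. The following is a sketch; the Secret name and namespace are hypothetical, and the namespace should match the one you install the release into.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: karpenter-huawei-credentials   # hypothetical name
  namespace: karpenter                 # hypothetical; use the release namespace
type: Opaque
stringData:
  # Same keys that the Secret generated by the chart contains.
  HUAWEICLOUD_SDK_AK: "your-access-key"
  HUAWEICLOUD_SDK_SK: "your-secret-key"
  HUAWEICLOUD_SDK_REGION_ID: "your-region-id"
  HUAWEICLOUD_SDK_CCE_CLUSTER_ID: "your-cluster-id"
```

In value.yaml, you would then set credentials.create: false and credentials.existingSecret: "karpenter-huawei-credentials".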

Deploying Karpenter

  1. Obtain a chart.

    Go to the chart page, select a version, and download the Helm chart package (.tgz). This section uses chart version 0.2.1 as an example, which applies to CCE clusters of v1.29 or later. Configuration items may differ between chart versions; the configuration in this section applies only to chart version 0.2.1.

  2. Upload the chart.

    1. Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose App Templates and click Upload Chart in the upper right corner.
    2. Click Add, select the chart to be uploaded, and click Upload.

  3. Create the value.yaml file.

    Create a value.yaml file on your local PC to set the installation parameters. During installation, import this file to customize the release. Parameters that are not specified in the file use their default values.

    The settings are as follows:
    # Default values for karpenter-provider-huawei
    # -- Number of controller replicas
    replicaCount: 1
    # Controller image configuration
    image:
      # -- Controller image repository
      repository: swr.ap-southeast-3.myhuaweicloud.com/huaweiclouddeveloper/cce/karpenter/controller
      # -- Controller image tag
      tag: "0.2.1"
      # -- Image pull policy
      pullPolicy: IfNotPresent
    # kube-rbac-proxy sidecar configuration
    rbacProxy:
      image:
        # -- kube-rbac-proxy image repository
        repository: quay.io/brancz/kube-rbac-proxy
        # -- kube-rbac-proxy image tag
        tag: "v0.16.0"
        # -- Image pull policy
        pullPolicy: IfNotPresent
    # -- Image pull secrets for controller pods
    imagePullSecrets: []
    # - name: registry-credentials
    # -- Name prefix for all resources
    namePrefix: "karpenter-provider-huawei-"
    serviceAccount:
      # -- Create ServiceAccount
      create: true
      # -- ServiceAccount name
      name: controller-manager
    # Controller arguments
    controller:
      # -- Metrics port
      metricsPort: 8080
      # -- Health probe port
      healthProbePort: 8081
    # Huawei Cloud credentials
    # The generated Secret uses HUAWEICLOUD_SDK_AK, HUAWEICLOUD_SDK_SK,
    # and HUAWEICLOUD_SDK_REGION_ID from this block, plus
    # HUAWEICLOUD_SDK_CCE_CLUSTER_ID from clusterInfo.clusterID.
    # If credentials.create=false, the existing Secret should provide the same keys.
    credentials:
      # -- Create a Secret for credentials
      create: true
      # -- Secret name for Huawei Cloud credentials
      name: "huawei-credentials"
      # -- Use an existing Secret when credentials.create is false
      existingSecret: ""
      # -- Huawei Cloud access key
      accessKey: "your-access-key"
      # -- Huawei Cloud secret key
      secretKey: "your-secret-key"
      # -- Huawei Cloud region ID
      region: "your-region-id"
    clusterInfo:
      # -- Huawei Cloud CCE cluster ID
      clusterID: "your-cluster-id"
      # -- Cluster category (optional). Set it to "eni" for CCE Turbo clusters.
      # For other network types (vpc-router or overlay_l2), use any value other than "eni"; the default is empty.
      category: ""
    # -- yangtseEipInfo: Supports user-defined EIPs (Elastic IPs) bound to Karpenter pods.
    # This only takes effect when clusterInfo.category is set to "eni".
    yangtseEipInfo:
      yangtse.io/pod-with-eip: "true"
    # Controller resources
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 200m
        memory: 256Mi
    # -- Pod security context
    podSecurityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
    # -- Container security context for manager
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - "ALL"
    # -- Node selector for controller pods
    nodeSelector: {}
    # -- Tolerations for controller pods
    tolerations: []
    # -- Affinity for controller pods
    affinity: {}
    • If clusterInfo.category is set to eni, an EIP is automatically bound to the Karpenter pod. This is the recommended way to provide Internet access in a CCE Turbo cluster. You can configure more EIP parameters in yangtseEipInfo. For details, see Configuring an EIP for a Pod in a CCE Turbo Cluster.
    • Mandatory parameters include clusterInfo.clusterID, credentials.accessKey, credentials.secretKey, and credentials.region. If credentials.create is false, the existing Secret must provide the corresponding keys instead. Change other parameter values based on service requirements.

  4. Create a release.

    1. Log in to the CCE console and click the target cluster name. In the left navigation pane, choose App Templates.
    2. Locate the uploaded chart and click Install.
    3. Configure Release Name, Namespace, and Select Version.
    4. Click Add next to Configuration File, select the YAML file created locally, and click Install.

    5. On the Releases tab, view the status of the release.
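Because unspecified parameters fall back to the chart defaults, value.yaml can be much shorter than the full listing in step 3. A minimal sketch for a CCE Turbo cluster could contain only the mandatory parameters plus the cluster category. It assumes chart version 0.2.1, and the placeholder values must be replaced with your own.

```yaml
# Minimal value.yaml sketch for chart 0.2.1 in a CCE Turbo cluster.
# All other parameters keep their chart defaults.
credentials:
  accessKey: "your-access-key"
  secretKey: "your-secret-key"
  region: "your-region-id"
clusterInfo:
  clusterID: "your-cluster-id"
  # "eni" makes the chart bind an EIP to the Karpenter pod.
  category: "eni"
```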

Performing Verification

  1. Create a CCENodeClass and a NodePool, for example, by saving the following manifests to a file and running kubectl apply -f <file>. For details about the resource parameters, see Configuration Parameters.

    apiVersion: karpenter.k8s.huawei/v1alpha1
    kind: CCENodeClass
    metadata:
      name: demo-cce-elastic-cpu
    spec:
      # Add one subnet per target AZ if you want this demo NodePool to launch
      # across multiple zones.
      subnetSelectorTerms:
        - id: "30abcb7b-ddeb-4a93-9e83-0b21f59b07a3"
      imsSelector:
        imsFamily: "Huawei Cloud EulerOS 2.0"
      blockDeviceMappings:
        root:
          volumeSize: 40
          volumeType: SSD
        k8s:
          volumeSize: 100
          volumeType: SSD
      runtimeConfiguration:
        type: containerd
      login:
        userPassword:
          username: root
          password: "JDYk*****"
    ---
    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: demo-cce-elastic-cpu
    spec:
      disruption:
        budgets:
          - nodes: "1"
        consolidationPolicy: WhenEmptyOrUnderutilized
        consolidateAfter: 30s
      limits:
        # Maximum total CPU and memory that this NodePool can provision.
        cpu: "100"
        memory: 384Gi
      template:
        metadata:
          labels:
            demo.huawei.com/scenario: cpu-elastic
            demo.huawei.com/nodepool-profile: shared-burst-and-consolidation
        spec:
          nodeClassRef:
            group: karpenter.k8s.huawei
            kind: CCENodeClass
            name: demo-cce-elastic-cpu
          requirements:
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
            - key: kubernetes.io/os
              operator: In
              values:
                - linux
            - key: karpenter.sh/capacity-type
              operator: In
              values:
                - on-demand
            # The zone is left unconstrained so that Karpenter can use every AZ covered by the
            # subnets of the selected CCENodeClass. To pin this demo to specific AZs, add a
            # topology.kubernetes.io/zone requirement.
            - key: node.kubernetes.io/instance-type
              operator: In
              values:
                - c9.large.4
                - c9.xlarge.4
                - c9.2xlarge.4
                - c9.4xlarge.4

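  2. Create a Deployment to verify that Karpenter can provision nodes. The Deployment starts with 0 replicas, and its nodeSelector matches the demo.huawei.com/scenario: cpu-elastic label that the NodePool applies to the nodes it creates, so its pods run on nodes provisioned from this NodePool.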

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cpu-burst
      namespace: default
    spec:
      replicas: 0
      selector:
        matchLabels:
          app: cpu-burst
      template:
        metadata:
          labels:
            app: cpu-burst
        spec:
          terminationGracePeriodSeconds: 10
          nodeSelector:
            demo.huawei.com/scenario: cpu-elastic
          containers:
            - name: web
              image: nginx:1.27-alpine
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 80
                  name: http
              resources:
                requests:
                  cpu: "1400m"
                  memory: "1200Mi"
                limits:
                  cpu: "1400m"
                  memory: "1200Mi"

  3. Scale out the Deployment.

    kubectl scale deployment cpu-burst --replicas=10

    After the scale-out, the new pods cannot be scheduled onto the existing nodes and remain Pending. Karpenter then provisions new nodes, and the pods are scheduled onto them.

    $ kubectl get node
    NAME            STATUS   ROLES    AGE     VERSION
    192.168.1.15    Ready    <none>   3h28m   v1.33.5-r20-33.0.4.9
    192.168.1.158   Ready    <none>   42m     v1.33.5-r20-33.0.4.9
    192.168.1.168   Ready    <none>   4m7s    v1.33.5-r20-33.0.4.9
    $ kubectl get pod -l app=cpu-burst
    NAME                         READY   STATUS    RESTARTS   AGE
    cpu-burst-5d84f5647c-697j2   1/1     Running   0          9m35s
    cpu-burst-5d84f5647c-9r9rs   1/1     Running   0          9m35s
    cpu-burst-5d84f5647c-9swmq   1/1     Running   0          9m35s
    cpu-burst-5d84f5647c-bqrmw   1/1     Running   0          9m35s
    cpu-burst-5d84f5647c-jc7f9   1/1     Running   0          9m35s
    cpu-burst-5d84f5647c-pshpx   1/1     Running   0          9m35s
    cpu-burst-5d84f5647c-qkzfm   1/1     Running   0          9m36s
    cpu-burst-5d84f5647c-rl7mk   1/1     Running   0          9m35s
    cpu-burst-5d84f5647c-tnt92   1/1     Running   0          9m35s
    cpu-burst-5d84f5647c-w728l   1/1     Running   0          9m35s

  4. Scale in the Deployment.

    kubectl scale deployment cpu-burst --replicas=5

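    After the scale-in, the remaining pods no longer need all of the provisioned capacity. Because the NodePool uses the WhenEmptyOrUnderutilized consolidation policy, Karpenter first creates a smaller node, moves the pods onto it, and then reclaims the old node. In the following output, node 192.168.1.158 has been removed and node 192.168.1.71 has been added.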

    $ kubectl get node
    NAME            STATUS   ROLES    AGE     VERSION
    192.168.1.15    Ready    <none>   3h30m   v1.33.5-r20-33.0.4.9
    192.168.1.168   Ready    <none>   6m25s   v1.33.5-r20-33.0.4.9
    192.168.1.71    Ready    <none>   2m14s   v1.33.5-r20-33.0.4.9
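As the comments in the example NodePool note, the demo can launch nodes in every AZ covered by the CCENodeClass subnets. To restrict it to specific AZs, you could list one subnet per AZ in the CCENodeClass and add a topology.kubernetes.io/zone requirement to the NodePool. The fragments below are meant to be merged into the manifests from step 1; the subnet IDs and AZ names are hypothetical placeholders.

```yaml
# CCENodeClass fragment: one subnet per target AZ (IDs are placeholders).
spec:
  subnetSelectorTerms:
    - id: "<subnet-network-id-in-az-1>"
    - id: "<subnet-network-id-in-az-2>"
---
# NodePool fragment: an additional entry under spec.template.spec.requirements.
spec:
  template:
    spec:
      requirements:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - ap-southeast-3a   # hypothetical AZ names
            - ap-southeast-3b
```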

Uninstalling the Release

  1. Log in to the CCE console and click the target cluster name. In the left navigation pane, choose App Templates.
  2. On the Releases tab page, locate the row that contains the installed release and choose More > Uninstall in the Operation column.

Configuration Parameters

  • NodePool resource parameters: For details, see NodePools.
  • CCENodeClass resource parameters:
    Table 1 CCENodeClass resource parameters

    • subnetSelectorTerms (mandatory; array of SubnetSelectorTerm objects): Node subnet information. Add one entry per target AZ.
    • ecsGroupId (optional; string): ECS group ID. If this parameter is specified, nodes are created in the specified ECS group.
    • imsSelector (mandatory; IMSSelector object): Node OS and image.
    • blockDeviceMappings (mandatory; BlockDeviceMappings object): Node disks.
    • login (mandatory; Login object): Node login mode.
    • runtimeConfiguration (optional; RuntimeConfiguration object): Container runtime configuration.

    Table 2 SubnetSelectorTerm

    • id (mandatory; string): Network ID of the subnet that the node's network interface belongs to. To obtain it, log in to the VPC console, choose Virtual Private Cloud > Subnets in the navigation pane, click the target subnet name, and copy the Network ID on the Summary tab.

    Table 3 IMSSelector

    • imsFamily (mandatory; string): Node OS, for example, Huawei Cloud EulerOS 2.0.

    Table 4 BlockDeviceMappings

    • root (mandatory; BlockDevice object): System disk.
    • k8s (mandatory; BlockDevice object): Data disk used by the container runtime and Kubernetes.
    • users (optional; array of BlockDevice objects): User data disks.

    Table 5 BlockDevice

    • volumeSize (mandatory; integer): Disk size, in GiB. Value range:
      • Root volume: 20 to 1024
      • Kubernetes volume: 20 to 32768
      • User volume: 10 to 32768
    • volumeType (mandatory; string): Disk type. Options:
      • SAS: High I/O disk
      • SSD: Ultra-high I/O disk
      • SATA: Common I/O disk. SATA disks are no longer available from EVS; only existing nodes have this type of disk.
      • ESSD: Extreme SSD disk
      • GPSSD: General Purpose SSD disk
      • ESSD2: Extreme SSD V2 disk
      • GPSSD2: General Purpose SSD V2 disk

    Table 6 Login

    • userPassword (mandatory; UserPassword object): Node login mode (username and password).

    Table 7 UserPassword

    • username (optional; string): Node login username.
    • password (mandatory; string): Login password. If a username and password are used when a node is created, this field is masked in the response body.
      Constraints: The password must be encrypted with a unique salt per credential before it is used to create a node. For details, see Adding a Salt in the password Field When Creating a Node.
      The password must meet the following requirements:
      • Contains 8 to 26 characters.
      • Contains at least three of the following character types: uppercase letters, lowercase letters, digits, and special characters !@$%^-_=+[{}]:,./?
      • Cannot contain the username or the username spelled backwards.

    Table 8 RuntimeConfiguration

    • type (optional; string): Container runtime. Default value:
      • In clusters earlier than v1.25, the default value is docker.
      • In clusters of v1.25 or later, the default value depends on the OS. Nodes running EulerOS 2.5 or EulerOS 2.8 support only docker; for nodes running other OSs, the default value is containerd.
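Putting the optional parameters together, a CCENodeClass that also sets ecsGroupId, a user data disk, and an explicit container runtime could look as follows. This is a sketch based on the tables above: the resource name, subnet ID, ECS group ID, and password are placeholders, and the password must be the salted value described in Table 7.

```yaml
apiVersion: karpenter.k8s.huawei/v1alpha1
kind: CCENodeClass
metadata:
  name: example-full-nodeclass          # hypothetical name
spec:
  subnetSelectorTerms:
    - id: "<subnet-network-id>"         # Network ID from the VPC console
  # Optional: create nodes in this ECS group (placeholder ID).
  ecsGroupId: "<ecs-group-id>"
  imsSelector:
    imsFamily: "Huawei Cloud EulerOS 2.0"
  blockDeviceMappings:
    root:                               # system disk, 20 to 1024 GiB
      volumeSize: 50
      volumeType: SSD
    k8s:                                # runtime and Kubernetes data disk, 20 to 32768 GiB
      volumeSize: 100
      volumeType: SSD
    users:                              # optional user data disks, 10 to 32768 GiB each
      - volumeSize: 200
        volumeType: GPSSD
  runtimeConfiguration:
    type: containerd
  login:
    userPassword:
      username: root
      password: "<salted-password>"     # salted value, see Table 7
```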