
Updated on 2026-06-17 GMT+08:00

Using Karpenter for Auto Scaling of Nodes

Karpenter is a dynamic, high-performance, open-source cluster auto scaling solution for Kubernetes. It aims to use the right number of nodes at the right time, simplifying Kubernetes infrastructure management. Compared with Cluster Autoscaler, Karpenter reduces the resource scaling time from minutes to seconds, significantly improving workload efficiency in clusters and lowering costs.

Karpenter:

Monitors pods marked as unschedulable. Pods may become unschedulable for reasons such as insufficient CPU or memory, unmet node selector conditions, taints that the pods do not tolerate, or host ports that are already occupied.
Evaluates the scheduling requirements of the unschedulable pods.
Provisions new nodes that meet the requirements of those pods.
Deletes nodes when they are no longer needed, for example, when they are idle or have expired.
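The first step of this loop, finding unschedulable pods, can be sketched as follows. This is not Karpenter's code. It assumes pods are dictionaries shaped like the JSON returned by `kubectl get pods -o json`. Kubernetes marks a pod that the scheduler could not place with a `PodScheduled` condition whose status is `False` and whose reason is `Unschedulable`. The function name `unschedulable_pods` and the sample pods are hypothetical.

```python
from typing import Any


def unschedulable_pods(pods: list[dict[str, Any]]) -> list[str]:
    """Return the names of Pending pods that the scheduler marked Unschedulable."""
    names = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            # The scheduler sets PodScheduled=False with reason Unschedulable
            # when no existing node satisfies the pod's constraints.
            if (cond.get("type") == "PodScheduled"
                    and cond.get("status") == "False"
                    and cond.get("reason") == "Unschedulable"):
                names.append(pod["metadata"]["name"])
                break
    return names


# Hypothetical pods: one running, one waiting for capacity.
pods = [
    {"metadata": {"name": "web-1"},
     "status": {"phase": "Running",
                "conditions": [{"type": "PodScheduled", "status": "True"}]}},
    {"metadata": {"name": "web-2"},
     "status": {"phase": "Pending",
                "conditions": [{"type": "PodScheduled", "status": "False",
                                "reason": "Unschedulable",
                                "message": "0/3 nodes are available: 3 Insufficient cpu."}]}},
]
print(unschedulable_pods(pods))  # ['web-2']
```

Karpenter then groups such pods by their requirements before it chooses instance types.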

Core Features and Advantages

Event-driven, rapid scale-out: The traditional Cluster Autoscaler (CA) relies on periodic polling and cloud auto scaling groups, and it typically takes three to five minutes to add new nodes. Karpenter abandons this model. It continuously watches pod scheduling events in the cluster and responds within milliseconds. It evaluates the resource requirements of pending pods directly and calls cloud provider APIs without intermediate layers, so it can quickly provision the most suitable compute instances for traffic spikes and high-concurrency workloads.
Node pool-free architecture: Karpenter does not depend on traditional fixed node pools. Instead, declarative NodePool policies let you flexibly configure AZs, instance architectures, and billing modes. Karpenter accurately parses pod requirements for CPU and memory as well as scheduling constraints such as affinity and tolerations. Combined with FlexusX instances, whose CPU-to-memory ratio can be configured on demand, Karpenter can make node specifications precisely match actual service requirements. This prevents over-provisioning and resource waste at the source.
Continuous intelligent consolidation: Karpenter manages efficiency throughout the cluster lifecycle, not only scale-out speed. It continuously evaluates real-time resource utilization. With a consolidation policy such as WhenEmptyOrUnderutilized enabled, Karpenter automatically reclaims completely idle nodes and, in a controlled manner, evicts and migrates pods from underutilized nodes. By packing scattered workloads onto fewer or more cost-effective instances, Karpenter minimizes the cost of idle capacity.
Capacity and cost awareness:
- Karpenter integrates with the cloud provider's billing model. During scale-out, it performs multi-dimensional cost calculations based on real-time instance prices and selects the instance combination with the lowest unit cost that meets pod requirements. It also natively supports mixing on-demand and spot instances, minimizing overall compute costs.
- Karpenter also tracks cloud provider capacity in real time. If the preferred instance type is sold out and the request fails with an insufficient-resources error, Karpenter automatically retries with other available instance types or AZs within milliseconds. This removes the scaling bottleneck that traditional CA implementations hit when a single node pool runs out of resources, and ensures service continuity.
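The selection-and-fallback behavior described above can be illustrated with a minimal sketch. From the instance types that are large enough for a request, it tries the cheapest first and moves to the next candidate when a launch fails for lack of capacity. The prices, the `fake_launch` callback, and the `InsufficientCapacity` exception are hypothetical placeholders, not the Huawei Cloud API. The vCPU and memory sizes assume the usual Huawei Cloud flavor naming. Real Karpenter also bin-packs many pods at once and weighs AZs and capacity types, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    mem_gib: int
    price_per_hour: float  # hypothetical price, for illustration only


class InsufficientCapacity(Exception):
    """Hypothetical error raised when a flavor is sold out."""


def provision(types: list[InstanceType], need_vcpus: float, need_mem_gib: float,
              launch: Callable[[InstanceType], str]) -> str:
    """Launch the cheapest instance type that fits, falling back on capacity errors."""
    fitting = sorted(
        (t for t in types if t.vcpus >= need_vcpus and t.mem_gib >= need_mem_gib),
        key=lambda t: t.price_per_hour,
    )
    for t in fitting:
        try:
            return launch(t)
        except InsufficientCapacity:
            continue  # sold out: try the next-cheapest fitting type
    raise RuntimeError("no fitting instance type has capacity")


TYPES = [
    InstanceType("c9.large.4", 2, 8, 0.10),
    InstanceType("c9.xlarge.4", 4, 16, 0.20),
    InstanceType("c9.2xlarge.4", 8, 32, 0.40),
]


def fake_launch(t: InstanceType) -> str:
    # Simulate c9.xlarge.4 being sold out in the target AZ.
    if t.name == "c9.xlarge.4":
        raise InsufficientCapacity(t.name)
    return t.name


print(provision(TYPES, 3, 10, fake_launch))  # c9.2xlarge.4
```

The request for 3 vCPUs and 10 GiB first targets c9.xlarge.4, the cheapest fitting type. Because that type is simulated as sold out, the request falls back to c9.2xlarge.4.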

Prerequisites

A CCE cluster of v1.29 or later is available, with worker nodes on which Karpenter can run.
EIPs are bound to the nodes so that they can pull images from the Internet during chart installation.
The Karpenter container requires Internet access. In a CCE standard cluster, bind an EIP to the node where Karpenter runs. In a CCE Turbo cluster, bind an EIP to the Karpenter pod.

Notes and Constraints

Karpenter requires cloud permissions to create and delete nodes. Currently, it authenticates with Access Key and Secret Key (AK/SK) credentials. In the future, Karpenter will be available as a system add-on in the CCE add-on marketplace and will support custom add-on agencies or pod identity authentication.
Both Karpenter and CCE Cluster Autoscaler are node-scaling add-ons. They cannot be installed together, as doing so may cause mutual interference.
Karpenter cannot scale in the nodes where CCE system add-on pods are running.

Deploying Karpenter

Obtain a chart.

Go to the chart page, select a suitable version, and download the Helm chart in .tgz format. This section uses chart version 0.2.1 as an example. This chart applies to CCE clusters of v1.29 or later. Configuration items can differ between chart versions, and the configuration in this section applies only to version 0.2.1.
Upload the chart.
1. Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose App Templates and click Upload Chart in the upper right corner.
2. Click Add, select the chart to be uploaded, and click Upload.

Specify the value.yaml file.

Create a value.yaml file on your local PC to set the installation parameters. During installation, import this file to apply your custom settings. Any parameter you do not specify keeps its default value.

The settings are as follows:

# Default values for karpenter-provider-huawei
# -- Number of controller replicas
replicaCount: 1
# Controller image configuration
image:
  # -- Controller image repository
  repository: swr.ap-southeast-3.myhuaweicloud.com/huaweiclouddeveloper/cce/karpenter/controller
  # -- Controller image tag
  tag: "0.2.1"
  # -- Image pull policy
  pullPolicy: IfNotPresent
# kube-rbac-proxy sidecar configuration
rbacProxy:
  image:
    # -- kube-rbac-proxy image repository
    repository: quay.io/brancz/kube-rbac-proxy
    # -- kube-rbac-proxy image tag
    tag: "v0.16.0"
    # -- Image pull policy
    pullPolicy: IfNotPresent
# -- Image pull secrets for controller pods
imagePullSecrets: []
# - name: registry-credentials
# -- Name prefix for all resources
namePrefix: "karpenter-provider-huawei-"
serviceAccount:
  # -- Create ServiceAccount
  create: true
  # -- ServiceAccount name
  name: controller-manager
# Controller arguments
controller:
  # -- Metrics port
  metricsPort: 8080
  # -- Health probe port
  healthProbePort: 8081
# Huawei Cloud credentials
# The generated Secret uses HUAWEICLOUD_SDK_AK, HUAWEICLOUD_SDK_SK,
# and HUAWEICLOUD_SDK_REGION_ID from this block, plus
# HUAWEICLOUD_SDK_CCE_CLUSTER_ID from clusterInfo.clusterID.
# If credentials.create=false, the existing Secret should provide the same keys.
credentials:
  # -- Create a Secret for credentials
  create: true
  # -- Secret name for Huawei Cloud credentials
  name: "huawei-credentials"
  # -- Use an existing Secret when credentials.create is false
  existingSecret: ""
  # -- Huawei Cloud access key
  accessKey: "your-access-key"
  # -- Huawei Cloud secret key
  secretKey: "your-secret-key"
  # -- Huawei Cloud region ID
  region: "your-region-id"
clusterInfo:
  # -- Huawei Cloud CCE cluster ID
  clusterID: "your-cluster-id"
  # -- Cluster category (optional). Set to "eni" for CCE Turbo clusters.
  # For other network types (vpc-router or overlay_l2), use any other value, such as the default empty string.
  category: ""
# -- yangtseEipInfo: User-defined settings for the EIPs bound to Karpenter pods.
# This takes effect only when clusterInfo.category is set to "eni".
yangtseEipInfo:
  yangtse.io/pod-with-eip: "true"
# Controller resources
resources:
  limits:
    cpu: "1"
    memory: 512Mi
  requests:
    cpu: 200m
    memory: 256Mi
# -- Pod security context
podSecurityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
# -- Container security context for manager
securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - "ALL"
# -- Node selector for controller pods
nodeSelector: {}
# -- Tolerations for controller pods
tolerations: []
# -- Affinity for controller pods
affinity: {}

If clusterInfo.category is set to eni, an EIP is automatically bound to the Karpenter pod. Binding an EIP to the pod is the recommended way to give Karpenter Internet access in a CCE Turbo cluster. You can configure additional EIP parameters in yangtseEipInfo. For details, see Configuring an EIP for a Pod in a CCE Turbo Cluster.
Parameters such as clusterInfo.clusterID, credentials.accessKey, credentials.secretKey, and credentials.region are mandatory. Change other parameter values based on service requirements.
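If you set credentials.create to false, the chart expects an existing Secret that provides the same keys listed in the credentials comment of value.yaml: HUAWEICLOUD_SDK_AK, HUAWEICLOUD_SDK_SK, HUAWEICLOUD_SDK_REGION_ID, and HUAWEICLOUD_SDK_CCE_CLUSTER_ID. The following sketch builds such a Secret manifest as JSON, which `kubectl apply -f` accepts. The Secret name, the namespace `karpenter`, and all values are placeholders. A pod can only read Secrets in its own namespace, so create the Secret in the namespace of the release and set credentials.existingSecret to its name.

```python
import base64
import json


def credentials_secret(name: str, namespace: str, ak: str, sk: str,
                       region: str, cluster_id: str) -> dict:
    """Build a Secret with the keys the chart reads when credentials.create=false."""
    values = {
        "HUAWEICLOUD_SDK_AK": ak,
        "HUAWEICLOUD_SDK_SK": sk,
        "HUAWEICLOUD_SDK_REGION_ID": region,
        "HUAWEICLOUD_SDK_CCE_CLUSTER_ID": cluster_id,
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Values under Secret "data" must be Base64-encoded strings.
        "data": {k: base64.b64encode(v.encode()).decode() for k, v in values.items()},
    }


manifest = credentials_secret("huawei-credentials", "karpenter", "your-access-key",
                              "your-secret-key", "your-region-id", "your-cluster-id")
print(json.dumps(manifest, indent=2))
```

For example, redirect the output to a file (`python3 make_secret.py > huawei-credentials.json`, where the script name is hypothetical) and run `kubectl apply -f huawei-credentials.json`. Do not commit the generated file, because Base64 is an encoding, not encryption.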

Create a release.
1. Log in to the CCE console and click the target cluster name. In the left navigation pane, choose App Templates.
2. Locate the uploaded chart and click Install.
3. Configure Release Name, Namespace, and Select Version.
4. Click Add next to Configuration File, select the YAML file created locally, and click Install.
5. On the Releases tab, view the status of the release.

Performing Verification

Create NodePool and CCENodeClass resources. For details about the resource parameters, see Configuration Parameters.

apiVersion: karpenter.k8s.huawei/v1alpha1
kind: CCENodeClass
metadata:
  name: demo-cce-elastic-cpu
spec:
  # Add one subnet per target AZ if you want this demo NodePool to launch
  # across multiple zones.
  subnetSelectorTerms:
    - id: "30abcb7b-ddeb-4a93-9e83-0b21f59b07a3"
  imsSelector:
    imsFamily: "Huawei Cloud EulerOS 2.0"
  blockDeviceMappings:
    root:
      volumeSize: 40
      volumeType: SSD
    k8s:
      volumeSize: 100
      volumeType: SSD
  runtimeConfiguration:
    type: containerd
  login:
    userPassword:
      username: root
      password: "JDYk*****"
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: demo-cce-elastic-cpu
spec:
  disruption:
    budgets:
      - nodes: "1"
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    # Cap the total resources this NodePool can provision, leaving headroom for the demo to scale out.
    cpu: "100"
    memory: 384Gi
  template:
    metadata:
      labels:
        demo.huawei.com/scenario: cpu-elastic
        demo.huawei.com/nodepool-profile: shared-burst-and-consolidation
    spec:
      nodeClassRef:
        group: karpenter.k8s.huawei
        kind: CCENodeClass
        name: demo-cce-elastic-cpu
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        # Keep zone unconstrained so Karpenter can use every AZ exposed by the
        # subnets of the selected CCENodeClass. Re-add a topology.kubernetes.io/zone
        # requirement if you need to pin this demo to specific AZs.
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - c9.large.4
            - c9.xlarge.4
            - c9.2xlarge.4
            - c9.4xlarge.4
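To estimate how far this NodePool can scale, divide its limits (cpu "100", memory 384Gi) by the size of each allowed flavor. The sketch assumes the usual Huawei Cloud flavor naming: large, xlarge, 2xlarge, and 4xlarge mean 2, 4, 8, and 16 vCPUs, and the trailing .4 means 4 GiB of memory per vCPU. Confirm the actual specifications on the ECS console. The result is an upper bound for a NodePool built from a single flavor, because the limits constrain the combined resources of all nodes that the NodePool manages.

```python
# Flavor specs assumed from Huawei Cloud naming: (vCPUs, memory in GiB).
FLAVORS = {
    "c9.large.4": (2, 8),
    "c9.xlarge.4": (4, 16),
    "c9.2xlarge.4": (8, 32),
    "c9.4xlarge.4": (16, 64),
}

# spec.limits from the example NodePool: cpu "100", memory 384Gi.
LIMIT_CPU = 100
LIMIT_MEM_GIB = 384


def max_nodes(flavor: str) -> int:
    """Largest number of nodes of one flavor that stays within both limits."""
    vcpus, mem_gib = FLAVORS[flavor]
    return min(LIMIT_CPU // vcpus, LIMIT_MEM_GIB // mem_gib)


for name in FLAVORS:
    print(f"{name}: up to {max_nodes(name)} nodes")
```

With these assumptions, memory is the binding limit for the two smaller flavors (48 and 24 nodes). For the two larger flavors, the CPU and memory limits produce the same count (12 and 6 nodes).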

Create a Deployment to verify that Karpenter can provision nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-burst
  namespace: default
spec:
  replicas: 0
  selector:
    matchLabels:
      app: cpu-burst
  template:
    metadata:
      labels:
        app: cpu-burst
    spec:
      terminationGracePeriodSeconds: 10
      nodeSelector:
        demo.huawei.com/scenario: cpu-elastic
      containers:
        - name: web
          image: nginx:1.27-alpine
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
              name: http
          resources:
            requests:
              cpu: "1400m"
              memory: "1200Mi"
            limits:
              cpu: "1400m"
              memory: "1200Mi"
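Each replica requests 1400m CPU and 1200Mi memory, so each node can hold only a few replicas. The sketch below estimates pods per node for each allowed flavor and the total resources requested by a given replica count. It treats the full flavor capacity as allocatable, which overestimates. In practice, CCE reserves part of each node for system components and DaemonSet pods also consume resources, so real counts can be lower. Flavor sizes use the same naming-based assumption: 2, 4, 8, and 16 vCPUs with 4 GiB per vCPU.

```python
POD_CPU_M = 1400   # requests.cpu: "1400m"
POD_MEM_MI = 1200  # requests.memory: "1200Mi"

# Assumed flavor capacity: (millicores, MiB). Real allocatable values are lower.
FLAVORS = {
    "c9.large.4": (2000, 8 * 1024),
    "c9.xlarge.4": (4000, 16 * 1024),
    "c9.2xlarge.4": (8000, 32 * 1024),
    "c9.4xlarge.4": (16000, 64 * 1024),
}


def pods_per_node(flavor: str) -> int:
    """Upper bound on cpu-burst replicas that fit on one node of this flavor."""
    cpu_m, mem_mi = FLAVORS[flavor]
    return min(cpu_m // POD_CPU_M, mem_mi // POD_MEM_MI)


def total_requests(replicas: int) -> tuple[int, int]:
    """Total (CPU millicores, memory MiB) requested by `replicas` pods."""
    return replicas * POD_CPU_M, replicas * POD_MEM_MI


for name in FLAVORS:
    print(f"{name}: at most {pods_per_node(name)} pods")
print(total_requests(10))  # (14000, 12000)
```

CPU is the constraint on every flavor. Ten replicas request 14 vCPUs and about 11.7 GiB in total, well within the NodePool limits.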

Scale out the Deployment.

kubectl scale deployment cpu-burst --replicas=10

After the Deployment is scaled out, the new pods that cannot be scheduled to existing nodes stay Pending. Karpenter provisions new nodes for them, and the pods are then scheduled to the new nodes.

$ kubectl get node
NAME            STATUS   ROLES    AGE     VERSION
192.168.1.15    Ready    <none>   3h28m   v1.33.5-r20-33.0.4.9
192.168.1.158   Ready    <none>   42m     v1.33.5-r20-33.0.4.9
192.168.1.168   Ready    <none>   4m7s    v1.33.5-r20-33.0.4.9
$ kubectl get pod -l app=cpu-burst
NAME                         READY   STATUS    RESTARTS   AGE
cpu-burst-5d84f5647c-697j2   1/1     Running   0          9m35s
cpu-burst-5d84f5647c-9r9rs   1/1     Running   0          9m35s
cpu-burst-5d84f5647c-9swmq   1/1     Running   0          9m35s
cpu-burst-5d84f5647c-bqrmw   1/1     Running   0          9m35s
cpu-burst-5d84f5647c-jc7f9   1/1     Running   0          9m35s
cpu-burst-5d84f5647c-pshpx   1/1     Running   0          9m35s
cpu-burst-5d84f5647c-qkzfm   1/1     Running   0          9m36s
cpu-burst-5d84f5647c-rl7mk   1/1     Running   0          9m35s
cpu-burst-5d84f5647c-tnt92   1/1     Running   0          9m35s
cpu-burst-5d84f5647c-w728l   1/1     Running   0          9m35s

Scale in the Deployment.

kubectl scale deployment cpu-burst --replicas=5

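After the Deployment is scaled in, consolidation first creates a new node with fewer vCPUs and then reclaims the old, underutilized node. In the following output, 192.168.1.158 has been removed and 192.168.1.71 has been added.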

$ kubectl get node
NAME            STATUS   ROLES    AGE     VERSION
192.168.1.15    Ready    <none>   3h30m   v1.33.5-r20-33.0.4.9
192.168.1.168   Ready    <none>   6m25s   v1.33.5-r20-33.0.4.9
192.168.1.71    Ready    <none>   2m14s   v1.33.5-r20-33.0.4.9
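This scale-in is driven by consolidation. In upstream Karpenter, a node can be deleted if all of its pods can run on the free capacity of other nodes. It can be replaced if those pods fit on other nodes plus a single cheaper replacement node. The following is a simplified first-fit sketch of the deletion check only. It ignores disruption budgets, affinity, and other scheduling constraints, and the `Node` class and the sample numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_free_m: int   # unused allocatable CPU, in millicores
    mem_free_mi: int  # unused allocatable memory, in MiB
    pods: list[tuple[int, int]] = field(default_factory=list)  # (cpu_m, mem_mi) requests


def can_delete(candidate: Node, others: list[Node]) -> bool:
    """First-fit check: can every pod on `candidate` run on the free capacity of `others`?"""
    free = [[n.cpu_free_m, n.mem_free_mi] for n in others]
    # Place pods with the largest CPU request first to reduce fragmentation.
    for cpu_m, mem_mi in sorted(candidate.pods, reverse=True):
        for slot in free:
            if slot[0] >= cpu_m and slot[1] >= mem_mi:
                slot[0] -= cpu_m
                slot[1] -= mem_mi
                break
        else:
            return False  # this pod fits nowhere, so the node cannot simply be deleted
    return True


# Hypothetical nodes: node-a runs two cpu-burst pods (1400m / 1200Mi each).
node_a = Node("node-a", 1200, 13000, [(1400, 1200), (1400, 1200)])
node_b = Node("node-b", 3000, 10000)
print(can_delete(node_a, [node_b]))  # True: node-b has room for 2 x 1400m
```

When the check fails, Karpenter may instead consider replacement with a cheaper node, which is what the output above shows.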


Uninstalling the Release

Log in to the CCE console and click the target cluster name. In the left navigation pane, choose App Templates.
On the Releases tab, locate the row that contains the installed release and choose More > Uninstall in the Operation column.

Configuration Parameters

NodePool resource parameters: For details, see NodePools.

CCENodeClass resource parameters:

**Table 1** CCENodeClass resource parameters
Parameter	Mandatory	Type	Description
subnetSelectorTerms	Yes	SubnetSelectorTerm object	Definition: Node subnet information. Constraints: N/A.
ecsGroupId	No	String	Definition: ECS group ID. If this parameter is specified, nodes are created in the specified ECS group. Constraints: N/A. Range: N/A. Default value:
imsSelector	Yes	IMSSelector object	Definition: Node OS and image. Constraints: N/A.
blockDeviceMappings	Yes	BlockDeviceMappings object	Definition: Node disk devices. Constraints: N/A.
login	Yes	Login object	Definition: Node login mode. Constraints: N/A.
runtimeConfiguration	No	RuntimeConfiguration object	Definition: Runtime configuration. Constraints: N/A.

**Table 2** SubnetSelectorTerm
Parameter	Mandatory	Type	Description
id	Yes	String	Definition: Network ID of the subnet that the network interface belongs to. Constraints: N/A. Range: To obtain the ID, log in to the VPC console, choose Virtual Private Cloud > Subnets in the left navigation pane, click the target subnet name, and copy the Network ID on the Summary tab. Default value: N/A.

**Table 3** IMSSelector
Parameter	Mandatory	Type	Description
imsFamily	Yes	String	Definition: Node OS, for example, Huawei Cloud EulerOS 2.0. Constraints: N/A. Range: N/A. Default value: N/A.

**Table 4** BlockDeviceMappings
Parameter	Mandatory	Type	Description
root	Yes	BlockDevice object	Definition: System disk. Constraints: N/A.
k8s	Yes	BlockDevice object	Definition: Data disk used by the container runtime and Kubernetes. Constraints: N/A.
users	No	Array of BlockDevice objects	Definition: User data volumes. Constraints: N/A.

**Table 5** BlockDevice
Parameter	Mandatory	Type	Description
volumeSize	Yes	Integer	Definition: Disk size, in GiB. Constraints: N/A. Range: root volume, 20 to 1024; Kubernetes volume, 20 to 32768; user volume, 10 to 32768. Default value: N/A.
volumeType	Yes	String	Definition: Disk type. Constraints: N/A. Range: SAS (high I/O disk); SSD (ultra-high I/O disk); SATA (common I/O disk; no longer available from EVS, and only existing nodes have this disk type); ESSD (extreme SSD disk); GPSSD (general purpose SSD disk); ESSD2 (extreme SSD V2 disk); GPSSD2 (general purpose SSD V2 disk). Default value: N/A.
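A small validator that mirrors the volumeSize ranges in the table can catch mistakes before a CCENodeClass is applied. The function name `check_volume_size` is hypothetical. The disk-role keys reuse the field names root, k8s, and users from blockDeviceMappings.

```python
# Allowed volumeSize ranges in GiB, taken from the BlockDevice table.
VOLUME_SIZE_RANGES = {
    "root": (20, 1024),
    "k8s": (20, 32768),
    "users": (10, 32768),
}


def check_volume_size(role: str, size_gib: int) -> None:
    """Raise ValueError if size_gib is outside the allowed range for the disk role."""
    low, high = VOLUME_SIZE_RANGES[role]
    if not low <= size_gib <= high:
        raise ValueError(f"{role} volumeSize {size_gib} GiB is outside {low} to {high} GiB")


# The sizes used in the sample CCENodeClass pass the check.
check_volume_size("root", 40)
check_volume_size("k8s", 100)
```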

**Table 6** Login
Parameter	Mandatory	Type	Description
userPassword	Yes	UserPassword object	Definition: Node login mode (username and password). Constraints: N/A.

**Table 7** UserPassword
Parameter	Mandatory	Type	Description
username	No	String	Definition: Node login username. Constraints: N/A. Range: N/A. Default value: N/A.
password	Yes	String	Definition: Login password. If a username and password are used when a node is created, this field is masked in the response body. Constraints: The password must be salted, with a unique salt for each credential, when the node is created. For details, see Adding a Salt in the password Field When Creating a Node. Range: The password must contain 8 to 26 characters; contain at least three of the following character types: uppercase letters, lowercase letters, digits, and special characters (!@$%^-_=+[{}]:,./?); and not contain the username or the username spelled backwards. Default value: N/A.
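The plaintext complexity rules can be checked before you salt the password. The sketch below checks only the rules listed in the table. It compares the username case-sensitively and does not reject characters outside the four listed types, both of which are assumptions. The function name `check_password` is hypothetical. The value in the password field itself is not plaintext. The sample value in the example CCENodeClass starts with JDYk, which is the Base64 encoding of $6$, the prefix of a SHA-512 crypt hash.

```python
import string

# Special characters accepted by the password rules.
SPECIALS = set("!@$%^-_=+[{}]:,./?")


def check_password(password: str, username: str = "root") -> list[str]:
    """Return the list of violated rules (empty if the password is acceptable)."""
    errors = []
    if not 8 <= len(password) <= 26:
        errors.append("length must be 8 to 26 characters")
    classes = [
        any(c in string.ascii_uppercase for c in password),
        any(c in string.ascii_lowercase for c in password),
        any(c in string.digits for c in password),
        any(c in SPECIALS for c in password),
    ]
    if sum(classes) < 3:
        errors.append("must contain at least three character types")
    if username and (username in password or username[::-1] in password):
        errors.append("must not contain the username or its reverse")
    return errors


print(check_password("Karpenter@2026"))  # []
print(check_password("toor1234"))
```

The second call reports two violations: only two character types are used, and the password contains "toor", the reverse of root.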

**Table 8** RuntimeConfiguration
Parameter	Mandatory	Type	Description
type	No	String	Definition: Container runtime. Constraints: N/A. Range: N/A. Default value: In clusters earlier than v1.25, the default is docker. In clusters of v1.25 or later, the default depends on the node OS: nodes running EulerOS 2.5 or EulerOS 2.8 support only docker, and nodes running other OSs default to containerd.
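The defaulting rule for type can be expressed as a small function. The cluster version is passed as a (major, minor) tuple, and the OS strings are illustrative. Both the function and its inputs are hypothetical. This chart requires CCE v1.29 or later, so with Karpenter only the OS-dependent branch applies in practice.

```python
def default_runtime(cluster_version: tuple[int, int], node_os: str) -> str:
    """Default container runtime when runtimeConfiguration.type is omitted."""
    if cluster_version < (1, 25):
        return "docker"
    # From v1.25 on, the default depends on the node OS.
    if node_os in ("EulerOS 2.5", "EulerOS 2.8"):
        return "docker"  # these OSs support only docker
    return "containerd"


print(default_runtime((1, 33), "Huawei Cloud EulerOS 2.0"))  # containerd
```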
