Help Center> Cloud Container Engine> Best Practices> Auto Scaling> Using HPA and CA for Auto Scaling of Workloads and Nodes

Using HPA and CA for Auto Scaling of Workloads and Nodes

Issues

The best way to handle surging traffic is to automatically adjust the number of machines based on the traffic volume or resource usage, which is called scaling.

In CCE, the resources that can be used by containers are fixed during application deployment. Therefore, in auto scaling, pods are scaled first. The node resource usage increases only after the number of pods increases. Then, nodes can be scaled based on the node resource usage. How to configure auto scaling in CCE?

Solution

Two major auto scaling policies are HPA (Horizontal Pod Autoscaling) and CA (Cluster AutoScaling). HPA is for workload auto scaling and CA is for node auto scaling.

HPA and CA work with each other. HPA requires sufficient cluster resources for successful scaling. When the cluster resources are insufficient, CA is needed to add nodes. If HPA reduces workloads, the cluster will have a large number of idle resources. In this case, CA needs to release nodes to avoid resource waste.

As shown in Figure 1, HPA performs scale-out based on the monitoring metrics. When cluster resources are insufficient, newly created pods are in Pending state. CA then checks these pending pods and selects the most appropriate node pool based on the configured scaling policy to scale out the node pool. For details about how HPA and CA work, see Workload Scaling Mechanisms and Node Scaling Mechanisms.
Figure 1 HPA and CA working flows

CCE allows you to create a CA policy to scale out nodes periodically or based on the CPU or memory allocation rate. The CA policy can work with the autoscaler add-on to scales out nodes based on the pending status of pods.

Using HPA and CA can easily implement auto scaling in most scenarios. In addition, the scaling process of nodes and pods can be easily observed.

This section uses an example to describe the auto scaling process using HPA and CA policies together.

Preparations

  • Prepare a compute-intensive application. When a user sends a request, the result is returned to the user only after computing. The following is a PHP page. The code is written in the index.php file. When a user sends a request, the root extraction is performed for 1,000,000 times, and then "OK!" is returned.
    <?php
      $x = 0.0001;
      for ($i = 0; $i <= 1000000; $i++) {
        $x += sqrt($x);
      }
      echo "OK!";
    ?>

    Compile a Dockerfile to create an image.

    FROM php:5-apache
    COPY index.php /var/www/html/index.php
    RUN chmod a+rx index.php

    Run the following command to build an image named hpa-example.

    docker build -t hpa-example:latest .

    After the build is complete, upload the image to SWR. For details, see Uploading an Image Through a Container Engine Client.

  • Create a cluster with one node with 2 cores of CPU and 4 GB of memory. Assign an EIP for the node to allow external access.
  • Install add-ons for the cluster.

    cce-hpa-controller: HPA add-on

    autoscaler: CA add-on

    prometheus: an open-source system monitoring and alarm framework that can collect multiple metrics

    metrics-server: an aggregator of resource usage data in a Kubernetes cluster. It can collect measurement data of major Kubernetes resources, such as pods, nodes, containers, and Services.

Creating a Node Pool and a CA Policy

Create a node pool, add a node with 2 vCPUs and 4 GB memory, and enable auto scaling.

Modify the autoscaler add-on configuration, enable auto scale-in, and configure scale-in parameters. For example, when the node resource usage is less than 50%, the scale-in is triggered.

After the preceding configurations, scale-out is performed based on the pending status of the pod and scale-in is triggered when the node resource usage decreases.

When you create a CA policy on CCE, scale-out can be triggered periodically or based on the CPU or memory allocation rate. The CA policy can work with the autoscaler add-on to scales out nodes based on the pending status of pods. As shown in the following figure, when the cluster CPU allocation rate is greater than 70%, one node will be added. A CA policy needs to be associated with a node pool. Multiple node pools can be associated. When you need to scale nodes, node with proper specifications will be added or reduced from the node pool based on the minimum waste principle.

Creating a Workload

Use the hpa-example image to create a Deployment with one replica. The image path is related to the organization uploaded to the SWR repository and needs to be replaced with the actual value.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: hpa-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa-example
  template:
    metadata:
      labels:
        app: hpa-example
    spec:
      containers:
      - name: container-1
        image: 'swr.cn-east-3.myhuaweicloud.com/group/hpa-example:latest'
        resources:
          limits:                  # The value of limits must be the same as that of requests to prevent flapping during scaling.
            cpu: 500m
            memory: 200Mi
          requests:
            cpu: 500m
            memory: 200Mi
      imagePullSecrets:
      - name: default-secret

Then, create a NodePort Service for the workload so that the workload can be accessed from external networks.

To allow external access to NodePort Services, you need to purchase an EIP for the node in the cluster. After purchasing the EIP, you need to synchronize node data. For details, see Synchronizing Node Data. If the node already has an EIP, you do not need to buy one.

Alternatively, you can create a Service with an ELB load balancer for external access. For details, see Using kubectl to Create a Service (Automatically Creating a Shared Load Balancer).

kind: Service
apiVersion: v1
metadata:
  name: hpa-example
spec:
  ports:
    - name: cce-service-0
      protocol: TCP
      port: 80
      targetPort: 80
      nodePort: 31144
  selector:
    app: hpa-example
  type: NodePort

Creating an HPA Policy

Create an HPA policy. As shown below, the policy is associated with the hpa-example workload, and the target CPU usage is 50%.

There are two other annotations. One annotation defines the CPU thresholds, indicating that scaling is not performed when the CPU usage is between 30% and 70% to prevent impact caused by slight fluctuation. The other is the scaling time window, indicating that after the policy is successfully executed, a scaling operation will not be triggered again in this cooling interval to prevent impact caused by short-term fluctuation.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-policy
  annotations:
    extendedhpa.metrics: '[{"type":"Resource","name":"cpu","targetType":"Utilization","targetRange":{"low":"30","high":"70"}}]'
    extendedhpa.option: '{"downscaleWindow":"5m","upscaleWindow":"3m"}'
spec:
  scaleTargetRef:
    kind: Deployment
    name: hpa-example
    apiVersion: apps/v1
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 50

Set the parameters as follows if you are using the console.

Observing the Auto Scaling Process

  1. Check the cluster node status. In the following example, there are two nodes.

    # kubectl get node
    NAME            STATUS   ROLES    AGE     VERSION
    192.168.0.183   Ready    <none>   2m20s   v1.17.9-r0-CCE21.1.1.3.B001-17.36.8
    192.168.0.26    Ready    <none>   55m     v1.17.9-r0-CCE21.1.1.3.B001-17.36.8

    Check the HPA policy. The CPU usage of the target workload is 0%.

    # kubectl get hpa hpa-policy
    NAME         REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    hpa-policy   Deployment/hpa-example   0%/50%    1         100       1          4m

  2. Run the following command to access the workload. In the following command, {ip:port} indicates the access address of the workload, which can be queried on the workload details page.

    while true;do wget -q -O- http://{ip:port}; done

    If no EIP is displayed, the cluster node has not been assigned any EIP. You need to buy one, bind it to the node, and synchronize node data. For details, see Synchronizing Node Data.

    Observe the scaling process of the workload.

    # kubectl get hpa hpa-policy --watch
    NAME         REFERENCE                TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       1          4m
    hpa-policy   Deployment/hpa-example   190%/50%   1         100       1          4m23s
    hpa-policy   Deployment/hpa-example   190%/50%   1         100       4          4m31s
    hpa-policy   Deployment/hpa-example   200%/50%   1         100       4          5m16s
    hpa-policy   Deployment/hpa-example   200%/50%   1         100       4          6m16s
    hpa-policy   Deployment/hpa-example   85%/50%    1         100       4          7m16s
    hpa-policy   Deployment/hpa-example   81%/50%    1         100       4          8m16s
    hpa-policy   Deployment/hpa-example   81%/50%    1         100       7          8m31s
    hpa-policy   Deployment/hpa-example   57%/50%    1         100       7          9m16s
    hpa-policy   Deployment/hpa-example   51%/50%    1         100       7          10m
    hpa-policy   Deployment/hpa-example   58%/50%    1         100       7          11m

    You can see that the CPU usage of the workload is 190% at 4m23s, which exceeds the target value. In this case, scaling is triggered to expand the workload to four replicas/pods. In the subsequent several minutes, the CPU usage does not decrease until 7m16s. This is because the new pods may not be successfully created. The possible cause is that resources are insufficient and the pods are in Pending state. During this period, nodes are added.

    At 7m16s, the CPU usage decreases, indicating that the pods are successfully created and start to bear traffic. The CPU usage decreases to 81% at 8m, still greater than the target value (50%) and the high threshold (70%). Therefore, 7 pods are added at 9m16s, and the CPU usage decreases to 51%, which is within the range of 30% to 70%. From then on, the number of pods remains 7.

    In the following output, you can see the workload scaling process and the time when the HPA policy takes effect.

    # kubectl describe deploy hpa-example
    ...
    Events:
      Type    Reason             Age    From                   Message
      ----    ------             ----   ----                   -------
      Normal  ScalingReplicaSet  25m    deployment-controller  Scaled up replica set hpa-example-79dd795485 to 1
      Normal  ScalingReplicaSet  20m    deployment-controller  Scaled up replica set hpa-example-79dd795485 to 4
      Normal  ScalingReplicaSet  16m    deployment-controller  Scaled up replica set hpa-example-79dd795485 to 7
    # kubectl describe hpa hpa-policy
    ...
    Events:
      Type    Reason             Age    From                       Message
      ----    ------             ----   ----                       -------
      Normal  SuccessfulRescale  20m    horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
      Normal  SuccessfulRescale  16m    horizontal-pod-autoscaler  New size: 7; reason: cpu resource utilization (percentage of request) above target

    Check the number of nodes. The following output shows that two nodes are added.

    # kubectl get node
    NAME            STATUS   ROLES    AGE     VERSION
    192.168.0.120   Ready    <none>   3m5s    v1.17.9-r0-CCE21.1.1.3.B001-17.36.8
    192.168.0.136   Ready    <none>   6m58s   v1.17.9-r0-CCE21.1.1.3.B001-17.36.8
    192.168.0.183   Ready    <none>   18m     v1.17.9-r0-CCE21.1.1.3.B001-17.36.8
    192.168.0.26    Ready    <none>   71m     v1.17.9-r0-CCE21.1.1.3.B001-17.36.8

    You can also view the scaling history on the console. For example, the CA policy is executed once when the CPU allocation rate in the cluster is greater than 70%, and the number of nodes in the node pool is increased from 2 to 3. The new node is automatically added by autoscaler based on the pending state of pods in the initial phase of HPA.

    The node scaling process is as follows:

    1. After the number of pods changes to 4, the pods are in Pending state due to insufficient resources. As a result, the default scale-out policy of the autoscaler add-on is triggered, and the number of nodes is increased by one.
    2. The second node scale-out is triggered because the CPU allocation rate in the cluster is greater than 70%. As a result, the number of nodes is increased by one, which is recorded in the scaling history on the console. Scaling based on the allocation rate ensures that the cluster has sufficient resources.

  3. Stop accessing the workload and check the number of pods.

    # kubectl get hpa hpa-policy --watch
    NAME         REFERENCE                TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
    hpa-policy   Deployment/hpa-example   50%/50%    1         100       7          12m
    hpa-policy   Deployment/hpa-example   21%/50%    1         100       7          13m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       7          14m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       7          18m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       3          18m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       3          19m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       3          19m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       3          19m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       3          19m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       3          23m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       3          23m
    hpa-policy   Deployment/hpa-example   0%/50%     1         100       1          23m

    You can see that the CPU usage is 21% at 13m. The number of pods is reduced to 3 at 18m, and then reduced to 1 at 23m.

    In the following output, you can see the workload scaling process and the time when the HPA policy takes effect.

    # kubectl describe deploy hpa-example
    ...
    Events:
      Type    Reason             Age    From                   Message
      ----    ------             ----   ----                   -------
      Normal  ScalingReplicaSet  25m    deployment-controller  Scaled up replica set hpa-example-79dd795485 to 1
      Normal  ScalingReplicaSet  20m    deployment-controller  Scaled up replica set hpa-example-79dd795485 to 4
      Normal  ScalingReplicaSet  16m    deployment-controller  Scaled up replica set hpa-example-79dd795485 to 7
      Normal  ScalingReplicaSet  6m28s  deployment-controller  Scaled down replica set hpa-example-79dd795485 to 3
      Normal  ScalingReplicaSet  72s    deployment-controller  Scaled down replica set hpa-example-79dd795485 to 1
    # kubectl describe hpa hpa-policy
    ...
    Events:
      Type    Reason             Age    From                       Message
      ----    ------             ----   ----                       -------
      Normal  SuccessfulRescale  20m    horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
      Normal  SuccessfulRescale  16m    horizontal-pod-autoscaler  New size: 7; reason: cpu resource utilization (percentage of request) above target
      Normal  SuccessfulRescale  6m45s  horizontal-pod-autoscaler  New size: 3; reason: All metrics below target
      Normal  SuccessfulRescale  90s    horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

    You can also view the HPA policy execution history on the console.

    Wait until the one node is reduced.

    The reason why the other two nodes in the node pool are not reduced is that they both have pods in the kube-system namespace (not pods created by DaemonSets). For details about node scale-in, see How autoscaler Works.

Summary

Using HPA and CA can easily implement auto scaling in most scenarios. In addition, the scaling process of nodes and pods can be easily observed.