
Application Scaling Priority Policies

With application scaling priority policies, you can manage resources more efficiently by customizing the scaling order of pods across different node types. If the default scaling priority policy is applied, pods will be scheduled first to yearly/monthly nodes during scale-out, followed by pay-per-use nodes and virtual-kubelet nodes (scaling pods to CCI). During scale-in, pods are deleted sequentially from virtual-kubelet nodes (scaling pods to CCI), pay-per-use nodes, and yearly/monthly nodes.

The application scaling priority policy includes the following two aspects:
  • Scale-out: Volcano schedules new pods in a cluster based on preset node priority for scale-out.
  • Scale-in: When a workload is specified, Volcano scores the workload based on preset node priority to determine pod deletion sequence during scale-in.

Notes and Constraints

  • The cluster version must be 1.23.11 or later, 1.25.6 or later, or 1.27.3 or later.
  • The Volcano Scheduler add-on (1.12.1 or later) must be installed in a cluster, and the application scaling priority policy function must be enabled.
  • By default, the scaling priority takes effect for Deployments (including their ReplicaSets). To make the scaling priority take effect for third-party workloads, adjust the advanced settings. For details, see Configuring a Scaling Priority Policy for a Third-Party Workload.
  • To use the scale-out scheduling priority policies, set spec.schedulerName of a workload to volcano or set the default cluster scheduler to volcano. The application scaling priority policy does not apply to workloads that have no resource limits or requests configured.
  • If the default priority policy is used, Volcano Scheduler schedules workloads based on the priorities of yearly/monthly nodes, pay-per-use nodes, and virtual-kubelet nodes (scaling pods to CCI). However, these priorities cannot always be fully enforced, because Volcano Scheduler evaluates scheduling results across multiple dimensions rather than just one.
  • Volcano Scheduler must balance scheduling performance with scheduling results. When there are a large number of schedulable nodes in a cluster, it selects only some of them for scheduling to ensure scheduling performance and does not search for the globally optimal scheduling solution. For details, see Scheduler Performance Tuning. This behavior can conflict with the scaling priority policies, but you can make Volcano Scheduler consider all nodes during scheduling by adjusting the proportion of nodes that can be scheduled by Volcano Scheduler. For details, see Appendix: Adjusting the Proportion of Nodes That Can Be Scheduled by Volcano Scheduler.

Overview

After the application scaling priority policy is enabled, the Balancer and BalancerPolicyTemplate CRDs are added to a cluster, and the default scaling priority policy is created. For details, see Applying the Default Application Scaling Priority Policy. Volcano Scheduler obtains the priority of each node based on the BalancerPolicyTemplate CR to control the pod scheduling priority during application scale-out. In addition, it configures the priority during application scale-in based on both Balancer and BalancerPolicyTemplate CRs.

  • The BalancerPolicyTemplate CRDs are used to define priority policies. For example, in the default scaling priority policy, the BalancerPolicyTemplate CR assigns the highest priority to yearly/monthly nodes, followed by pay-per-use nodes, and the lowest priority to virtual-kubelet nodes (scaling pods to CCI).

    The BalancerPolicyTemplate CRs cannot be updated.

  • The Balancer CRDs are used to declare the application scope of scaling priorities. When creating a Balancer CR, you can specify the workloads in one or more namespaces, a specific Deployment, or a specific ReplicaSet as the application scope.

A Balancer CR corresponds to a BalancerPolicyTemplate CR. They work together to determine which priority policies are applied to specific workloads.

In Volcano Scheduler's default scaling priority policy, the BalancerPolicyTemplate CR classifies yearly/monthly nodes, pay-per-use nodes, and virtual-kubelet nodes (scaling pods to CCI) into different priorities. Volcano Scheduler takes these priorities into account during scale-out and preferentially schedules new pods to the yearly/monthly nodes with higher priorities.

Volcano Scheduler applies annotations to pods within the application scope specified by the Balancer CR based on the priorities set by the BalancerPolicyTemplate CR. It may add the following annotations to a pod that meets the conditions:

  • openvessel.io/workload-balancer-score: indicates a pod's score. The score is higher if the pod runs on a higher-priority node, and pods with lower scores are scaled in first.
  • autoscaling.volcano.sh/dominated-by-balancer: specifies the Balancer CR that controls the current pod.

If the existing pods already have the community-supported controller.kubernetes.io/pod-deletion-cost annotation, scale-in is performed based on the priority defined by that annotation. If two pods have the same value for this annotation, the openvessel.io/workload-balancer-score annotation is used to determine which pod to scale in.
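
For example, you can raise the deletion cost of a pod you want to keep longer during scale-in (a minimal sketch; the pod name balancer-test-xxx is a placeholder). The Balancer score then only breaks ties between pods with the same deletion cost:

kubectl annotate pod balancer-test-xxx -n default controller.kubernetes.io/pod-deletion-cost="100" --overwrite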

You can configure the workload_balancer_score_annotation_key parameter in advanced settings to specify the annotation key for storing pod scores. For details, see Configuring a Scaling Priority Policy for a Third-Party Workload.

Configuring an Application Scaling Priority Policy

  1. Install Volcano Scheduler in a cluster and enable the application scaling priority policy. The default scaling priority policy will be created in the cluster.

    1. Obtain a default Balancer CR.
      # kubectl get balancer default-balancer -oyaml
      
      apiVersion: autoscaling.volcano.sh/v1alpha1
      kind: Balancer
      metadata:
        name: default-balancer
      spec:
        balancerPolicyTemplateName: default-balancerpolicytemplate
        targets:
        - namespaceSelector:
            matchExpressions:
              - key: kubernetes.io/metadata.name
                operator: Exists
        weight: 10 
    2. Obtain a default BalancerPolicyTemplate CR.
      # kubectl get balancerpolicytemplate default-balancerpolicytemplate -oyaml
      
      apiVersion: autoscaling.volcano.sh/v1alpha1
      kind: BalancerPolicyTemplate
      metadata:
        name: default-balancerpolicytemplate
      spec:
        policy:
          policyName: Priority
          priorities:
            priorityGroups:
            - priority: 10
              requirements:
              - key: node.cce.io/billing-mode
                operator: In
                values:
                - post-paid
            - priority: 100
              requirements:
              - key: node.cce.io/billing-mode
                operator: In
                values:
                - pre-paid
            - priority: 1
              requirements:
              - key: kubernetes.io/role
                operator: In
                values:
                - virtual-kubelet
                - bursting

    For details about the parameters, see Applying the Default Application Scaling Priority Policy.
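
    Before continuing, you can verify that the default policy objects exist in the cluster. The following commands simply list the CRs shown above:
      kubectl get balancer
      kubectl get balancerpolicytemplate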

  2. Deploy a workload and set the number of pods to 1.

    Pods of the current workload are preferentially scheduled to yearly/monthly nodes.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: balancer-test
      namespace: default
      labels:
        virtual-kubelet.io/burst-to-cci: 'auto'  # If cluster resources are insufficient, pods can be scheduled to CCI.
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: balancer-test
      template:
        metadata:
          labels:
            app: balancer-test
        spec:
          containers:
          - image: nginx:latest
            imagePullPolicy: IfNotPresent
            name: container-1
            resources:
              limits:
                cpu: 250m
                memory: 512Mi
              requests:
                cpu: 250m
                memory: 512Mi
          schedulerName: volcano
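
    Save the manifest to a file and create the workload (the file name balancer-test.yaml below is only an example), then check which node the pod is scheduled to:
    kubectl apply -f balancer-test.yaml
    kubectl get pod -n default -l app=balancer-test -o wide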

  3. Increase the number of workload pods to 5.

    Pods of the current workload are preferentially scheduled to yearly/monthly nodes. If there are not enough yearly/monthly nodes, these pods will be preferentially scheduled to pay-per-use nodes. If there are not enough pay-per-use nodes, these pods will be scheduled to virtual-kubelet nodes (scaling pods to CCI).

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: balancer-test
      namespace: default
      labels:
        virtual-kubelet.io/burst-to-cci: 'auto'  # If cluster resources are insufficient, pods can be scheduled to CCI.
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: balancer-test
      template:
        metadata:
          labels:
            app: balancer-test
        spec:
          containers:
          - image: nginx:latest
            imagePullPolicy: IfNotPresent
            name: container-1
            resources:
              limits:
                cpu: 250m
                memory: 512Mi
              requests:
                cpu: 250m
                memory: 512Mi
          schedulerName: volcano
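
    Alternatively, instead of editing the manifest, you can scale the Deployment directly (using the workload name and namespace from the example above):
    kubectl scale deployment balancer-test -n default --replicas=5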

  4. View the scores of pods.

    1. Pods on a yearly/monthly node:
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          autoscaling.volcano.sh/dominated-by-balancer: default-balancer  #The Balancer CR named default-balancer controls the scaling priority of the current pods.
          openvessel.io/workload-balancer-score: "100" #Priority of the current yearly/monthly node, which also indicates the pods' score
      ...
        nodeName: 192.168.20.100 #A yearly/monthly node
    2. Pods on a pay-per-use node:
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          autoscaling.volcano.sh/dominated-by-balancer: default-balancer  #The Balancer CR named default-balancer controls the scaling priority of the current pods.
          openvessel.io/workload-balancer-score: "10"  #Priority of the current pay-per-use node, which also indicates the pods' score
          ...
        nodeName: 192.168.20.196 #A pay-per-use node
    3. Pods on a virtual-kubelet node (scaling pods to CCI):
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          autoscaling.volcano.sh/dominated-by-balancer: default-balancer  #The Balancer CR named default-balancer controls the scaling priority of the current pods.
          openvessel.io/workload-balancer-score: "1"  #Priority of the current virtual-kubelet node, which also indicates the pods' score
          ...
        nodeName: virtual-kubelet #A virtual-kubelet node
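
    To list each pod's node and score in one command, you can use custom columns (a sketch that assumes the app=balancer-test label from the example Deployment; the dots in the annotation key are escaped):
      kubectl get pods -n default -l app=balancer-test \
        -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,SCORE:.metadata.annotations.openvessel\.io/workload-balancer-score'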

  5. Gradually reduce the number of workload pods.

    Pods on virtual-kubelet nodes (scaling pods to CCI) are deleted first, followed by pods on pay-per-use nodes and those on yearly/monthly nodes.
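
    For example, with the workload from the previous steps, scaling back down to a single replica removes pods in that order:
    kubectl scale deployment balancer-test -n default --replicas=1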

Applying the Default Application Scaling Priority Policy

When the default application scaling priority policy is used, the following default CRs are present in a cluster:

  • A Balancer CR:
    apiVersion: autoscaling.volcano.sh/v1alpha1
    kind: Balancer
    metadata:
      name: default-balancer
    spec:
      balancerPolicyTemplateName: default-balancerpolicytemplate
      targets:
      - namespaceSelector:
          matchExpressions:
            - key: kubernetes.io/metadata.name
              operator: Exists
      weight: 10 
    Table 1 Key parameters of a Balancer CR

    • metadata.name
      Description: Name of the Balancer CR
      Type: String
      Remarks: This field is mandatory.

    • spec.balancerPolicyTemplateName
      Description: Name of the priority policy
      Type: String
      Remarks: This field is mandatory. The value is the name of the corresponding BalancerPolicyTemplate CR in the cluster.

    • spec.targets
      Description: Application scope of the priority policy
      Type: Slice
      Remarks: This field is mandatory. Examples:

      • Applying to applications in the default namespace:
        spec:
          targets:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: default
      • Applying to applications in multiple namespaces, such as default, other, and another:
        spec:
          targets:
          - namespaceSelector:
              matchExpressions:
                - key: kubernetes.io/metadata.name
                  operator: In
                  values:
                  - default
                  - other
                  - another
      • Applying to applications in all namespaces:
        spec:
          targets:
          - namespaceSelector:
              matchExpressions:
                - key: kubernetes.io/metadata.name
                  operator: Exists
      • Applying only to Deployments named xxx-xxx-xxx:
        spec:
          targets:
          - objectSelectors:
              - name: xxx-xxx-xxx
                kind: Deployment
      • Applying only to Deployments named xxx-xxx-xxx in the default namespace:
        spec:
          targets:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: default
            objectSelectors:
              - name: xxx-xxx-xxx
                kind: Deployment

    • spec.weight
      Description: Weight of the priority policy
      Type: int32
      Remarks: This field is mandatory. When there are multiple Balancer CRs in a cluster, an application may fall within the scope of more than one of them. In such cases, the Balancer CR with the highest weight is applied.

  • A BalancerPolicyTemplate CR:
    apiVersion: autoscaling.volcano.sh/v1alpha1
    kind: BalancerPolicyTemplate
    metadata:
      name: default-balancerpolicytemplate
    spec:
      policy:
        policyName: Priority
        priorities:
          priorityGroups:
          - priority: 10
            requirements:
            - key: node.cce.io/billing-mode
              operator: In
              values:
              - post-paid
          - priority: 100
            requirements:
            - key: node.cce.io/billing-mode
              operator: In
              values:
              - pre-paid
          - priority: 1
            requirements:
            - key: kubernetes.io/role
              operator: In
              values:
              - virtual-kubelet
              - bursting
    Table 2 Key parameters of a BalancerPolicyTemplate CR

    • metadata.name
      Description: Name of the BalancerPolicyTemplate CR
      Type: String
      Remarks: This field is mandatory.

    • spec.policy
      Description: Content of the priority policy
      Type: Struct
      Remarks: This field is mandatory.

    • spec.policy.policyName
      Description: Name of the priority policy
      Type: String
      Remarks: This field is mandatory. Only the priority policy named Priority is supported.

    • spec.policy.priorities.priorityGroups
      Description: Priorities defined in the priority policy
      Type: Slice
      Remarks: This field is mandatory. Examples:

      • Setting the priority of a yearly/monthly node to 100:
        priorityGroups:
        - priority: 100
          requirements:
          - key: node.cce.io/billing-mode
            operator: In
            values:
            - pre-paid
      • Setting the priority of a pay-per-use node to 10:
        priorityGroups:
        - priority: 10
          requirements:
          - key: node.cce.io/billing-mode
            operator: In
            values:
            - post-paid
      • Setting the priority of a virtual-kubelet or bursting node to 1:
        priorityGroups:
        - priority: 1
          requirements:
          - key: kubernetes.io/role
            operator: In
            values:
            - virtual-kubelet
            - bursting

Customizing an Application Scaling Priority Policy

The BalancerPolicyTemplate CRDs are used to define priority policies. To customize an application scaling priority policy, modify the BalancerPolicyTemplate CR.

If there are multiple BalancerPolicyTemplate CRs in a cluster, they will all affect the scaling result. Therefore, if the default scaling priority policy is not in use, run the following command to delete it:

kubectl delete balancerpolicytemplate default-balancerpolicytemplate

Assume that during scale-out, a workload should be preferentially scheduled to nodes running HCE 2.0 and then to nodes running EulerOS. During scale-in, Volcano Scheduler should first delete the workload pods on nodes running EulerOS and then the pods on nodes running HCE 2.0.

  1. Write a new BalancerPolicyTemplate CR.

    vim new-balancerpolicytemplate.yaml
    The content is as follows:
    apiVersion: autoscaling.volcano.sh/v1alpha1
    kind: BalancerPolicyTemplate
    metadata:
      name: new-balancerpolicytemplate
    spec:
      policy:
        policyName: Priority
        priorities:
          priorityGroups:
          - priority: 10    # Set the priority of the node running EulerOS to 10.
            requirements:
            - key: os.name  # Label of the Node OS
              operator: In
              values:
              - EulerOS_2.0_SP9x86_64  # The label value may include an OS minor version. Add other version values as needed.
          - priority: 100   # Set the priority of the node running HCE 2.0 to 100.
            requirements:
            - key: os.name  # Label of the Node OS
              operator: In
              values:
              - Huawei_Cloud_EulerOS_2.0_x86_64

  2. Create a new BalancerPolicyTemplate CR.

    kubectl create -f new-balancerpolicytemplate.yaml

  3. Modify default-balancer. You can also create a new Balancer CR as needed.

    kubectl edit balancer default-balancer
    The modified content is as follows:
    apiVersion: autoscaling.volcano.sh/v1alpha1
    kind: Balancer
    metadata:
      name: default-balancer
    spec:
      balancerPolicyTemplateName: new-balancerpolicytemplate
      targets:
      - namespaceSelector:
          matchExpressions:
            - key: kubernetes.io/metadata.name
              operator: Exists
      weight: 10 

  4. Check whether the value of openvessel.io/workload-balancer-score in each pod meets the expectation.

    The value of openvessel.io/workload-balancer-score is 10 for pods on nodes running EulerOS and 100 for pods on nodes running HCE 2.0.
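
    For a quick check (a sketch; os.name is the node label used in the policy above, and you should adjust the namespace and label selector to your own workload), list the node OS labels and the pod scores:
    kubectl get nodes -L os.name
    kubectl get pods -n default -l app=balancer-test \
      -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,SCORE:.metadata.annotations.openvessel\.io/workload-balancer-score'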

Configuring a Scaling Priority Policy for a Third-Party Workload

For a workload that is not a Deployment but is managed by a CRD, you can configure a scaling priority policy in the Advanced Settings area so that Volcano can apply scaling priority policies to the workload.

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. In the navigation pane, choose Settings and click the Scheduling tab. In the Select Cluster Scheduler area, select Volcano scheduler, find the expert mode, and click Refresh.

  3. In the navigation pane, choose Add-ons, locate Volcano Scheduler, click Install or Edit, and adjust the configuration parameters in the Parameters area.
  4. Specify the type of the third-party workload to be supported. The following is an example in JSON format:

    {
        "default_scheduler_conf": {
    ...
        },
        "workload_balancer_score_annotation_key": "",
        "workload_balancer_third_party_types": "apps.kruise.io/v1alpha1/clonesets,apps.kruise.io/v1beta1/statefulsets"
    }
    • workload_balancer_score_annotation_key: specifies the annotation key used to store a pod's score. Only openvessel.io/workload-balancer-score and controller.kubernetes.io/pod-deletion-cost are supported. Setting this parameter to any other value will cause Volcano to exit abnormally.
    • workload_balancer_third_party_types: specifies the third-party workload types. The value is a string consisting of the group, version, and kind of each workload, and multiple CRDs are separated by commas (,).

      The kind in each value must be in plural form, for example, apps.kruise.io/v1alpha1/clonesets,apps.kruise.io/v1beta1/statefulsets. If it is in singular form, for example, apps.kruise.io/v1alpha1/cloneset, the corresponding CRD cannot be monitored.

      If the format is incorrect, Volcano will exit abnormally. If the specified CRD is not present in the cluster, the application scaling priority policy cannot work properly.

    For the CRD workload to be scaled in by priority, the controller that manages it must be able to identify the pod score annotation during scale-in and adjust the deletion sequence accordingly.

Appendix: Adjusting the Proportion of Nodes That Can Be Scheduled by Volcano Scheduler

Edit the volcano-scheduler Deployment.
kubectl edit deploy volcano-scheduler -nkube-system

The content is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: volcano-scheduler
    app.kubernetes.io/managed-by: Helm
    release: cceaddon-volcano
  name: volcano-scheduler
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: volcano-scheduler
  strategy:
    rollingUpdate:
      maxSurge: 10%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: volcano-scheduler
        release: cceaddon-volcano
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - volcano-scheduler
              topologyKey: topology.kubernetes.io/zone
            weight: 100
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - volcano-scheduler
            topologyKey: kubernetes.io/hostname
      containers:
      - command:
        - /bin/sh
        - -c
        - /volcano-scheduler --leader-elect=true --lock-object-namespace=kube-system
          --feature-gates=CSIMigrationFlexVolumeFuxi=true,CSIMigrationFlexVolumeFuxiComplete=true,MultiGPUScheduling=true
          --kube-api-qps=200 --alsologtostderr --listen-address=$(MY_POD_IP):8080
          --enable-healthz=true --healthz-address=$(MY_POD_IP):11251 --enable-metrics=true --percentage-nodes-to-find=100
          --scheduler-conf=/volcano.scheduler/default-scheduler.conf -v=3 1>>/var/log/volcano/volcano-scheduler.log

The --percentage-nodes-to-find=100 flag makes Volcano Scheduler consider all nodes in the cluster when selecting nodes during scheduling.
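
After saving the change, you can confirm that the flag is present in the scheduler command (a simple check that just greps for the flag name):

kubectl -n kube-system get deployment volcano-scheduler -o yaml | grep percentage-nodes-to-find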