Updated on 2024-09-30 GMT+08:00

Application Scaling Priority Policies

With application scaling priority policies, you can manage resources more efficiently by customizing the scaling order of pods across different node types. If the default scaling priority policy is applied, pods will be scheduled first to yearly/monthly nodes during scale-out, followed by pay-per-use nodes and virtual-kubelet nodes (scaling pods to CCI). During scale-in, pods are deleted sequentially from virtual-kubelet nodes (scaling pods to CCI), pay-per-use nodes, and yearly/monthly nodes.

The application scaling priority policy includes the following two aspects:

Scale-out: Volcano schedules new pods in a cluster based on the preset node priority.
Scale-in: For a specified workload, Volcano scores the workload's pods based on the preset node priority to determine the pod deletion sequence.

Notes and Constraints

The cluster version must be 1.23.11 or later, 1.25.6 or later, or 1.27.3 or later.
The Volcano Scheduler add-on (1.12.1 or later) must be installed in the cluster, and the application scaling priority policy function must be enabled.
By default, the scaling priority takes effect for Deployments (including ReplicaSets). To make the scaling priority take effect on third-party workloads, you can adjust the advanced settings. For details, see Configuring a Scaling Priority Policy for a Third-Party Workload.
To use the scale-out scheduling priority policies, you need to set spec.schedulerName of a workload to volcano or set the default cluster scheduler to volcano. The application scaling priority policy function does not apply to workloads that have no resource limits and requests configured.
If the default priority policy is used, Volcano Scheduler schedules workloads based on the priorities of yearly/monthly nodes, pay-per-use nodes, and virtual-kubelet nodes (scaling pods to CCI). However, the priorities cannot be fully implemented, because Volcano Scheduler takes scheduling results into account from multiple dimensions rather than just one.
Volcano Scheduler must balance scheduling performance with scheduling results. When a cluster has a large number of schedulable nodes, it evaluates only some of them to ensure scheduling performance and therefore does not search for the globally optimal result. For details, see Scheduler Performance Tuning. This behavior conflicts with the scaling priority policies, but you can make Volcano Scheduler evaluate all nodes by adjusting the proportion of nodes that it can schedule. For details, see Appendix: Adjusting the Proportion of Nodes That Can Be Scheduled by Volcano Scheduler.

Overview

After the application scaling priority policy is enabled, the Balancer and BalancerPolicyTemplate CRDs are added to a cluster, and the default scaling priority policy is created. For details, see Applying the Default Application Scaling Priority Policy. Volcano Scheduler obtains the priority of each node based on the BalancerPolicyTemplate CR to control the pod scheduling priority during application scale-out. In addition, it configures the priority during application scale-in based on both Balancer and BalancerPolicyTemplate CRs.

The BalancerPolicyTemplate CRDs are used to define priority policies. For example, in the default scaling priority policy, the BalancerPolicyTemplate CR assigns the highest priority to yearly/monthly nodes, followed by pay-per-use nodes, and the lowest priority to virtual-kubelet nodes (scaling pods to CCI).
The BalancerPolicyTemplate CRs cannot be updated.
The Balancer CRDs are used to declare the application scope of scaling priorities. When creating a Balancer CR, you can specify a workload in a namespace, a specific Deployment, or a specific ReplicaSet as the application scope.

A Balancer CR corresponds to a BalancerPolicyTemplate CR. They work together to determine which priority policies are applied to specific workloads.

In Volcano Scheduler's default scaling priority policy, the BalancerPolicyTemplate CR classifies yearly/monthly nodes, pay-per-use nodes, and virtual-kubelet nodes (scaling pods to CCI) into different priorities. Volcano Scheduler takes these priorities into account during scale-out and preferentially schedules new pods to the yearly/monthly nodes with higher priorities.

Volcano Scheduler applies annotations to pods within the application scope specified by the Balancer CR based on the priorities set by the BalancerPolicyTemplate CR. It may add the following annotations to a pod that meets the conditions:

openvessel.io/workload-balancer-score: indicates a pod's score, which is higher if the pod is on a high-priority node. Pods with low scores are preferentially scaled in.
autoscaling.volcano.sh/dominated-by-balancer: specifies the Balancer CR that controls the current pod.

If existing pods already have the community-supported controller.kubernetes.io/pod-deletion-cost annotation, scale-in is performed based on the priority defined by this annotation. If two pods have the same value for this annotation, the openvessel.io/workload-balancer-score annotation is used to determine which pod to scale in first.
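The ordering rule described above can be illustrated with a minimal Python sketch. It is not Volcano's or Kubernetes' implementation; it only models the documented precedence (deletion cost first, balancer score as the tie-breaker). The pod dictionaries and the function name scale_in_order are hypothetical. Treating a missing pod-deletion-cost as 0 follows the Kubernetes default, while treating a missing balancer score as 0 is an assumption made for this example.

```python
DELETION_COST = "controller.kubernetes.io/pod-deletion-cost"
BALANCER_SCORE = "openvessel.io/workload-balancer-score"


def scale_in_order(pods):
    """Return pods in the order they would be removed under the documented rule.

    Pods with a lower pod-deletion-cost go first; among pods with the same
    cost, pods with a lower balancer score go first.
    """
    def sort_key(pod):
        annotations = pod.get("annotations", {})
        cost = int(annotations.get(DELETION_COST, "0"))    # Kubernetes default is 0
        score = int(annotations.get(BALANCER_SCORE, "0"))  # assumption: missing score counts as 0
        return (cost, score)

    return sorted(pods, key=sort_key)


# Hypothetical pods: three without a deletion cost, one with an explicit low cost.
pods = [
    {"name": "on-yearly", "annotations": {BALANCER_SCORE: "100"}},
    {"name": "on-cci", "annotations": {BALANCER_SCORE: "1"}},
    {"name": "on-pay-per-use", "annotations": {BALANCER_SCORE: "10"}},
    {"name": "low-cost", "annotations": {DELETION_COST: "-5", BALANCER_SCORE: "100"}},
]

removal_order = [pod["name"] for pod in scale_in_order(pods)]
print(removal_order)
# ['low-cost', 'on-cci', 'on-pay-per-use', 'on-yearly']
```

In this example the pod with the explicit deletion cost of -5 is removed first even though its score is 100, because the deletion cost takes precedence over the score.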

You can configure the workload_balancer_score_annotation_key parameter in advanced settings to specify the annotation key for storing pod scores. For details, see Configuring a Scaling Priority Policy for a Third-Party Workload.

Configuring an Application Scaling Priority Policy

Install Volcano Scheduler in a cluster and enable the application scaling priority policy. The default scaling priority policy will be created in the cluster.

Obtain a default Balancer CR.

kubectl get balancer default-balancer -o yaml

apiVersion: autoscaling.volcano.sh/v1alpha1
kind: Balancer
metadata:
  name: default-balancer
spec:
  balancerPolicyTemplateName: default-balancerpolicytemplate
  targets:
  - namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: Exists
  weight: 10

Obtain a default BalancerPolicyTemplate CR.

kubectl get balancerpolicytemplate default-balancerpolicytemplate -o yaml

apiVersion: autoscaling.volcano.sh/v1alpha1
kind: BalancerPolicyTemplate
metadata:
  name: default-balancerpolicytemplate
spec:
  policy:
    policyName: Priority
    priorities:
      priorityGroups:
      - priority: 10
        requirements:
        - key: node.cce.io/billing-mode
          operator: In
          values:
          - post-paid
      - priority: 100
        requirements:
        - key: node.cce.io/billing-mode
          operator: In
          values:
          - pre-paid
      - priority: 1
        requirements:
        - key: kubernetes.io/role
          operator: In
          values:
          - virtual-kubelet
          - bursting

For details about the parameters, see Applying the Default Application Scaling Priority Policy.

Deploy a workload and set the number of pods to 1.

Pods of the current workload are preferentially scheduled to yearly/monthly nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: balancer-test
  namespace: default
  labels:
    virtual-kubelet.io/burst-to-cci: 'auto'  #If the resources of a cluster are not enough, pods in this cluster can be deployed on CCI.
spec:
  replicas: 1
  selector:
    matchLabels:
      app: balancer-test
  template:
    metadata:
      labels:
        app: balancer-test
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: IfNotPresent
        name: container-1
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 512Mi
      schedulerName: volcano

Increase the number of workload pods to 5.

Pods of the current workload are preferentially scheduled to yearly/monthly nodes. If there are not enough yearly/monthly nodes, these pods will be preferentially scheduled to pay-per-use nodes. If there are not enough pay-per-use nodes, these pods will be scheduled to virtual-kubelet nodes (scaling pods to CCI).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: balancer-test
  namespace: default
  labels:
    virtual-kubelet.io/burst-to-cci: 'auto'  #If the resources of a cluster are not enough, pods in this cluster can be deployed on CCI.
spec:
  replicas: 5
  selector:
    matchLabels:
      app: balancer-test
  template:
    metadata:
      labels:
        app: balancer-test
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: IfNotPresent
        name: container-1
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 512Mi
      schedulerName: volcano
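
To make the expected result of scaling out to five pods concrete, the following Python sketch fills node types in descending priority order. It is an idealized model only: the free-slot numbers are hypothetical, the function place_pods is not part of Volcano, and, as noted in Notes and Constraints, the real scheduler weighs other dimensions too, so actual placement can differ.

```python
def place_pods(replicas, node_types):
    """Greedily place pods on node types in descending priority order.

    node_types is a list of (name, priority, free_pod_slots) tuples.
    Returns (placement, unscheduled), where placement maps a node type
    name to the number of pods placed on it.
    """
    placement = {}
    remaining = replicas
    for name, _priority, free_slots in sorted(
        node_types, key=lambda item: item[1], reverse=True
    ):
        if remaining == 0:
            break
        used = min(free_slots, remaining)
        if used > 0:
            placement[name] = used
        remaining -= used
    return placement, remaining


# Hypothetical capacity: two free slots on each in-cluster node type.
node_types = [
    ("pay-per-use", 10, 2),
    ("yearly/monthly", 100, 2),
    ("virtual-kubelet", 1, 10),
]

placement, unscheduled = place_pods(5, node_types)
print(placement, unscheduled)
# {'yearly/monthly': 2, 'pay-per-use': 2, 'virtual-kubelet': 1} 0
```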

View the scores of pods.

Pods on a yearly/monthly node:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    autoscaling.volcano.sh/dominated-by-balancer: default-balancer  #The Balancer CR named default-balancer controls the scaling priority of the current pods.
    openvessel.io/workload-balancer-score: "100" #Priority of the current yearly/monthly node, which also indicates the pods' score
    ...
  nodeName: 192.168.20.100 #A yearly/monthly node

Pods on a pay-per-use node:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    autoscaling.volcano.sh/dominated-by-balancer: default-balancer  #The Balancer CR named default-balancer controls the scaling priority of the current pods.
    openvessel.io/workload-balancer-score: "10"  #Priority of the current pay-per-use node, which also indicates the pods' score
    ...
  nodeName: 192.168.20.196 #A pay-per-use node

Pods on a virtual-kubelet node (scaling pods to CCI):

apiVersion: v1
kind: Pod
metadata:
  annotations:
    autoscaling.volcano.sh/dominated-by-balancer: default-balancer  #The Balancer CR named default-balancer controls the scaling priority of the current pods.
    openvessel.io/workload-balancer-score: "1"  #Priority of the current virtual-kubelet node, which also indicates the pods' score
    ...
  nodeName: virtual-kubelet #A virtual-kubelet node

Gradually reduce the number of the workload pods.

Pods on virtual-kubelet nodes (scaling pods to CCI) are deleted first, followed by pods on pay-per-use nodes and those on yearly/monthly nodes.

Applying the Default Application Scaling Priority Policy

When the default application scaling priority policy is used, the following default CRs are present in a cluster:

A Balancer CR:

apiVersion: autoscaling.volcano.sh/v1alpha1
kind: Balancer
metadata:
  name: default-balancer
spec:
  balancerPolicyTemplateName: default-balancerpolicytemplate
  targets:
  - namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: Exists
  weight: 10

**Table 1** Key parameters of a Balancer CR

| Field | Description | Type | Remarks |
| --- | --- | --- | --- |
| metadata.name | Name | String | This field is mandatory. |
| spec.balancerPolicyTemplateName | Name of the priority policy | String | This field is mandatory. The value is the name of the corresponding BalancerPolicyTemplate CR in the cluster. |
| spec.targets | Application scope of the priority policy | Slice | This field is mandatory. See the examples following this table. |
| spec.weight | Weight of the priority policy | int32 | This field is mandatory. When there are multiple Balancer CRs in a cluster, an application may fall within the scope of more than one of them. In such cases, the Balancer CR with the highest weight will be applied. |

Examples of spec.targets:

Applying to applications in the default namespace:

spec:
  targets:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: default

Applying to applications in multiple namespaces like default, other, and another:

spec:
  targets:
  - namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
        - default
        - other
        - another

Applying to applications in all namespaces:

spec:
  targets:
  - namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: Exists

Only applying to Deployments which are named in the format of xxx-xxx-xxx:

spec:
  targets:
  - objectSelectors:
    - name: xxx-xxx-xxx
      kind: Deployment

Only applying to Deployments which are named in the format of xxx-xxx-xxx and are in the default namespace:

spec:
  targets:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: default
    objectSelectors:
    - name: xxx-xxx-xxx
      kind: Deployment
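
The spec.weight rule can be sketched in a few lines of Python. This is only an illustration of "the Balancer CR with the highest weight will be applied"; the function select_balancer and the Balancer named team-a-balancer are hypothetical. How Volcano resolves equal weights is not documented here, so the tie behavior of this sketch (first entry wins) is an assumption.

```python
def select_balancer(matching_balancers):
    """Pick the Balancer CR that applies to a workload.

    matching_balancers is a list of dicts with "name" and "weight" keys,
    already filtered to the Balancer CRs whose spec.targets cover the workload.
    Returns None when no Balancer CR matches.
    """
    if not matching_balancers:
        return None
    # max() keeps the first entry on ties; the real tie-breaking rule is an assumption.
    return max(matching_balancers, key=lambda balancer: balancer["weight"])


# Hypothetical case: a workload covered by the default Balancer CR and a custom one.
candidates = [
    {"name": "default-balancer", "weight": 10},
    {"name": "team-a-balancer", "weight": 50},
]

chosen = select_balancer(candidates)
print(chosen["name"])
# team-a-balancer
```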

A BalancerPolicyTemplate CR:

apiVersion: autoscaling.volcano.sh/v1alpha1
kind: BalancerPolicyTemplate
metadata:
  name: default-balancerpolicytemplate
spec:
  policy:
    policyName: Priority
    priorities:
      priorityGroups:
      - priority: 10
        requirements:
        - key: node.cce.io/billing-mode
          operator: In
          values:
          - post-paid
      - priority: 100
        requirements:
        - key: node.cce.io/billing-mode
          operator: In
          values:
          - pre-paid
      - priority: 1
        requirements:
        - key: kubernetes.io/role
          operator: In
          values:
          - virtual-kubelet
          - bursting

**Table 2** Key parameters of a BalancerPolicyTemplate CR

| Field | Description | Type | Remarks |
| --- | --- | --- | --- |
| metadata.name | Name | String | This field is mandatory. |
| spec.policy | Content of the priority policy | Struct | This field is mandatory. |
| spec.policy.policyName | Name of the priority policy | String | This field is mandatory. Only the priority policy named Priority is supported. |
| spec.policy.priorities.priorityGroups | Specific priority defined in the priority policy | Slice | This field is mandatory. See the examples following this table. |

Examples of spec.policy.priorities.priorityGroups:

Setting the priority of a yearly/monthly node to 100:

priorityGroups:
- priority: 100
  requirements:
  - key: node.cce.io/billing-mode
    operator: In
    values:
    - pre-paid

Setting the priority of a pay-per-use node to 10:

priorityGroups:
- priority: 10
  requirements:
  - key: node.cce.io/billing-mode
    operator: In
    values:
    - post-paid

Setting the priority of a virtual-kubelet or bursting node to 1:

priorityGroups:
- priority: 1
  requirements:
  - key: kubernetes.io/role
    operator: In
    values:
    - virtual-kubelet
    - bursting
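
How a priority group maps node labels to a priority can be sketched in Python as follows. This is a simplified model, not Volcano's code: it handles only the In operator used in the default policy, and the function node_priority is hypothetical. Returning the highest priority when several groups match, and None when no group matches, are assumptions made for this example.

```python
from typing import Optional

# Priority groups mirroring the default BalancerPolicyTemplate CR.
DEFAULT_PRIORITY_GROUPS = [
    {"priority": 10, "requirements": [
        {"key": "node.cce.io/billing-mode", "operator": "In", "values": ["post-paid"]},
    ]},
    {"priority": 100, "requirements": [
        {"key": "node.cce.io/billing-mode", "operator": "In", "values": ["pre-paid"]},
    ]},
    {"priority": 1, "requirements": [
        {"key": "kubernetes.io/role", "operator": "In",
         "values": ["virtual-kubelet", "bursting"]},
    ]},
]


def node_priority(node_labels: dict, groups: list = DEFAULT_PRIORITY_GROUPS) -> Optional[int]:
    """Return the priority of a node based on its labels.

    A group matches when every requirement matches. Only the "In" operator
    is modeled; any other operator is treated as not matching.
    """
    matched = []
    for group in groups:
        if all(
            req["operator"] == "In" and node_labels.get(req["key"]) in req["values"]
            for req in group["requirements"]
        ):
            matched.append(group["priority"])
    # Assumption: the highest matching priority wins; no match yields None.
    return max(matched) if matched else None


print(node_priority({"node.cce.io/billing-mode": "pre-paid"}))   # 100
print(node_priority({"node.cce.io/billing-mode": "post-paid"}))  # 10
print(node_priority({"kubernetes.io/role": "virtual-kubelet"}))  # 1
print(node_priority({"unrelated": "label"}))                     # None
```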

Customizing an Application Scaling Priority Policy

The BalancerPolicyTemplate CRDs are used to define priority policies. Because existing BalancerPolicyTemplate CRs cannot be updated, you need to create a new BalancerPolicyTemplate CR and reference it in a Balancer CR to customize an application scaling priority policy.

If there are multiple BalancerPolicyTemplate CRs in a cluster, they will all affect the scaling result. Therefore, if the default scaling priority policy is not in use, run the following command to delete it:

kubectl delete balancerpolicytemplate default-balancerpolicytemplate

The following example assumes that, during scale-out, a workload is preferentially scheduled to nodes running HCE 2.0 and then to nodes running EulerOS. During scale-in, Volcano Scheduler first deletes the workload pods on the nodes running EulerOS and then deletes the pods on the nodes running HCE 2.0.

Write a new BalancerPolicyTemplate CR.

vim new-balancerpolicytemplate.yaml

The content is as follows:

apiVersion: autoscaling.volcano.sh/v1alpha1
kind: BalancerPolicyTemplate
metadata:
  name: new-balancerpolicytemplate
spec:
  policy:
    policyName: Priority
    priorities:
      priorityGroups:
      - priority: 10    # Set the priority of the node running EulerOS to 10.
        requirements:
        - key: os.name  # Label of the Node OS
          operator: In
          values:
          - EulerOS_2.0_SP9x86_64  # The minor version number of the OS may be involved. You can add the minor version number as needed.
      - priority: 100   # Set the priority of the node running HCE 2.0 to 100.
        requirements:
        - key: os.name  # Label of the Node OS
          operator: In
          values:
          - Huawei_Cloud_EulerOS_2.0_x86_64

Create a new BalancerPolicyTemplate CR.

kubectl create -f new-balancerpolicytemplate.yaml

Modify default-balancer. You can also create a new Balancer CR as needed.

kubectl edit balancer default-balancer

The modified content is as follows:

apiVersion: autoscaling.volcano.sh/v1alpha1
kind: Balancer
metadata:
  name: default-balancer
spec:
  balancerPolicyTemplateName: new-balancerpolicytemplate
  targets:
  - namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: Exists
  weight: 10

Check whether the value of openvessel.io/workload-balancer-score in each pod meets the expectation.

The value of openvessel.io/workload-balancer-score in each pod on the node running EulerOS is set to 10. The value of openvessel.io/workload-balancer-score in each pod on the node running HCE 2.0 is set to 100.

Configuring a Scaling Priority Policy for a Third-Party Workload

For a workload that is not a Deployment but is managed by CRDs, you can configure the scaling priority policies for the workload in the Advanced Settings area, so that Volcano can support the scaling priority policies of the workload.

Log in to the CCE console and click the cluster name to access the cluster console.
In the navigation pane, choose Settings and click the Scheduling tab. In the Select Cluster Scheduler area, select Volcano scheduler, find the expert mode, and click Refresh.
In the navigation pane, choose Add-ons, locate Volcano Scheduler, click Install or Edit, and adjust the configuration parameters in the Parameters area.
Specify the type of the third-party workload to be supported. The following is an example in JSON format:
```
{
    "default_scheduler_conf": {
...
    },
    "workload_balancer_score_annotation_key": "",
    "workload_balancer_third_party_types": "apps.kruise.io/v1alpha1/clonesets,apps.kruise.io/v1beta1/statefulsets"
}
```
- workload_balancer_score_annotation_key: specifies the annotation key that stores a pod's score. openvessel.io/workload-balancer-score or controller.kubernetes.io/pod-deletion-cost is supported. Setting this parameter to any other value causes Volcano to exit abnormally.
- workload_balancer_third_party_types: specifies the third-party workload types to be supported. The value is a string in which each type consists of the group, version, and kind of a third-party workload, and multiple types are separated by commas (,).

  The part of the value that represents the workload kind must be in plural form, for example, apps.kruise.io/v1alpha1/clonesets,apps.kruise.io/v1beta1/statefulsets. If it is in singular form, for example, apps.kruise.io/v1alpha1/cloneset, the corresponding CRD cannot be monitored.

  If the format is incorrect, Volcano exits abnormally. If the specified CRD is not present in the cluster, the application scaling priority policy cannot work properly.
For a third-party workload to be scaled in by priority, the controller that manages its CRD must be able to identify the pod score annotation during scale-in and adjust the pod deletion sequence accordingly.
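
Because a malformed workload_balancer_third_party_types value makes Volcano exit abnormally, it can help to check the string locally before saving the add-on configuration. The following Python sketch is a hypothetical pre-check, not Volcano's own parser: it only verifies the group/version/kind layout described above. Treating an empty string as "no third-party types" is an assumption, and the plural check is a rough heuristic based on a trailing "s".

```python
def parse_third_party_types(value):
    """Split a workload_balancer_third_party_types string into (group, version, kind) tuples.

    Raises ValueError when an entry is not of the form group/version/kind.
    """
    if value == "":
        return []  # assumption: an empty string means no third-party types
    entries = []
    for item in value.split(","):
        parts = item.split("/")
        has_whitespace = any(part != part.strip() for part in parts)
        if len(parts) != 3 or not all(parts) or has_whitespace:
            raise ValueError(f"expected group/version/kind, got {item!r}")
        entries.append(tuple(parts))
    return entries


def possibly_singular(entries):
    """Return entries whose kind does not end with 's' (heuristic plural check)."""
    return [entry for entry in entries if not entry[2].endswith("s")]


value = "apps.kruise.io/v1alpha1/clonesets,apps.kruise.io/v1beta1/statefulsets"
parsed = parse_third_party_types(value)
print(parsed)
# [('apps.kruise.io', 'v1alpha1', 'clonesets'), ('apps.kruise.io', 'v1beta1', 'statefulsets')]

suspicious = possibly_singular(parse_third_party_types("apps.kruise.io/v1alpha1/cloneset"))
print(suspicious)
# [('apps.kruise.io', 'v1alpha1', 'cloneset')]
```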

Appendix: Adjusting the Proportion of Nodes That Can Be Scheduled by Volcano Scheduler

Edit the volcano-scheduler Deployment.

kubectl edit deploy volcano-scheduler -nkube-system

The content is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: volcano-scheduler
    app.kubernetes.io/managed-by: Helm
    release: cceaddon-volcano
  name: volcano-scheduler
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: volcano-scheduler
  strategy:
    rollingUpdate:
      maxSurge: 10%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: volcano-scheduler
        release: cceaddon-volcano
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - volcano-scheduler
              topologyKey: topology.kubernetes.io/zone
            weight: 100
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - volcano-scheduler
            topologyKey: kubernetes.io/hostname
      containers:
      - command:
        - /bin/sh
        - -c
        - /volcano-scheduler --leader-elect=true --lock-object-namespace=kube-system
          --feature-gates=CSIMigrationFlexVolumeFuxi=true,CSIMigrationFlexVolumeFuxiComplete=true,MultiGPUScheduling=true
          --kube-api-qps=200 --alsologtostderr --listen-address=$(MY_POD_IP):8080
          --enable-healthz=true --healthz-address=$(MY_POD_IP):11251 --enable-metrics=true --percentage-nodes-to-find=100
          --scheduler-conf=/volcano.scheduler/default-scheduler.conf -v=3 1>>/var/log/volcano/volcano-scheduler.log

--percentage-nodes-to-find=100 specifies that Volcano Scheduler considers all nodes in the cluster when selecting nodes for scheduling.
