Application Scaling Priority Policies
With application scaling priority policies, you can manage resources more efficiently by customizing the scaling order of pods across different node types. If the default scaling priority policy is applied, pods will be scheduled first to yearly/monthly nodes during scale-out, followed by pay-per-use nodes and virtual-kubelet nodes (scaling pods to CCI). During scale-in, pods are deleted sequentially from virtual-kubelet nodes (scaling pods to CCI), pay-per-use nodes, and yearly/monthly nodes.
- Scale-out: Volcano schedules new pods in a cluster based on preset node priority for scale-out.
- Scale-in: When a workload is specified, Volcano scores the workload based on preset node priority to determine pod deletion sequence during scale-in.
Notes and Constraints
- The cluster version must be 1.23.11 or later, 1.25.6 or later, or 1.27.3 or later.
- The Volcano Scheduler add-on (1.12.1 or later) must be installed in a cluster, and the application scaling priority policy function must be enabled.
- By default, the scaling priority takes effect for Deployments (including ReplicaSet). To make the scaling priority take effect on third-party workloads, you can adjust the advanced settings. For details, see Configuring a Scaling Priority Policy for a Third-Party Workload.
- To use the scale-out scheduling priority policies, set spec.schedulerName of a workload to volcano or set the default cluster scheduler to volcano. The application scaling priority policy does not apply to workloads that have no resource requests and limits configured.
- If the default priority policy is used, Volcano Scheduler schedules workloads based on the priorities of yearly/monthly nodes, pay-per-use nodes, and virtual-kubelet nodes (scaling pods to CCI). However, the priorities cannot be fully implemented, because Volcano Scheduler takes scheduling results into account from multiple dimensions rather than just one.
- Volcano Scheduler must balance scheduling performance against scheduling quality. When a cluster has a large number of schedulable nodes, it evaluates only some of them to keep scheduling fast and does not search for the globally optimal placement. For details, see Scheduler Performance Tuning. This behavior can conflict with the scaling priority policies, but you can make Volcano Scheduler evaluate all nodes by adjusting the proportion of nodes it considers during scheduling. For details, see Appendix: Adjusting the Proportion of Nodes That Can Be Scheduled by Volcano Scheduler.
Overview
After the application scaling priority policy is enabled, the Balancer and BalancerPolicyTemplate CRDs are added to a cluster, and the default scaling priority policy is created. For details, see Applying the Default Application Scaling Priority Policy. Volcano Scheduler obtains the priority of each node based on the BalancerPolicyTemplate CR to control the pod scheduling priority during application scale-out. In addition, it configures the priority during application scale-in based on both Balancer and BalancerPolicyTemplate CRs.
- The BalancerPolicyTemplate CRDs are used to define priority policies. For example, in the default scaling priority policy, the BalancerPolicyTemplate CR assigns the highest priority to yearly/monthly nodes, followed by pay-per-use nodes, and the lowest priority to virtual-kubelet nodes (scaling pods to CCI) by default.
The BalancerPolicyTemplate CRs cannot be updated.
- The Balancer CRDs are used to declare the application scope of scaling priorities. When creating a Balancer CR, you can specify a workload in a namespace, a specific Deployment, or a specific ReplicaSet as the application scope.
A Balancer CR corresponds to a BalancerPolicyTemplate CR. They work together to determine which priority policies are applied to specific workloads.
In Volcano Scheduler's default scaling priority policy, the BalancerPolicyTemplate CR classifies yearly/monthly nodes, pay-per-use nodes, and virtual-kubelet nodes (scaling pods to CCI) into different priorities. Volcano Scheduler takes these priorities into account during scale-out and preferentially schedules new pods to the yearly/monthly nodes with higher priorities.
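After the add-on is installed and the application scaling priority policy is enabled, you can confirm that the CRDs and the default CRs are present. A quick check (a sketch; the CRD group and resource names are taken from the kubectl commands and CR examples later in this section):
kubectl get crd | grep autoscaling.volcano.sh
kubectl get balancer,balancerpolicytemplate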
Volcano Scheduler applies annotations to pods within the application scope specified by the Balancer CR based on the priorities set by the BalancerPolicyTemplate CR. It may add the following annotations to a pod that meets the conditions:
- openvessel.io/workload-balancer-score: indicates a pod's score, which is higher if the pod is on a high-priority node.
- autoscaling.volcano.sh/dominated-by-balancer: specifies the Balancer CR that controls the current pod. Pods with low scores are preferentially scaled in.
If a pod already has the community-supported controller.kubernetes.io/pod-deletion-cost annotation, scale-in is performed based on the priority defined by that annotation. If two pods have the same value for this annotation, the openvessel.io/workload-balancer-score annotation is used to determine which pod to scale in first.
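For example, you can raise a specific pod's deletion cost so that it is removed last during scale-in. This is a minimal sketch; the pod name balancer-test-xxxx is hypothetical, and pods with a lower controller.kubernetes.io/pod-deletion-cost value are deleted first:
kubectl annotate pod balancer-test-xxxx controller.kubernetes.io/pod-deletion-cost="100"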
You can configure the workload_balancer_score_annotation_key parameter in advanced settings to specify the annotation key for storing pod scores. For details, see Configuring a Scaling Priority Policy for a Third-Party Workload.
Configuring an Application Scaling Priority Policy
- Install Volcano Scheduler in a cluster and enable the application scaling priority policy. The default scaling priority policy will be created in the cluster.
- Obtain a default Balancer CR.
# kubectl get balancer default-balancer -oyaml
apiVersion: autoscaling.volcano.sh/v1alpha1
kind: Balancer
metadata:
  name: default-balancer
spec:
  balancerPolicyTemplateName: default-balancerpolicytemplate
  targets:
  - namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: Exists
  weight: 10
- Obtain a default BalancerPolicyTemplate CR.
# kubectl get balancerpolicytemplate default-balancerpolicytemplate -oyaml
apiVersion: autoscaling.volcano.sh/v1alpha1
kind: BalancerPolicyTemplate
metadata:
  name: default-balancerpolicytemplate
spec:
  policy:
    policyName: Priority
    priorities:
      priorityGroups:
      - priority: 10
        requirements:
        - key: node.cce.io/billing-mode
          operator: In
          values:
          - post-paid
      - priority: 100
        requirements:
        - key: node.cce.io/billing-mode
          operator: In
          values:
          - pre-paid
      - priority: 1
        requirements:
        - key: kubernetes.io/role
          operator: In
          values:
          - virtual-kubelet
          - bursting
For details about the parameters, see Applying the Default Application Scaling Priority Policy.
- Deploy a workload and set the number of pods to 1.
Pods of the current workload are preferentially scheduled to yearly/monthly nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: balancer-test
  namespace: default
  labels:
    virtual-kubelet.io/burst-to-cci: 'auto'  # If the resources of the cluster are not enough, pods can be deployed on CCI.
spec:
  replicas: 1
  selector:
    matchLabels:
      app: balancer-test
  template:
    metadata:
      labels:
        app: balancer-test
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: IfNotPresent
        name: container-1
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 512Mi
      schedulerName: volcano
- Increase the number of workload pods to 5.
Pods of the current workload are preferentially scheduled to yearly/monthly nodes. If there are not enough yearly/monthly nodes, these pods will be preferentially scheduled to pay-per-use nodes. If there are not enough pay-per-use nodes, these pods will be scheduled to virtual-kubelet nodes (scaling pods to CCI).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: balancer-test
  namespace: default
  labels:
    virtual-kubelet.io/burst-to-cci: 'auto'  # If the resources of the cluster are not enough, pods can be deployed on CCI.
spec:
  replicas: 5
  selector:
    matchLabels:
      app: balancer-test
  template:
    metadata:
      labels:
        app: balancer-test
    spec:
      containers:
      - image: nginx:latest
        imagePullPolicy: IfNotPresent
        name: container-1
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 512Mi
      schedulerName: volcano
- View the scores of pods.
- Pods on a yearly/monthly node:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    autoscaling.volcano.sh/dominated-by-balancer: default-balancer  # The Balancer CR named default-balancer controls the scaling priority of the current pod.
    openvessel.io/workload-balancer-score: "100"  # Priority of the current yearly/monthly node, which also indicates the pod's score.
spec:
  ...
  nodeName: 192.168.20.100  # A yearly/monthly node
- Pods on a pay-per-use node:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    autoscaling.volcano.sh/dominated-by-balancer: default-balancer  # The Balancer CR named default-balancer controls the scaling priority of the current pod.
    openvessel.io/workload-balancer-score: "10"  # Priority of the current pay-per-use node, which also indicates the pod's score.
spec:
  ...
  nodeName: 192.168.20.196  # A pay-per-use node
- Pods on a virtual-kubelet node (scaling pods to CCI):
apiVersion: v1
kind: Pod
metadata:
  annotations:
    autoscaling.volcano.sh/dominated-by-balancer: default-balancer  # The Balancer CR named default-balancer controls the scaling priority of the current pod.
    openvessel.io/workload-balancer-score: "1"  # Priority of the current virtual-kubelet node, which also indicates the pod's score.
spec:
  ...
  nodeName: virtual-kubelet  # A virtual-kubelet node
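To list the score annotation and the node of each replica in one step, you can filter the pod YAML. A quick sketch, assuming the app=balancer-test label from the example Deployment above:
kubectl get pods -l app=balancer-test -o yaml | grep -E 'workload-balancer-score|nodeName'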
- Gradually reduce the number of workload pods.
Pods on virtual-kubelet nodes (scaling pods to CCI) are deleted first, followed by pods on pay-per-use nodes and those on yearly/monthly nodes.
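For example, you can shrink the example Deployment step by step and watch which pods are removed first. A sketch using the balancer-test Deployment above:
kubectl scale deployment balancer-test --replicas=3
kubectl get pods -l app=balancer-test -o wide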
Applying the Default Application Scaling Priority Policy
When the default application scaling priority policy is used, the following default CRs are present in a cluster:
- A Balancer CR:
apiVersion: autoscaling.volcano.sh/v1alpha1
kind: Balancer
metadata:
  name: default-balancer
spec:
  balancerPolicyTemplateName: default-balancerpolicytemplate
  targets:
  - namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: Exists
  weight: 10
Table 1 Key parameters of a Balancer CR

- metadata.name
  Description: Name
  Type: String
  Remarks: This field is mandatory.
- spec.balancerPolicyTemplateName
  Description: Name of the priority policy
  Type: String
  Remarks: This field is mandatory. The value is the name of the corresponding BalancerPolicyTemplate CR in the cluster.
- spec.targets
  Description: Application scope of the priority policy
  Type: Slice
  Remarks: This field is mandatory. Examples:
  - Applying to applications in the default namespace:
    spec:
      targets:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: default
  - Applying to applications in multiple namespaces such as default, other, and another:
    spec:
      targets:
      - namespaceSelector:
          matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: In
            values:
            - default
            - other
            - another
  - Applying to applications in all namespaces:
    spec:
      targets:
      - namespaceSelector:
          matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: Exists
  - Applying only to Deployments named xxx-xxx-xxx:
    spec:
      targets:
      - objectSelectors:
        - name: xxx-xxx-xxx
          kind: Deployment
  - Applying only to Deployments named xxx-xxx-xxx in the default namespace:
    spec:
      targets:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: default
        objectSelectors:
        - name: xxx-xxx-xxx
          kind: Deployment
- spec.weight
  Description: Weight of the priority policy
  Type: int32
  Remarks: This field is mandatory. When there are multiple Balancer CRs in a cluster, an application may fall within the scope of more than one of them. In such cases, the Balancer CR with the highest weight is applied.
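For example, if workloads in the default namespace should follow a different policy template (such as the new-balancerpolicytemplate created in Customizing an Application Scaling Priority Policy) while all other namespaces keep the default policy, you could add a second Balancer CR with a higher weight. This is a sketch; the name high-priority-balancer is illustrative:
apiVersion: autoscaling.volcano.sh/v1alpha1
kind: Balancer
metadata:
  name: high-priority-balancer
spec:
  balancerPolicyTemplateName: new-balancerpolicytemplate  # Custom policy template from the customization section
  targets:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: default
  weight: 20  # Higher than the weight 10 of default-balancer, so this CR takes precedence for the default namespace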
- A BalancerPolicyTemplate CR:
apiVersion: autoscaling.volcano.sh/v1alpha1
kind: BalancerPolicyTemplate
metadata:
  name: default-balancerpolicytemplate
spec:
  policy:
    policyName: Priority
    priorities:
      priorityGroups:
      - priority: 10
        requirements:
        - key: node.cce.io/billing-mode
          operator: In
          values:
          - post-paid
      - priority: 100
        requirements:
        - key: node.cce.io/billing-mode
          operator: In
          values:
          - pre-paid
      - priority: 1
        requirements:
        - key: kubernetes.io/role
          operator: In
          values:
          - virtual-kubelet
          - bursting
Table 2 Key parameters of a BalancerPolicyTemplate CR

- metadata.name
  Description: Name
  Type: String
  Remarks: This field is mandatory.
- spec.policy
  Description: Content of the priority policy
  Type: Struct
  Remarks: This field is mandatory.
- spec.policy.policyName
  Description: Name of the priority policy
  Type: String
  Remarks: This field is mandatory. Only the priority policy named Priority is supported.
- spec.policy.priorities.priorityGroups
  Description: Specific priorities defined in the priority policy
  Type: Slice
  Remarks: This field is mandatory. Examples:
  - Setting the priority of yearly/monthly nodes to 100:
    priorityGroups:
    - priority: 100
      requirements:
      - key: node.cce.io/billing-mode
        operator: In
        values:
        - pre-paid
  - Setting the priority of pay-per-use nodes to 10:
    priorityGroups:
    - priority: 10
      requirements:
      - key: node.cce.io/billing-mode
        operator: In
        values:
        - post-paid
  - Setting the priority of virtual-kubelet or bursting nodes to 1:
    priorityGroups:
    - priority: 1
      requirements:
      - key: kubernetes.io/role
        operator: In
        values:
        - virtual-kubelet
        - bursting
Customizing an Application Scaling Priority Policy
The BalancerPolicyTemplate CRDs are used to define priority policies. To customize an application scaling priority policy, create or modify a BalancerPolicyTemplate CR.
If there are multiple BalancerPolicyTemplate CRs in a cluster, they will all affect the scaling result. Therefore, if the default scaling priority policy is not in use, run the following command to delete it:
kubectl delete balancerpolicytemplate default-balancerpolicytemplate
The following example assumes that during scale-out, workload pods should be preferentially scheduled to nodes running HCE 2.0 and then to nodes running EulerOS, and that during scale-in, Volcano Scheduler first deletes the workload pods on nodes running EulerOS and then those on nodes running HCE 2.0.
- Write a new BalancerPolicyTemplate CR.
vim new-balancerpolicytemplate.yaml
The content is as follows:
apiVersion: autoscaling.volcano.sh/v1alpha1
kind: BalancerPolicyTemplate
metadata:
  name: new-balancerpolicytemplate
spec:
  policy:
    policyName: Priority
    priorities:
      priorityGroups:
      - priority: 10  # Set the priority of nodes running EulerOS to 10.
        requirements:
        - key: os.name  # Label indicating the node OS
          operator: In
          values:
          - EulerOS_2.0_SP9x86_64  # The OS minor version number may be involved. Add minor versions as needed.
      - priority: 100  # Set the priority of nodes running HCE 2.0 to 100.
        requirements:
        - key: os.name  # Label indicating the node OS
          operator: In
          values:
          - Huawei_Cloud_EulerOS_2.0_x86_64
- Create a new BalancerPolicyTemplate CR.
kubectl create -f new-balancerpolicytemplate.yaml
- Modify default-balancer. You can also create a new Balancer CR as needed.
kubectl edit balancer default-balancer
The modified content is as follows:
apiVersion: autoscaling.volcano.sh/v1alpha1
kind: Balancer
metadata:
  name: default-balancer
spec:
  balancerPolicyTemplateName: new-balancerpolicytemplate
  targets:
  - namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: Exists
  weight: 10
- Check whether the value of openvessel.io/workload-balancer-score in each pod meets the expectation.
The value of openvessel.io/workload-balancer-score should be 10 for pods on nodes running EulerOS and 100 for pods on nodes running HCE 2.0.
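A quick way to check, assuming the pods carry the score annotation:
kubectl describe pods | grep -E '^Name:|workload-balancer-score'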
Configuring a Scaling Priority Policy for a Third-Party Workload
For a workload that is not a Deployment but is managed by a CRD, you can configure the scaling priority policy in the Advanced Settings area so that Volcano Scheduler can apply scaling priority policies to that workload.
- Log in to the CCE console and click the cluster name to access the cluster console.
- In the navigation pane, choose Settings and click the Scheduling tab. In the Select Cluster Scheduler area, select Volcano scheduler, find the expert mode, and click Refresh.
- In the navigation pane, choose Add-ons, locate Volcano Scheduler, click Install or Edit, and adjust the configuration parameters in the Parameters area.
- Specify the type of the third-party workload to be supported. The following is an example in JSON format:
{ "default_scheduler_conf": { ... }, "workload_balancer_score_annotation_key": "", "workload_balancer_third_party_types": "apps.kruise.io/v1alpha1/clonesets,apps.kruise.io/v1beta1/statefulsets" }
- workload_balancer_score_annotation_key: specifies the score annotation key of a pod. Only openvessel.io/workload-balancer-score and controller.kubernetes.io/pod-deletion-cost are supported. Setting this parameter to any other value will cause Volcano Scheduler to exit abnormally.
- workload_balancer_third_party_types: specifies the third-party workloads to be supported. The value is a string consisting of the group, version, and kind of each third-party workload, with multiple entries separated by commas (,).
The workload kind in the value must be in plural form, for example, apps.kruise.io/v1alpha1/clonesets,apps.kruise.io/v1beta1/statefulsets. If it is in a non-plural form, for example, apps.kruise.io/v1alpha1/cloneset, the corresponding CRD cannot be monitored.
If the format is incorrect, Volcano Scheduler will exit abnormally. If the specified CRD is not present in the cluster, the application scaling priority policy cannot work properly.
For the CRD workload to scale in by priority, the controller that manages the CRD must be able to recognize the pod score annotation and adjust the pod deletion sequence accordingly during scale-in.
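For example, a sketch of the advanced settings for OpenKruise CloneSets that scores pods through the controller.kubernetes.io/pod-deletion-cost annotation. This assumes the CloneSet controller recognizes that annotation during scale-in; verify this for the controller you use:
{
  "default_scheduler_conf": {
    ...
  },
  "workload_balancer_score_annotation_key": "controller.kubernetes.io/pod-deletion-cost",
  "workload_balancer_third_party_types": "apps.kruise.io/v1alpha1/clonesets"
}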
Appendix: Adjusting the Proportion of Nodes That Can Be Scheduled by Volcano Scheduler
To make Volcano Scheduler evaluate all nodes during scheduling, add --percentage-nodes-to-find=100 to the scheduler startup command. Run the following command to edit the volcano-scheduler Deployment:
kubectl edit deploy volcano-scheduler -nkube-system
An example of the modified Deployment is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: volcano-scheduler
app.kubernetes.io/managed-by: Helm
release: cceaddon-volcano
name: volcano-scheduler
namespace: kube-system
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: volcano-scheduler
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: volcano-scheduler
release: cceaddon-volcano
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- volcano-scheduler
topologyKey: topology.kubernetes.io/zone
weight: 100
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- volcano-scheduler
topologyKey: kubernetes.io/hostname
containers:
- command:
- /bin/sh
- -c
- /volcano-scheduler --leader-elect=true --lock-object-namespace=kube-system
--feature-gates=CSIMigrationFlexVolumeFuxi=true,CSIMigrationFlexVolumeFuxiComplete=true,MultiGPUScheduling=true
--kube-api-qps=200 --alsologtostderr --listen-address=$(MY_POD_IP):8080
--enable-healthz=true --healthz-address=$(MY_POD_IP):11251 --enable-metrics=true --percentage-nodes-to-find=100
--scheduler-conf=/volcano.scheduler/default-scheduler.conf -v=3 1>>/var/log/volcano/volcano-scheduler.log
--percentage-nodes-to-find=100 specifies that Volcano Scheduler can find all nodes in a cluster during scheduling selection.
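After saving, you can confirm that the flag is present in the running Deployment. A quick check:
kubectl -n kube-system get deploy volcano-scheduler -o yaml | grep percentage-nodes-to-find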