Hybrid Deployment of Online and Offline Jobs
Hybrid deployment of online and offline jobs is an OBT feature.
Online and Offline Jobs
Jobs can be classified into online jobs and offline jobs based on whether services are always online.
- Online jobs: These jobs run for a long time, experience regular traffic surges and tidal (peak and off-peak) resource demand, and have strict SLA requirements. Examples include advertising and e-commerce services.
- Offline jobs: These jobs run for a short time, are compute-intensive, and can tolerate high latency. Examples include AI and big data services.
Resource Oversubscription and Hybrid Deployment
Many services experience traffic surges. To ensure performance and stability, resources are often requested for the peak load. However, surges may subside quickly, and the requested resources, if not released, sit idle during off-peak hours. Online jobs in particular request large amounts of resources to meet their SLAs, so their actual resource utilization can be very low.
Resource oversubscription puts these requested but idle resources to use. Oversubscribed resources are suitable for offline jobs, which focus on throughput, have low SLA requirements, and can tolerate some failures.
Hybrid deployment of online and offline jobs in a cluster can better utilize cluster resources.
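The idea of reclaiming requested but idle resources can be illustrated with a small calculation. The following Python sketch is a simplified model, not how Volcano actually computes oversubscribed resources: it assumes the reclaimable CPU on a node is the sum, over online pods, of what each pod requested minus what it currently uses. The pod names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    cpu_request_m: int  # CPU requested by the pod, in millicores
    cpu_used_m: int     # CPU the pod is actually using right now, in millicores


def idle_requested_cpu(pods: list[PodUsage]) -> int:
    """Sum of requested-but-unused CPU (millicores) across pods.

    A pod using more than it requested contributes 0, never a negative value.
    """
    return sum(max(p.cpu_request_m - p.cpu_used_m, 0) for p in pods)


# Hypothetical online pods sized for peak traffic, observed off-peak.
online_pods = [
    PodUsage("web-1", cpu_request_m=4000, cpu_used_m=900),
    PodUsage("web-2", cpu_request_m=4000, cpu_used_m=1100),
    PodUsage("cache-1", cpu_request_m=2000, cpu_used_m=2300),
]

print(idle_requested_cpu(online_pods))  # 6000
```

In this model, 6000m of CPU that online pods reserved for peaks is idle and could host offline jobs until the online load rises again.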
Oversubscription for Hybrid Deployment
CPU and memory resources can be oversubscribed. The key features are as follows:
- Offline jobs preferentially run on oversubscribed nodes.
If both oversubscribed and non-oversubscribed nodes exist, the former will score higher than the latter and offline jobs are preferentially scheduled to oversubscribed nodes.
- Online jobs can use only non-oversubscribed resources if scheduled to an oversubscribed node.
Offline jobs can use both oversubscribed and non-oversubscribed resources of an oversubscribed node.
- In the same scheduling period, online jobs take precedence over offline jobs.
If both online and offline jobs are waiting to be scheduled, online jobs are scheduled first. When the node resource usage exceeds the upper limit (the high watermarks in Table 2) and the node's resource requests exceed 100%, offline jobs are evicted.
- CPU and memory isolation is provided by the OS kernel.
CPU isolation: Online jobs can quickly preempt the CPU resources of offline jobs and suppress their CPU usage.
Memory isolation: When system memory is exhausted and the OOM killer is triggered, the kernel kills offline jobs first.
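The scheduling rules above can be sketched as a toy model. This Python code is an illustration under stated assumptions, not Volcano's scoring algorithm: "score higher" is reduced to "try oversubscribed nodes first", only CPU is modeled, and the node figures are hypothetical (the IP addresses are borrowed from the example later on this page).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    oversubscribed: bool
    free_cpu_m: int         # unrequested allocatable CPU, in millicores
    oversub_cpu_m: int = 0  # extra CPU made available by oversubscription


def usable_cpu(node: Node, offline: bool) -> int:
    """CPU (millicores) a job of the given type may use on the node."""
    if offline and node.oversubscribed:
        # Offline jobs may use both normal and oversubscribed resources.
        return node.free_cpu_m + node.oversub_cpu_m
    # Online jobs only ever see non-oversubscribed resources.
    return node.free_cpu_m


def pick_node(nodes: list[Node], cpu_m: int, offline: bool) -> Optional[str]:
    """Return the name of the node the job would land on, or None."""
    fits = [n for n in nodes if usable_cpu(n, offline) >= cpu_m]
    if not fits:
        return None
    if offline:
        # Oversubscribed nodes are preferred for offline jobs.
        # sorted() is stable, so ties keep the original node order.
        fits = sorted(fits, key=lambda n: not n.oversubscribed)
    return fits[0].name


nodes = [
    Node("192.168.0.24", oversubscribed=False, free_cpu_m=4000),
    Node("192.168.0.76", oversubscribed=True, free_cpu_m=1000, oversub_cpu_m=3000),
]
print(pick_node(nodes, 2000, offline=True))   # 192.168.0.76
print(pick_node(nodes, 2000, offline=False))  # 192.168.0.24
```

The online job cannot use 192.168.0.76 here because only 1000m of non-oversubscribed CPU is free there, while the offline job sees 4000m on that node and prefers it.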
Notes and Constraints
- Only BMSs in CCE Turbo clusters of v1.21 or later are supported.
- EulerOS 2.10 must be used on nodes.
- The volcano add-on of v1.3.4 or later must be installed on clusters.
- Before enabling the volcano oversubscription add-on, ensure that the overcommit add-on is not enabled.
- Before enabling the volcano oversubscription add-on, ensure that the CPU Manager Policies are not enabled.
- Before enabling the volcano oversubscription add-on, ensure that the Topology Management Policies are not enabled.
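The constraints above can be turned into a checklist. The Python sketch below is hypothetical: ClusterFacts and its fields are invented for illustration and would have to be filled in by hand (for example, from the CCE console). Versions are compared as numeric tuples, so "1.3.10" correctly counts as newer than "1.3.4".

```python
from dataclasses import dataclass


def version_tuple(version: str) -> tuple[int, ...]:
    """'v1.21' -> (1, 21); '1.3.4' -> (1, 3, 4). Assumes plain numeric parts."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


@dataclass
class ClusterFacts:  # hypothetical record, filled in manually
    cluster_type: str
    cluster_version: str
    node_is_bms: bool
    node_os: str
    volcano_version: str
    overcommit_enabled: bool
    cpu_manager_policy_enabled: bool
    topology_manager_policy_enabled: bool


def preflight(c: ClusterFacts) -> list[str]:
    """Return the constraints from the list above that are not met."""
    problems = []
    if c.cluster_type != "CCE Turbo" or version_tuple(c.cluster_version) < (1, 21):
        problems.append("need a CCE Turbo cluster of v1.21 or later")
    if not c.node_is_bms:
        problems.append("nodes must be BMSs")
    if c.node_os != "EulerOS 2.10":
        problems.append("nodes must run EulerOS 2.10")
    if version_tuple(c.volcano_version) < (1, 3, 4):
        problems.append("volcano add-on must be v1.3.4 or later")
    if c.overcommit_enabled:
        problems.append("disable the overcommit add-on first")
    if c.cpu_manager_policy_enabled:
        problems.append("disable CPU Manager Policies first")
    if c.topology_manager_policy_enabled:
        problems.append("disable Topology Management Policies first")
    return problems


ok = ClusterFacts("CCE Turbo", "v1.21", True, "EulerOS 2.10", "1.3.4", False, False, False)
print(preflight(ok))  # []
```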
Configuring Oversubscription Labels for Scheduling
If the label volcano.sh/oversubscription=true is configured for a node in the cluster, the oversubscription configuration must be added to the volcano add-on. Otherwise, scheduling on oversubscribed nodes will be abnormal. For details about the configuration, see Table 1.
Using Hybrid Deployment
- Configure the volcano add-on.
- Use kubectl to connect to the cluster. For details, see Connecting to a Cluster Using kubectl.
- Run the following command to add the oversubscription plug-in to volcano-scheduler-configmap. Ensure that the configuration does not contain the overcommit plug-in: if a "- name: overcommit" entry exists, delete it.
# kubectl edit cm volcano-scheduler-configmap -n kube-system
apiVersion: v1
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: gang
      - name: priority
      - name: conformance
      - name: oversubscription
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
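The two requirements of this step (oversubscription present, overcommit absent) can be checked mechanically. The Python sketch below assumes the volcano-scheduler.conf string has already been parsed into a dict, for example with a YAML library such as PyYAML's yaml.safe_load; that parsing step is omitted so the code stays standard-library only. The function names are hypothetical.

```python
def plugin_names(conf: dict) -> list[str]:
    """Flatten the plug-in names from every tier of volcano-scheduler.conf."""
    return [plugin["name"] for tier in conf["tiers"] for plugin in tier["plugins"]]


def check_scheduler_conf(conf: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks right."""
    problems = []
    names = plugin_names(conf)
    if "oversubscription" not in names:
        problems.append("add '- name: oversubscription' to a tier")
    if "overcommit" in names:
        problems.append("remove '- name: overcommit'")
    return problems


# The volcano-scheduler.conf from the step above, already parsed into Python.
conf = {
    "actions": "enqueue, allocate, backfill",
    "tiers": [
        {"plugins": [{"name": "gang"}, {"name": "priority"},
                     {"name": "conformance"}, {"name": "oversubscription"}]},
        {"plugins": [{"name": "drf"}, {"name": "predicates"}, {"name": "proportion"},
                     {"name": "nodeorder"}, {"name": "binpack"}]},
    ],
}
print(check_scheduler_conf(conf))  # []
```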
- Enable the node oversubscription feature.
Oversubscribed resources can be used through the node label only after the oversubscription feature is enabled for the node, and such nodes can only be created in a node pool. To enable the oversubscription feature, perform the following steps:
- Create a BMS pool.
- Click Configuration next to the node pool name.
- On the Configuration page displayed, set over-subscription-resource under kubelet to true (visible only for BMS node pools) and click OK.

- Set the node oversubscription label.
The volcano.sh/oversubscription label needs to be configured for an oversubscribed node. If this label is set for a node and the value is true, the node is an oversubscribed node. Otherwise, the node is not an oversubscribed node.
# kubectl label node 192.168.0.0 volcano.sh/oversubscription=true
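The label rule above reduces to a single comparison. This Python sketch follows the text literally (the label must exist and its value must be true); treating the comparison as case-sensitive is an assumption, and the helper names are hypothetical. On the command line, kubectl get nodes -l volcano.sh/oversubscription=true lists the same set of nodes.

```python
OVERSUBSCRIPTION_LABEL = "volcano.sh/oversubscription"


def is_oversubscribed(node_labels: dict[str, str]) -> bool:
    """True only when the label exists and its value is exactly "true"."""
    return node_labels.get(OVERSUBSCRIPTION_LABEL) == "true"


def oversubscribed_nodes(nodes: dict[str, dict[str, str]]) -> list[str]:
    """Names of the nodes that count as oversubscribed, sorted."""
    return sorted(name for name, labels in nodes.items() if is_oversubscribed(labels))


print(is_oversubscribed({"volcano.sh/oversubscription": "true"}))   # True
print(is_oversubscribed({"volcano.sh/oversubscription": "false"}))  # False
print(is_oversubscribed({}))                                        # False
```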
An oversubscribed node also supports the oversubscription thresholds listed in Table 2. Run kubectl annotate node <nodeIP> <annotation>=<value> to set a threshold, and kubectl describe node <nodeIP> to check it. For example:
# kubectl annotate node 192.168.0.0 volcano.sh/oversubscription-evicting-cpu-high-watermark=70
# kubectl describe node 192.168.0.0
Name:               192.168.0.0
Roles:              <none>
Labels:             ...
                    volcano.sh/oversubscription=true
Annotations:        ...
                    volcano.sh/oversubscription-evicting-cpu-high-watermark: 70

Table 2 Node oversubscription annotations
Name
Description
volcano.sh/oversubscription-evicting-cpu-high-watermark
When the CPU usage of a node exceeds the specified value, offline job eviction is triggered and the node becomes unschedulable.
The default value is 80, indicating that offline job eviction is triggered when the CPU usage of a node exceeds 80%.
volcano.sh/oversubscription-evicting-cpu-low-watermark
After eviction is triggered, the scheduling starts again when the CPU usage of a node is lower than the specified value.
The default value is 30, indicating that scheduling starts again when the CPU usage of a node is lower than 30%.
volcano.sh/oversubscription-evicting-memory-high-watermark
When the memory usage of a node exceeds the specified value, offline job eviction is triggered and the node becomes unschedulable.
The default value is 60, indicating that offline job eviction is triggered when the memory usage of a node exceeds 60%.
volcano.sh/oversubscription-evicting-memory-low-watermark
After eviction is triggered, the scheduling starts again when the memory usage of a node is lower than the specified value.
The default value is 30, indicating that the scheduling starts again when the memory usage of a node is less than 30%.
volcano.sh/oversubscription-types
Oversubscribed resource type. The options are as follows:
- cpu (oversubscribed CPU)
- memory (oversubscribed memory)
- cpu,memory (oversubscribed CPU and memory)
The default value is cpu,memory.
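The high and low watermarks in Table 2 form a hysteresis band: eviction starts when usage exceeds the high value and scheduling resumes only once usage falls below the low value. The Python sketch below reads the annotations with the Table 2 defaults and replays a sequence of CPU readings. Tracking each resource separately and the exact boundary comparisons (strictly greater than high, strictly lower than low) are assumptions based on the wording of the table; the helper names are hypothetical.

```python
PREFIX = "volcano.sh/oversubscription-evicting-"
DEFAULT_WATERMARKS = {  # defaults from Table 2, in percent
    ("cpu", "high"): 80,
    ("cpu", "low"): 30,
    ("memory", "high"): 60,
    ("memory", "low"): 30,
}


def watermarks(annotations: dict[str, str]) -> dict[tuple[str, str], int]:
    """Read the four watermark annotations, falling back to the defaults."""
    return {
        (res, level): int(annotations.get(f"{PREFIX}{res}-{level}-watermark", default))
        for (res, level), default in DEFAULT_WATERMARKS.items()
    }


def oversubscription_types(annotations: dict[str, str]) -> set[str]:
    """Parse volcano.sh/oversubscription-types (default: cpu,memory)."""
    raw = annotations.get("volcano.sh/oversubscription-types", "cpu,memory")
    return {t.strip() for t in raw.split(",") if t.strip()}


def next_evicting_state(evicting: bool, usage_pct: float, high: int, low: int) -> bool:
    """Hysteresis for one resource: enter above `high`, leave only below `low`."""
    if usage_pct > high:
        return True
    return evicting and usage_pct >= low


# The annotation set in the example above; the other three use defaults.
annotations = {"volcano.sh/oversubscription-evicting-cpu-high-watermark": "70"}
marks = watermarks(annotations)

state = False
trace = []
for cpu in [50, 75, 50, 25]:  # hypothetical CPU usage readings, in percent
    state = next_evicting_state(state, cpu, marks[("cpu", "high")], marks[("cpu", "low")])
    trace.append(state)
print(trace)  # [False, True, True, False]
```

At 50% after the spike the node is still in the eviction state, because usage has not yet dropped below the 30% low watermark.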
- Deploy online and offline jobs.
For an offline job, add the volcano.sh/preemptable: "true" annotation to the pod template. Online jobs do not need this annotation. For both online and offline jobs, set schedulerName to volcano to use the Volcano scheduler.
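The distinction between the two job types comes down to one annotation on the pod template. The Python sketch below classifies a pod template given as a dict. Treating only the exact string "true" as offline is an assumption about how the value is matched. The value is quoted in the YAML manifests because Kubernetes annotation values must be strings.

```python
PREEMPTABLE = "volcano.sh/preemptable"


def job_type(pod_template: dict) -> str:
    """'offline' if the pod template carries volcano.sh/preemptable: "true"."""
    annotations = pod_template.get("metadata", {}).get("annotations") or {}
    return "offline" if annotations.get(PREEMPTABLE) == "true" else "online"


def uses_volcano(pod_template: dict) -> bool:
    """Both job types must set schedulerName to volcano."""
    return pod_template.get("spec", {}).get("schedulerName") == "volcano"


offline_tpl = {
    "metadata": {"annotations": {PREEMPTABLE: "true"}},
    "spec": {"schedulerName": "volcano"},
}
online_tpl = {"metadata": {}, "spec": {"schedulerName": "volcano"}}

print(job_type(offline_tpl), uses_volcano(offline_tpl))  # offline True
print(job_type(online_tpl), uses_volcano(online_tpl))    # online True
```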
For an offline job:
kind: Deployment
apiVersion: apps/v1
spec:
  replicas: 4
  template:
    metadata:
      annotations:
        metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"","path":"","port":"","names":""}]'
        volcano.sh/preemptable: 'true'    # Offline job label
    spec:
      schedulerName: volcano    # The Volcano scheduler is used.
      ...
For an online job:
kind: Deployment
apiVersion: apps/v1
spec:
  replicas: 4
  template:
    metadata:
      annotations:
        metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"","path":"","port":"","names":""}]'
    spec:
      schedulerName: volcano    # The Volcano scheduler is used.
      ...
- Run the following command to check the amount of oversubscribed resources and the resource usage:
kubectl describe node <nodeIP>
# kubectl describe node 192.168.0.0
Name:               192.168.0.0
Roles:              <none>
Labels:             ...
                    volcano.sh/oversubscription=true
Annotations:        ...
                    volcano.sh/oversubscription-cpu: 2335
                    volcano.sh/oversubscription-memory: 341753856
Allocatable:
  cpu:                3920m
  memory:             6263988Ki
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                4950m (126%)  4950m (126%)
  memory             1712Mi (27%)  1712Mi (27%)
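The percentages in the output above are requests divided by allocatable, which is why CPU can exceed 100% on an oversubscribed node. The Python sketch below reproduces them. kubectl describe truncates rather than rounds the percentage. The last line assumes that volcano.sh/oversubscription-cpu is expressed in millicores; this page does not state the unit.

```python
def percent_of(requested: float, allocatable: float) -> int:
    """Percentage as kubectl describe prints it: truncated, not rounded."""
    return int(requested / allocatable * 100)


MI = 1024 ** 2  # bytes per Mi
KI = 1024       # bytes per Ki

cpu_requested_m, cpu_allocatable_m = 4950, 3920
mem_requested = 1712 * MI        # 1712Mi
mem_allocatable = 6263988 * KI   # 6263988Ki

print(percent_of(cpu_requested_m, cpu_allocatable_m))  # 126
print(percent_of(mem_requested, mem_allocatable))      # 27

# ASSUMPTION: volcano.sh/oversubscription-cpu is expressed in millicores.
oversub_cpu_m = 2335
print(cpu_allocatable_m + oversub_cpu_m)  # 6255
```

Under that unit assumption, allocatable plus oversubscribed CPU (6255m) would cover the 4950m requested.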
Hybrid Deployment Example
The following uses an example to describe how to deploy online and offline jobs in hybrid mode.
- Assume that a cluster has two nodes: one oversubscribed node and one non-oversubscribed node.
# kubectl get node
NAME           STATUS   ROLES    AGE    VERSION
192.168.0.24   Ready    <none>   5d3h   v1.21.4-r0-CCE21.11.1.B003
192.168.0.76   Ready    <none>   2d2h   v1.21.4-r0-CCE21.11.1.B004
- 192.168.0.76 is an oversubscribed node (with the volcano.sh/oversubscription=true label).
- 192.168.0.24 is a non-oversubscribed node (without the volcano.sh/oversubscription=true label).
# kubectl describe no 192.168.0.76
Name:               192.168.0.76
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    ...
                    os.architecture=amd64
                    os.name=EulerOS_2.0_SP10x86_64
                    os.version=4.18.0-147.5.2.1.h579.eulerosv2r10.x86_64
                    volcano.sh/oversubscription=true
- Submit offline job creation requests. If resources are sufficient, all offline jobs will be scheduled to the oversubscribed node.
The offline job template is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: offline-job
  namespace: oversubscription
  labels:
    app: offline-job
spec:
  replicas: 2
  selector:
    matchLabels:
      app: offline-job
  template:
    metadata:
      labels:
        app: offline-job
      annotations:
        volcano.sh/preemptable: "true"    # Offline job label
    spec:
      schedulerName: volcano    # The Volcano scheduler is used.
      containers:
      - name: nginx
        imagePullPolicy: IfNotPresent
        image: centos:7
        command: ["/bin/sh", "-c", "while true; do sleep 1; done"]
        resources:
          requests:
            cpu: 1000m
            memory: 500Mi
          limits:
            cpu: 1000m
            memory: 500Mi
Offline jobs are scheduled to the oversubscribed node.
# kubectl get po -n oversubscription -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
offline-job-5dc944f84f-grzb2   1/1     Running   0          30s   192.168.3.200   192.168.0.76   <none>           <none>
offline-job-5dc944f84f-xjsx2   1/1     Running   0          30s   192.168.2.143   192.168.0.76   <none>           <none>
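Checking placement by eye works for two pods. For more replicas, the NODE column can be extracted programmatically. The Python sketch below parses the plain-text output of kubectl get po -o wide shown above. It assumes no cell contains spaces, which holds for these outputs, and it only resolves columns that appear before NOMINATED NODE. A sturdier alternative is to let kubectl select the fields, for example kubectl get po -n oversubscription -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName.

```python
def pod_field(output: str, column: str) -> dict[str, str]:
    """Map pod NAME -> value of `column` in `kubectl get po -o wide` output.

    Only valid for columns before NOMINATED NODE, whose header contains a space.
    """
    header, *rows = output.strip().splitlines()
    idx = header.split().index(column)  # "NODE" matches the first NODE token
    return {row.split()[0]: row.split()[idx] for row in rows}


OUTPUT = """\
NAME                           READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
offline-job-5dc944f84f-grzb2   1/1     Running   0          30s   192.168.3.200   192.168.0.76   <none>           <none>
offline-job-5dc944f84f-xjsx2   1/1     Running   0          30s   192.168.2.143   192.168.0.76   <none>           <none>
"""

nodes = pod_field(OUTPUT, "NODE")
print(set(nodes.values()) == {"192.168.0.76"})  # True: all on the oversubscribed node
```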
- Submit online job creation requests. If resources are sufficient, the online jobs will be scheduled to the non-oversubscribed node.
The online job template is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: online-service
  namespace: oversubscription
  labels:
    app: online-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: online-service
  template:
    metadata:
      labels:
        app: online-service
    spec:
      schedulerName: volcano
      nodeSelector:
        demo-node: "true"
      containers:
      - name: nginx
        imagePullPolicy: IfNotPresent
        image: centos:7
        command: ["/bin/sh", "-c", "while true; do sleep 1; done"]
        resources:
          requests:
            cpu: 1000m
            memory: 500M
          limits:
            cpu: 1000m
            memory: 500M
Online jobs are scheduled to the non-oversubscribed node.
# kubectl get po -n oversubscription -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
online-service-6f94f8cdf9-5mhn6   1/1     Running   0          30s   192.168.3.150   192.168.0.24   <none>           <none>
online-service-6f94f8cdf9-mz82w   1/1     Running   0          30s   192.168.2.238   192.168.0.24   <none>           <none>
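The online Deployment above also sets nodeSelector demo-node: "true". In Kubernetes, a nodeSelector matches only nodes whose labels contain every listed key with exactly the listed value. This selector therefore also restricts where the online pods can land, in addition to resource availability. The page does not show which node carries the demo-node label. The Python sketch below assumes it is on 192.168.0.24, and the label sets are hypothetical.

```python
def matches_node_selector(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """nodeSelector semantics: every key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


selector = {"demo-node": "true"}  # from the online-service template above

# ASSUMPTION: the page does not show which node carries demo-node=true;
# here it is placed on the non-oversubscribed node 192.168.0.24.
node_labels = {
    "192.168.0.24": {"demo-node": "true"},
    "192.168.0.76": {"volcano.sh/oversubscription": "true"},
}

eligible = [n for n, labels in node_labels.items() if matches_node_selector(selector, labels)]
print(eligible)  # ['192.168.0.24']
```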
- Increase the resource usage of the oversubscribed node and observe whether offline job eviction is triggered.
Submit both online and offline jobs to the oversubscribed node (192.168.0.76).
# kubectl get po -n oversubscription -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
offline-job-776b4f75bc-5dpxm     1/1     Running   0          8s    192.168.2.228   192.168.0.76   <none>           <none>
offline-job-776b4f75bc-kkhbn     1/1     Running   0          8s    192.168.2.16    192.168.0.76   <none>           <none>
online-service-886d79845-5fqhq   1/1     Running   0          8s    192.168.3.169   192.168.0.76   <none>           <none>
online-service-886d79845-bpd9m   1/1     Running   0          8s    192.168.3.128   192.168.0.76   <none>           <none>
Observe the oversubscribed node (192.168.0.76). You can find that oversubscribed resources exist and the CPU allocation rate exceeds 100%.
# kubectl describe no 192.168.0.76
Name:               192.168.0.76
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=c6.22xlarge.2.physical
                    beta.kubernetes.io/os=linux
                    ...
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=192.168.0.76
                    volcano.sh/oversubscription=true
Annotations:        alpha.kubernetes.io/provided-node-ip: 192.168.0.76
                    volcano.sh/oversubscription-cpu: 52278
                    volcano.sh/oversubscription-memory: 3033176651
                    ...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests         Limits
  --------           --------         ------
  cpu                105350m (120%)   105350m (120%)
  memory             5127195136 (1%)  5336910336 (1%)
  ephemeral-storage  0 (0%)           0 (0%)
  hugepages-1Gi      0 (0%)           0 (0%)
  hugepages-2Mi      0 (0%)           0 (0%)
Increase the CPU usage of online jobs on the node. Offline job eviction is triggered.
# kubectl get po -n oversubscription -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
offline-job-776b4f75bc-5dpxm     1/1     Running   0          18m   192.168.2.228   192.168.0.76   <none>           <none>
offline-job-776b4f75bc-htjwr     0/1     Pending   0          13s   <none>          <none>         <none>           <none>
offline-job-776b4f75bc-kkhbn     0/1     Evicted   0          18m   <none>          192.168.0.76   <none>           <none>
online-service-886d79845-5fqhq   1/1     Running   0          18m   192.168.3.169   192.168.0.76   <none>           <none>
online-service-886d79845-bpd9m   1/1     Running   0          18m   192.168.3.128   192.168.0.76   <none>           <none>
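In the output above, only one of the two offline pods was evicted, the online pods kept running, and the Deployment created a replacement offline pod that stays Pending. The Python sketch below models only the selection step: offline pods are evicted until node CPU usage is back at or below the high watermark, and online pods are never candidates. Evicting the highest-usage offline pod first is a hypothetical ordering chosen for the sketch; this page does not describe Volcano's actual order. The per-pod CPU shares are made up.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    offline: bool
    cpu_pct: float  # share of the node's CPU this pod is using, in percent


def plan_eviction(node_cpu_pct: float, high_watermark: int, pods: list[Pod]) -> list[str]:
    """Pick offline pods to evict until node CPU usage is at or below the high watermark.

    Online pods are never selected. Highest-usage offline pods go first
    (a hypothetical ordering chosen for this sketch).
    """
    usage = node_cpu_pct
    evicted = []
    for pod in sorted((p for p in pods if p.offline), key=lambda p: p.cpu_pct, reverse=True):
        if usage <= high_watermark:
            break
        evicted.append(pod.name)
        usage -= pod.cpu_pct
    return evicted


# Pod names from the output above; the CPU shares are made up for illustration.
pods = [
    Pod("offline-job-776b4f75bc-5dpxm", offline=True, cpu_pct=3),
    Pod("offline-job-776b4f75bc-kkhbn", offline=True, cpu_pct=6),
    Pod("online-service-886d79845-5fqhq", offline=False, cpu_pct=40),
    Pod("online-service-886d79845-bpd9m", offline=False, cpu_pct=36),
]
print(plan_eviction(85, 80, pods))  # ['offline-job-776b4f75bc-kkhbn']
```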
Handling Suggestions
- After kubelet on an oversubscribed node is restarted, the Volcano scheduler's resource view is temporarily out of sync with kubelet's. As a result, some newly scheduled jobs may fail with OutOfCPU. This is expected. After a period of time, the Volcano scheduler schedules online and offline jobs properly again.
- After online and offline jobs are submitted, do not change the job type dynamically (by adding or deleting the volcano.sh/preemptable: 'true' annotation), because the current kernel does not support changing an offline job into an online job.
- CCE collects the CPU and memory usage of all pods running on a node from cgroup statistics. These values may differ from the usage reported by other monitoring tools, for example, the top command.
- The OS memory isolation function is enabled by setting /proc/sys/vm/memcg_qos_enable. If the file cannot be edited, see Method of Editing the Linux Memory Image File.
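Before relying on memory isolation, you may want to check the switch file. The Python sketch below reads /proc/sys/vm/memcg_qos_enable and returns None when the file does not exist, as on kernels without this switch. Interpreting "1" as enabled is an assumption, not something this page states. The path is a parameter so the function can be tried against a test file.

```python
from pathlib import Path
from typing import Optional

MEMCG_QOS_SWITCH = Path("/proc/sys/vm/memcg_qos_enable")


def memcg_qos_enabled(path: Path = MEMCG_QOS_SWITCH) -> Optional[bool]:
    """Return True/False from the switch file, or None if the file does not exist.

    ASSUMPTION: a value of "1" means enabled and anything else means disabled.
    """
    try:
        value = path.read_text().strip()
    except FileNotFoundError:
        return None
    return value == "1"


print(memcg_qos_enabled())  # None on kernels without this switch
```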