Hybrid Deployment of Online and Offline Jobs

Hybrid deployment of online and offline jobs is currently in open beta testing (OBT).

Online and Offline Jobs

Jobs can be classified as online or offline based on whether their services must always be online.

  • Online jobs: These jobs run for a long time, see regular traffic surges and tidal resource requests, and have strict SLA requirements. Examples include advertising and e-commerce services.
  • Offline jobs: These jobs run for a short time, have high computing requirements, and can tolerate high latency. Examples include AI and big data services.

Resource Oversubscription and Hybrid Deployment

Many services see traffic surges. To ensure performance and stability, resources are often requested for the peak load. However, surges may subside quickly, and requested resources that are not released sit idle during off-peak hours. This is especially true for online jobs, which request large amounts of resources to meet their SLAs, so their resource utilization can be very low.

Resource oversubscription makes use of resources that have been requested but are idle. Oversubscribed resources are suitable for offline jobs, which focus on throughput, have low SLA requirements, and can tolerate some failures.

Hybrid deployment of online and offline jobs in a cluster can better utilize cluster resources.

Oversubscription for Hybrid Deployment

CPU and memory resources can be oversubscribed. The key features are as follows:

  • Offline jobs preferentially run on oversubscribed nodes.

    If both oversubscribed and non-oversubscribed nodes exist, the former will score higher than the latter and offline jobs are preferentially scheduled to oversubscribed nodes.

  • Online jobs can use only non-oversubscribed resources if scheduled to an oversubscribed node.

    Offline jobs can use both oversubscribed and non-oversubscribed resources of an oversubscribed node.

  • In the same scheduling period, online jobs take precedence over offline jobs.

    If both online and offline jobs exist, online jobs are scheduled first. When the node resource usage exceeds the upper limit and the node requests exceed 100%, offline jobs will be evicted.

  • CPU/memory isolation is provided by the kernel.

    CPU isolation: Online jobs can quickly preempt CPU resources of offline jobs and suppress the CPU usage of the offline jobs.

    Memory isolation: When system memory resources are used up and OOM Kill is triggered, the kernel evicts offline jobs first.
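
The resource-visibility and ordering rules above can be modeled in a few lines. The following is a minimal Python sketch, not the Volcano implementation: the Node and Job classes, their field names, the sample values, and the use of millicores are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    # Hypothetical model; CPU values are in millicores (an assumption).
    allocatable_cpu: int      # non-oversubscribed CPU
    oversubscribed_cpu: int   # CPU reclaimed from idle requests; 0 if none


@dataclass
class Job:
    name: str
    offline: bool             # True if the job is marked as an offline job


def usable_cpu(node: Node, job: Job) -> int:
    """Return the CPU that a job may be scheduled against on this node."""
    if job.offline:
        # Offline jobs can use oversubscribed and non-oversubscribed resources.
        return node.allocatable_cpu + node.oversubscribed_cpu
    # Online jobs can use only non-oversubscribed resources.
    return node.allocatable_cpu


def scheduling_order(jobs: List[Job]) -> List[Job]:
    """Online jobs first; the stable sort keeps submission order per group."""
    return sorted(jobs, key=lambda job: job.offline)


node = Node(allocatable_cpu=4000, oversubscribed_cpu=2000)
jobs = [Job("batch-a", True), Job("web", False), Job("batch-b", True)]
print([job.name for job in scheduling_order(jobs)])  # ['web', 'batch-a', 'batch-b']
print(usable_cpu(node, jobs[0]), usable_cpu(node, jobs[1]))  # 6000 4000
```

The sketch deliberately ignores scoring, eviction, and kernel-level isolation; it only shows which capacity each job type is measured against and which type is considered first.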

Notes and Constraints

  • Only BMSs in CCE Turbo clusters of v1.21 or later are supported.
  • EulerOS 2.10 must be used on nodes.
  • The volcano add-on of v1.3.4 or later must be installed on clusters.
  • Before enabling the volcano oversubscription add-on, ensure that the overcommit add-on is not enabled.
  • Before enabling the volcano oversubscription add-on, ensure that the CPU Manager Policies are not enabled.
  • Before enabling the volcano oversubscription add-on, ensure that the Topology Management Policies are not enabled.

Configuring Oversubscription Labels for Scheduling

If the label volcano.sh/oversubscription=true is configured for a node in the cluster, the oversubscription configuration must also be added to the volcano add-on. Otherwise, scheduling on oversubscribed nodes will be abnormal. For details about the possible combinations, see Table 1.

Ensure that the labels are configured correctly, because the scheduler does not check the add-on and node configurations against each other.

Table 1 Configuring oversubscription labels for scheduling

Oversubscription in Add-on   Oversubscription Label on Node   Scheduling
Yes                          Yes                              Triggered by oversubscription
Yes                          No                               Triggered
No                           No                               Triggered
No                           Yes                              Not triggered or failed. Avoid this configuration.
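
Because the scheduler does not validate these combinations itself, it can help to encode Table 1 as a small check in your own tooling. The following Python sketch is illustrative only; the function names are hypothetical and the returned strings are simply the wording used in Table 1.

```python
def scheduling_outcome(addon_oversubscription: bool, node_label: bool) -> str:
    """Return the scheduling behavior listed in Table 1 for one combination."""
    if addon_oversubscription and node_label:
        return "Triggered by oversubscription"
    if not addon_oversubscription and node_label:
        return "Not triggered or failed. Avoid this configuration."
    # The node has no oversubscription label: normal scheduling applies.
    return "Triggered"


def is_misconfigured(addon_oversubscription: bool, node_label: bool) -> bool:
    """True for the one combination that Table 1 says to avoid."""
    return node_label and not addon_oversubscription


for addon in (True, False):
    for label in (True, False):
        print(addon, label, "->", scheduling_outcome(addon, label))
```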

Using Hybrid Deployment

  1. Configure the volcano add-on.

    1. Use kubectl to connect to the cluster. For details, see Connecting to a Cluster Using kubectl.
    2. Run the following command to edit volcano-scheduler-configmap and add the oversubscription plug-in. Ensure that the add-on configuration does not contain the overcommit plug-in. If - name: overcommit exists, delete it.
      # kubectl edit cm volcano-scheduler-configmap -n kube-system
      apiVersion: v1
      data:
        volcano-scheduler.conf: |
          actions: "enqueue, allocate, backfill"
          tiers:
          - plugins:
            - name: gang
            - name: priority
            - name: conformance
            - name: oversubscription
          - plugins:
            - name: drf
            - name: predicates
            - name: proportion
            - name: nodeorder
            - name: binpack

  2. Enable the node oversubscription feature.

    The oversubscription label takes effect only after the oversubscription feature is enabled for the node. Such nodes can be created only in a node pool. To enable the oversubscription feature, perform the following steps:

    1. Create a BMS pool.
    2. Click Configuration next to the node pool name.
    3. On the Configuration page displayed, set over-subscription-resource under kubelet to true (visible only for BMS node pools) and click OK.

  3. Set the node oversubscription label.

    The volcano.sh/oversubscription label needs to be configured for an oversubscribed node. If this label is set for a node and the value is true, the node is an oversubscribed node. Otherwise, the node is not an oversubscribed node.

    # kubectl label node 192.168.0.0 volcano.sh/oversubscription=true

    You can also set oversubscription thresholds on an oversubscribed node by using the annotations listed in Table 2. Run kubectl annotate node <nodeIP> <annotation>=<value> to set a threshold. For example:

    kubectl annotate node 192.168.0.0 volcano.sh/oversubscription-evicting-cpu-high-watermark=70

    # kubectl describe node 192.168.0.0
    Name:             192.168.0.0
    Roles:              <none>
    Labels:           ...
                      volcano.sh/oversubscription=true
    Annotations:      ...
                      volcano.sh/oversubscription-evicting-cpu-high-watermark: 70

    Table 2 Node oversubscription annotations

    Name
        Description

    volcano.sh/oversubscription-evicting-cpu-high-watermark
        When the CPU usage of a node exceeds the specified value, offline job eviction is triggered and the node becomes unschedulable.
        The default value is 80, indicating that offline job eviction is triggered when the CPU usage of a node exceeds 80%.

    volcano.sh/oversubscription-evicting-cpu-low-watermark
        After eviction is triggered, scheduling starts again when the CPU usage of a node is lower than the specified value.
        The default value is 30, indicating that scheduling starts again when the CPU usage of a node is lower than 30%.

    volcano.sh/oversubscription-evicting-memory-high-watermark
        When the memory usage of a node exceeds the specified value, offline job eviction is triggered and the node becomes unschedulable.
        The default value is 60, indicating that offline job eviction is triggered when the memory usage of a node exceeds 60%.

    volcano.sh/oversubscription-evicting-memory-low-watermark
        After eviction is triggered, scheduling starts again when the memory usage of a node is lower than the specified value.
        The default value is 30, indicating that scheduling starts again when the memory usage of a node is lower than 30%.

    volcano.sh/oversubscription-types
        Oversubscribed resource type. The options are as follows:
        • cpu (oversubscribed CPU)
        • memory (oversubscribed memory)
        • cpu,memory (oversubscribed CPU and memory)
        The default value is cpu,memory.

  4. Deploy online and offline jobs.

    For an offline job, add the volcano.sh/preemptable: 'true' annotation to the pod template. Online jobs do not need this annotation. For both online and offline jobs, set schedulerName to volcano so that the Volcano scheduler is used.

    For an offline job:

    kind: Deployment
    apiVersion: apps/v1
    spec:
      replicas: 4
      template:
        metadata:
          annotations:
            metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"","path":"","port":"","names":""}]'
            volcano.sh/preemptable: 'true' # Marks the job as an offline job
        spec:
          schedulerName: volcano             # The Volcano scheduler is used.
          ...

    For an online job:

    kind: Deployment
    apiVersion: apps/v1
    spec:
      replicas: 4
      template:
        metadata:
          annotations:
            metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"","path":"","port":"","names":""}]'
        spec:
          schedulerName: volcano          # The Volcano scheduler is used.
          ...

  5. Run the following command to check the number of oversubscribed resources and the resource usage:

    kubectl describe node <nodeIP>

    # kubectl describe node 192.168.0.0
    Name:             192.168.0.0
    Roles:              <none>
    Labels:           ...
                      volcano.sh/oversubscription=true
    Annotations:      ...
                      volcano.sh/oversubscription-cpu: 2335
                      volcano.sh/oversubscription-memory: 341753856
    Allocatable:
      cpu:               3920m
      memory:            6263988Ki
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests      Limits
      --------           --------      ------
      cpu                 4950m (126%)  4950m (126%)
      memory             1712Mi (27%)  1712Mi (27%)
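
The percentages in the Allocated resources section are the total requests expressed as a share of the node's allocatable resources, which is why the CPU figure can exceed 100% on an oversubscribed node. The following Python sketch reproduces the two figures from the sample output above. The helper name is hypothetical, and truncating to a whole percent is an assumption that happens to match the displayed values.

```python
def percent_of_allocatable(requested: int, allocatable: int) -> int:
    """Requested amount as a whole-number percentage of allocatable."""
    return requested * 100 // allocatable


# CPU: 4950m requested against 3920m allocatable.
cpu_pct = percent_of_allocatable(4950, 3920)

# Memory: 1712Mi requested (converted to Ki) against 6263988Ki allocatable.
mem_pct = percent_of_allocatable(1712 * 1024, 6263988)

print(cpu_pct, mem_pct)  # 126 27
```

A CPU value above 100% here means that offline jobs have been placed on oversubscribed resources in addition to the node's regular allocatable CPU.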

Hybrid Deployment Example

The following example shows how to deploy online and offline jobs together in one cluster.

  1. Assume that a cluster has two nodes: one oversubscribed node and one non-oversubscribed node.

    # kubectl get node
    NAME           STATUS   ROLES    AGE    VERSION
    192.168.0.24   Ready    <none>   5d3h   v1.21.4-r0-CCE21.11.1.B003
    192.168.0.76   Ready    <none>   2d2h   v1.21.4-r0-CCE21.11.1.B004
    • 192.168.0.76 is an oversubscribed node (with the volcano.sh/oversubscription=true label).
    • 192.168.0.24 is a non-oversubscribed node (without the volcano.sh/oversubscription=true label).
    # kubectl describe no 192.168.0.76
    Name:               192.168.0.76
    Roles:              <none>
    Labels:             beta.kubernetes.io/arch=amd64
                        ...
                        os.architecture=amd64
                        os.name=EulerOS_2.0_SP10x86_64
                        os.version=4.18.0-147.5.2.1.h579.eulerosv2r10.x86_64
                        volcano.sh/oversubscription=true

  2. Submit offline job creation requests. If resources are sufficient, all offline jobs will be scheduled to the oversubscribed node.

    The offline job template is as follows:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: offline-job
      namespace: oversubscription
      labels:
        app: offline-job
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: offline-job
      template:
        metadata:
          labels:
            app: offline-job
          annotations:
            volcano.sh/preemptable: "true"       # Offline job label
        spec:
          schedulerName: volcano                 # The Volcano scheduler is used.
          containers:
            - name: nginx
              imagePullPolicy: IfNotPresent
              image: centos:7
              command: ["/bin/sh", "-c", "while true; do sleep 1; done"]
              resources:
                requests:
                  cpu: 1000m
                  memory: 500Mi
                limits:
                  cpu: 1000m
                  memory: 500Mi
    Offline jobs are scheduled to the oversubscribed node.
    # kubectl get po -n oversubscription -o wide
    NAME                           READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
    offline-job-5dc944f84f-grzb2   1/1     Running   0          30s   192.168.3.200   192.168.0.76   <none>           <none>
    offline-job-5dc944f84f-xjsx2   1/1     Running   0          30s   192.168.2.143   192.168.0.76   <none>           <none>

  3. Submit online job creation requests. If resources are sufficient, the online jobs will be scheduled to the non-oversubscribed node.

    The online job template is as follows:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: online-service
      namespace: oversubscription
      labels:
        app: online-service
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: online-service
      template:
        metadata:
          labels:
            app: online-service
        spec:
          schedulerName: volcano
          nodeSelector:
            demo-node: "true"
          containers:
            - name: nginx
              imagePullPolicy: IfNotPresent
              image: centos:7
              command: ["/bin/sh", "-c", "while true; do sleep 1; done"]
              resources:
                requests:
                  cpu: 1000m
                  memory: 500M
                limits:
                  cpu: 1000m
                  memory: 500M
    Online jobs are scheduled to the non-oversubscribed node.
    # kubectl get po -n oversubscription -o wide
    NAME                              READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
    online-service-6f94f8cdf9-5mhn6   1/1     Running   0          30s   192.168.3.150   192.168.0.24   <none>           <none>
    online-service-6f94f8cdf9-mz82w   1/1     Running   0          30s   192.168.2.238   192.168.0.24   <none>           <none>

  4. Increase the resource usage of the oversubscribed node and observe whether offline job eviction is triggered.

    Submit online and offline jobs to the oversubscribed node (192.168.0.76) at the same time.
    # kubectl get po -n oversubscription -o wide
    NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
    offline-job-776b4f75bc-5dpxm     1/1     Running   0          8s    192.168.2.228   192.168.0.76   <none>           <none>
    offline-job-776b4f75bc-kkhbn     1/1     Running   0          8s    192.168.2.16    192.168.0.76   <none>           <none>
    online-service-886d79845-5fqhq   1/1     Running   0          8s    192.168.3.169   192.168.0.76   <none>           <none>
    online-service-886d79845-bpd9m   1/1     Running   0          8s    192.168.3.128   192.168.0.76   <none>           <none>
    Check the oversubscribed node (192.168.0.76). Oversubscribed resources are reported in the annotations, and the CPU allocation rate exceeds 100%.
    # kubectl describe no 192.168.0.76
    Name:               192.168.0.76
    Roles:              <none>
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/instance-type=c6.22xlarge.2.physical
                        beta.kubernetes.io/os=linux
                        ...
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=192.168.0.76
                        volcano.sh/oversubscription=true
    Annotations:        alpha.kubernetes.io/provided-node-ip: 192.168.0.76
                        volcano.sh/oversubscription-cpu: 52278
                        volcano.sh/oversubscription-memory: 3033176651
    ...
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests         Limits
      --------           --------         ------
      cpu                105350m (120%)   105350m (120%)
      memory             5127195136 (1%)  5336910336 (1%)
      ephemeral-storage  0 (0%)           0 (0%)
      hugepages-1Gi      0 (0%)           0 (0%)
      hugepages-2Mi      0 (0%)           0 (0%)
    Increase the CPU usage of online jobs on the node. Offline job eviction is triggered.
    # kubectl get po -n oversubscription -o wide
    NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
    offline-job-776b4f75bc-5dpxm     1/1     Running   0          18m   192.168.2.228   192.168.0.76   <none>           <none>
    offline-job-776b4f75bc-htjwr     0/1     Pending   0          13s   <none>          <none>         <none>           <none>
    offline-job-776b4f75bc-kkhbn     0/1     Evicted   0          18m   <none>          192.168.0.76   <none>           <none>
    online-service-886d79845-5fqhq   1/1     Running   0          18m   192.168.3.169   192.168.0.76   <none>           <none>
    online-service-886d79845-bpd9m   1/1     Running   0          18m   192.168.3.128   192.168.0.76   <none>           <none>
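
The eviction observed in the last step follows the high and low watermarks described in Table 2: eviction starts above the high watermark, and the node stays unschedulable until usage falls below the low watermark. The following Python sketch illustrates that hysteresis using the default CPU thresholds (80 and 30). The function name and the sampled usage values are hypothetical, and this is a simplified model rather than the actual add-on logic.

```python
from typing import List, Tuple


def simulate_watermarks(
    usage_samples: List[int], high: int = 80, low: int = 30
) -> List[Tuple[bool, bool]]:
    """Return (evicting, schedulable) for each node CPU usage sample in percent."""
    schedulable = True
    timeline = []
    for usage in usage_samples:
        evicting = usage > high
        if evicting:
            # Above the high watermark: evict offline jobs, stop scheduling.
            schedulable = False
        elif usage < low:
            # Below the low watermark: scheduling starts again.
            schedulable = True
        # Between the two watermarks the previous state is kept.
        timeline.append((evicting, schedulable))
    return timeline


samples = [50, 85, 60, 40, 25, 50]
for usage, (evicting, schedulable) in zip(samples, simulate_watermarks(samples)):
    print(f"usage={usage}% evicting={evicting} schedulable={schedulable}")
```

With these samples, the node becomes unschedulable at 85% and remains so at 60% and 40%, because neither value is below the low watermark. Scheduling resumes only at 25%.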

Handling Suggestions

  • After kubelet on an oversubscribed node is restarted, the resource view of the Volcano scheduler is temporarily out of sync with that of kubelet. As a result, some newly scheduled jobs may report OutOfCPU, which is expected. After a period of time, the Volcano scheduler schedules online and offline jobs properly again.
  • After online and offline jobs are submitted, do not dynamically change the job type (by adding or deleting the annotation volcano.sh/preemptable: 'true'), because the current kernel does not support changing an offline job to an online job.
  • CCE collects the resource usage (CPU/memory) of all pods running on a node based on the status information in the cgroups system. The resource usage may be different from the monitored resource usage, for example, the resource statistics displayed by running the top command.
  • The OS memory isolation function is enabled by setting /proc/sys/vm/memcg_qos_enable. If the file cannot be edited, see Method of Editing the Linux Memory Image File.