Volcano
Introduction
Volcano is a batch processing platform based on Kubernetes. It provides a series of features required by machine learning, deep learning, bioinformatics, genomics, and other big data applications, serving as a powerful supplement to Kubernetes capabilities.
Volcano provides general-purpose, high-performance computing capabilities, such as a job scheduling engine, heterogeneous chip management, and job running management, and serves end users through computing frameworks for different industries, such as AI, big data, gene sequencing, and rendering. (Volcano is open source on GitHub.)
Volcano provides job scheduling, job management, and queue management for computing applications. Its main features are as follows (a minimal example job is provided after the feature list):
- Diverse computing frameworks, such as TensorFlow, MPI, and Spark, can run on Kubernetes in containers. Volcano provides common APIs for batch computing jobs through CRDs, various add-ons, and advanced job lifecycle management.
- Advanced scheduling capabilities are provided for batch computing and high-performance computing scenarios, including group scheduling, preemptive priority scheduling, packing, resource reservation, and task topology.
- Queues can be effectively managed for scheduling jobs. Complex job scheduling capabilities such as queue priority and multi-level queues are supported.
Open source community: https://github.com/volcano-sh/volcano
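To illustrate these capabilities, the following is a minimal sketch of a Volcano Job that uses gang scheduling (minAvailable) and a queue. The job name, queue name, and image are illustrative assumptions and are not part of this documentation.

```yaml
# Minimal sketch of a Volcano Job (illustrative values; the queue "default" and
# the image are assumptions). minAvailable enables gang scheduling: no pod of the
# job is started until all two replicas can be scheduled.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: example-job
spec:
  schedulerName: volcano      # schedule this job with the Volcano scheduler
  minAvailable: 2             # gang scheduling: start only when 2 pods can run
  queue: default              # queue used for queue-based scheduling
  tasks:
    - replicas: 2
      name: worker
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: worker
              image: busybox:latest
              command: ["sh", "-c", "echo hello from volcano && sleep 60"]
```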
Installing the Add-on
Install the Volcano add-on. On-premises clusters do not support multi-AZ deployment or affinity policies.
After the Volcano add-on is installed in an on-premises cluster, Volcano can be configured as the scheduler of a created workload only in the workload's YAML, as shown in the sketch below.
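A minimal sketch of such a workload YAML follows; the Deployment name and image are illustrative assumptions.

```yaml
# Illustrative sketch: a workload that selects the Volcano scheduler by setting
# spec.template.spec.schedulerName (name and image are placeholders).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      schedulerName: volcano   # use Volcano instead of the default scheduler
      containers:
        - name: app
          image: nginx:latest
```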
- Log in to the UCS console and click the cluster name to access the cluster console. In the navigation pane, choose Add-ons. Locate Volcano and click Install.
- Select Standalone, Custom, or HA for Add-on Specifications.
If you select Custom, the following requests and limits are recommended for volcano-controller and volcano-scheduler:
- If the number of nodes is less than 100, retain the default configuration. The requested CPU is 500m, and the limit is 2000m. The requested memory is 500 Mi, and the limit is 2000 Mi.
- If the number of nodes is greater than 100, increase the requested CPU by 500m and the requested memory by 1000 Mi each time 100 nodes (10,000 pods) are added. Set the CPU limit to 1500m more than the CPU request and the memory limit to 1000 Mi more than the memory request.
Formulas for calculating the requests and limits:
- CPU: Multiply the number of nodes by the number of pods in the cluster, compare the product with the Nodes x Pods product of each row in Table 1, and round up to the closest row to obtain the recommended CPU request and limit.
For example, for 2,000 nodes and 20,000 pods, Number of target nodes x Number of target pods = 40 million, which is closest to the 700/70,000 row in Table 1 (Number of nodes x Number of pods = 49 million). You are advised to set the CPU request to 4000m and the limit to 5500m.
- Memory: Allocate 2.4 GiB of memory to every 1,000 nodes and 1 GiB of memory to every 10,000 pods. The memory request is the sum of the two values. (The obtained value may be different from the recommended value in Table 1. You can use either of them.)
Memory request = Number of nodes/1000 x 2.4 GiB + Number of pods/10000 x 1 GiB
For example, for 2,000 nodes and 20,000 pods, the memory request is 6.8 GiB (2000/1000 x 2.4 GiB + 20000/10000 x 1 GiB). A sketch of how these computed values could be applied is provided after Table 1.
Table 1 Recommended requests and limits for volcano-controller and volcano-scheduler

| Nodes/Pods in a Cluster | CPU Request (m) | CPU Limit (m) | Memory Request (Mi) | Memory Limit (Mi) |
|---|---|---|---|---|
| 50/5,000 | 500 | 2,000 | 500 | 2,000 |
| 100/10,000 | 1,000 | 2,500 | 1,500 | 2,500 |
| 200/20,000 | 1,500 | 3,000 | 2,500 | 3,500 |
| 300/30,000 | 2,000 | 3,500 | 3,500 | 4,500 |
| 400/40,000 | 2,500 | 4,000 | 4,500 | 5,500 |
| 500/50,000 | 3,000 | 4,500 | 5,500 | 6,500 |
| 600/60,000 | 3,500 | 5,000 | 6,500 | 7,500 |
| 700/70,000 | 4,000 | 5,500 | 7,500 | 8,500 |
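As a worked illustration of the 2,000-node/20,000-pod example above, the following Helm-values-style sketch shows how the computed numbers could map onto the volcano-scheduler resource settings. The key layout is an assumption for illustration only; on the console, the same values are entered under the Custom add-on specification.

```yaml
# Illustrative sketch only: the surrounding keys are assumptions, not the
# add-on's actual configuration schema. The numbers follow the worked example above.
volcano-scheduler:
  resources:
    requests:
      cpu: 4000m       # closest Table 1 row: 700 nodes / 70,000 pods
      memory: 6963Mi   # 2000/1000 x 2.4 GiB + 20000/10000 x 1 GiB ≈ 6.8 GiB
    limits:
      cpu: 5500m       # CPU request + 1500m
      memory: 7963Mi   # memory request + 1000 Mi
```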
- Configure the parameters of the default Volcano scheduler. For details, see Table 2.
```yaml
colocation_enable: ''
default_scheduler_conf:
  actions: 'allocate, backfill'
  tiers:
    - plugins:
        - name: 'priority'
        - name: 'gang'
        - name: 'conformance'
    - plugins:
        - name: 'drf'
        - name: 'predicates'
        - name: 'nodeorder'
    - plugins:
        - name: 'cce-gpu-topology-predicate'
        - name: 'cce-gpu-topology-priority'
        - name: 'cce-gpu'
    - plugins:
        - name: 'nodelocalvolume'
        - name: 'nodeemptydirvolume'
        - name: 'nodeCSIscheduling'
        - name: 'networkresource'
```
Table 2 Volcano add-ons

| Add-on | Function | Description | Demonstration |
|---|---|---|---|
| resource_exporter_enable | Collects NUMA topology information of a node. | Values: true (the NUMA topology information of the current node can be viewed) or false (NUMA topology information collection is disabled on the current node). | - |
| binpack | Schedules pods to nodes with high resource utilization to reduce resource fragments. | binpack.weight: weight of the binpack add-on; binpack.cpu: CPU percentage, default 1; binpack.memory: memory percentage, default 1; binpack.resources: resource types. | `- plugins: - name: binpack arguments: binpack.weight: 10 binpack.cpu: 1 binpack.memory: 1 binpack.resources: nvidia.com/gpu, example.com/foo binpack.resources.nvidia.com/gpu: 2 binpack.resources.example.com/foo: 3` |
| conformance | Prevents key pods, such as those in the kube-system namespace, from being preempted. | - | - |
| gang | Considers a group of pods as a whole when allocating resources. | - | - |
| priority | Schedules pods based on the custom workload priority. | - | - |
| overcommit | Cluster resources are multiplied by a factor before scheduling to improve workload enqueuing efficiency. If all workloads are Deployments, remove this add-on or set the raising factor to 2.0. | overcommit-factor: raising factor, default 1.2. | `- plugins: - name: overcommit arguments: overcommit-factor: 2.0` |
| drf | Schedules resources based on the dominant resource of each container group; the group with the smallest dominant resource is scheduled first. | - | - |
| predicates | Determines whether a task can be bound to a node using a series of evaluation algorithms, such as node/pod affinity, taint tolerance, node port conflicts, volume limits, and volume zone matching. | - | - |
| nodeorder | Scores all nodes for a task using a series of scoring algorithms. | nodeaffinity.weight: node affinity, default 1; podaffinity.weight: pod affinity, default 1; leastrequested.weight: node with the least requested resources, default 1; balancedresource.weight: node with balanced resources, default 1; mostrequested.weight: node with the most requested resources, default 0; tainttoleration.weight: node with high taint tolerance, default 1; imagelocality.weight: node where the required images exist, default 1; selectorspread.weight: spread pods evenly across nodes, default 0; volumebinding.weight: node with the local PV delayed binding policy, default 1; podtopologyspread.weight: pod topology spread, default 2. | `- plugins: - name: nodeorder arguments: leastrequested.weight: 1 mostrequested.weight: 0 nodeaffinity.weight: 1 podaffinity.weight: 1 balancedresource.weight: 1 tainttoleration.weight: 1 imagelocality.weight: 1 volumebinding.weight: 1 podtopologyspread.weight: 2` |
| cce-gpu-topology-predicate | GPU topology scheduling preselection algorithm. | - | - |
| cce-gpu-topology-priority | GPU topology scheduling priority algorithm. | - | - |
| cce-gpu | Allocates GPU resources and supports decimal GPU configurations by working with the gpu add-on. | - | - |
| numaaware | NUMA topology scheduling. | weight: weight of the numa-aware add-on. | - |
| networkresource | Preselects and filters nodes based on ENI requirements. The parameters are transferred by CCE and do not need to be manually configured. | NetworkType: network type (eni or vpc-router). | - |
| nodelocalvolume | Filters out nodes that do not meet local volume requirements. | - | - |
| nodeemptydirvolume | Filters out nodes that do not meet emptyDir requirements. | - | - |
| nodeCSIscheduling | Filters out nodes with everest component exceptions. | - | - |
- Click Install.
Modifying the volcano-scheduler Configurations Using the Console
Volcano allows you to configure the scheduler during installation, upgrade, and editing. The configuration will be synchronized to volcano-scheduler-configmap.
This section describes how to configure volcano-scheduler.
Only Volcano v1.7.1 and later support this function. On the new add-on page, options such as plugins.eas_service and resource_exporter_enable are replaced by default_scheduler_conf.
Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Add-ons. On the right of the displayed page, locate Volcano and click Install or Upgrade. In the Parameters area, configure the volcano-scheduler parameters.
- Using resource_exporter:
{ "ca_cert": "", "default_scheduler_conf": { "actions": "allocate, backfill", "tiers": [ { "plugins": [ { "name": "priority" }, { "name": "gang" }, { "name": "conformance" } ] }, { "plugins": [ { "name": "drf" }, { "name": "predicates" }, { "name": "nodeorder" } ] }, { "plugins": [ { "name": "cce-gpu-topology-predicate" }, { "name": "cce-gpu-topology-priority" }, { "name": "cce-gpu" }, { "name": "numa-aware" # add this also enable resource_exporter } ] }, { "plugins": [ { "name": "nodelocalvolume" }, { "name": "nodeemptydirvolume" }, { "name": "nodeCSIscheduling" }, { "name": "networkresource" } ] } ] }, "server_cert": "", "server_key": "" }
After the parameters are configured, you can use the functions of the numa-aware add-on and resource_exporter at the same time. A sketch of how the numa-aware weight can be supplied is shown below.
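Table 2 lists a weight parameter for the numaaware add-on but gives no demonstration. The following is a hedged sketch of how such a weight could be supplied in a plugins tier; the argument key is taken from the table and should be treated as illustrative rather than a confirmed schema.

```yaml
# Illustrative sketch only: supplying a weight to the numa-aware plugin.
# The argument key "weight" follows the description in Table 2 and is an assumption.
- plugins:
    - name: numa-aware
      arguments:
        weight: 10
```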
- Using eas_service:
{ "ca_cert": "", "default_scheduler_conf": { "actions": "allocate, backfill", "tiers": [ { "plugins": [ { "name": "priority" }, { "name": "gang" }, { "name": "conformance" } ] }, { "plugins": [ { "name": "drf" }, { "name": "predicates" }, { "name": "nodeorder" } ] }, { "plugins": [ { "name": "cce-gpu-topology-predicate" }, { "name": "cce-gpu-topology-priority" }, { "name": "cce-gpu" }, { "name": "eas", "custom": { "availability_zone_id": "", "driver_id": "", "endpoint": "", "flavor_id": "", "network_type": "", "network_virtual_subnet_id": "", "pool_id": "", "project_id": "", "secret_name": "eas-service-secret" } } ] }, { "plugins": [ { "name": "nodelocalvolume" }, { "name": "nodeemptydirvolume" }, { "name": "nodeCSIscheduling" }, { "name": "networkresource" } ] } ] }, "server_cert": "", "server_key": "" }
- Using ief:
{ "ca_cert": "", "default_scheduler_conf": { "actions": "allocate, backfill", "tiers": [ { "plugins": [ { "name": "priority" }, { "name": "gang" }, { "name": "conformance" } ] }, { "plugins": [ { "name": "drf" }, { "name": "predicates" }, { "name": "nodeorder" } ] }, { "plugins": [ { "name": "cce-gpu-topology-predicate" }, { "name": "cce-gpu-topology-priority" }, { "name": "cce-gpu" }, { "name": "ief", "enableBestNode": true } ] }, { "plugins": [ { "name": "nodelocalvolume" }, { "name": "nodeemptydirvolume" }, { "name": "nodeCSIscheduling" }, { "name": "networkresource" } ] } ] }, "server_cert": "", "server_key": "" }
Retaining the Original Configurations of volcano-scheduler-configmap
If you want to use the original configurations after the add-on is upgraded, perform the following steps:
- Check and back up the original volcano-scheduler-configmap configuration.
Example:
```yaml
# kubectl edit cm volcano-scheduler-configmap -n kube-system
apiVersion: v1
data:
  default-scheduler.conf: |-
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
      - name: nodeorder
      - name: binpack
        arguments:
          binpack.cpu: 100
          binpack.weight: 10
          binpack.resources: nvidia.com/gpu
          binpack.resources.nvidia.com/gpu: 10000
    - plugins:
      - name: cce-gpu-topology-predicate
      - name: cce-gpu-topology-priority
      - name: cce-gpu
    - plugins:
      - name: nodelocalvolume
      - name: nodeemptydirvolume
      - name: nodeCSIscheduling
      - name: networkresource
```
- Enter the customized content in the Parameters area on the console.
{ "ca_cert": "", "default_scheduler_conf": { "actions": "enqueue, allocate, backfill", "tiers": [ { "plugins": [ { "name": "priority" }, { "name": "gang" }, { "name": "conformance" } ] }, { "plugins": [ { "name": "drf" }, { "name": "predicates" }, { "name": "nodeorder" }, { "name": "binpack", "arguments": { "binpack.cpu": 100, "binpack.weight": 10, "binpack.resources": "nvidia.com/gpu", "binpack.resources.nvidia.com/gpu": 10000 } } ] }, { "plugins": [ { "name": "cce-gpu-topology-predicate" }, { "name": "cce-gpu-topology-priority" }, { "name": "cce-gpu" } ] }, { "plugins": [ { "name": "nodelocalvolume" }, { "name": "nodeemptydirvolume" }, { "name": "nodeCSIscheduling" }, { "name": "networkresource" } ] } ] }, "server_cert": "", "server_key": "" }
After the parameters are configured, the content entered here overwrites the original volcano-scheduler-configmap. Therefore, before upgrading, check whether volcano-scheduler-configmap has been modified since the add-on was installed. If it has, synchronize those modifications to the parameters on the upgrade page.
Change History
You are advised to upgrade Volcano to the latest version that matches the cluster.
| Cluster Version | Add-on Version |
|---|---|
| v1.25 | 1.7.1 and 1.7.2 |
| v1.23 | 1.7.1 and 1.7.2 |
| v1.21 | 1.7.1 and 1.7.2 |
| v1.19.16 | 1.3.7, 1.3.10, 1.4.5, 1.7.1, and 1.7.2 |
| v1.19 | 1.3.7, 1.3.10, and 1.4.5 |
| v1.17 (End of maintenance) | 1.3.7, 1.3.10, and 1.4.5 |
| v1.15 (End of maintenance) | 1.3.7, 1.3.10, and 1.4.5 |
| Add-on Version | Supported Cluster Version | Updated Feature |
|---|---|---|
| 1.9.1 | /v1.19.16.*\|v1.21.*\|v1.23.*\|v1.25.*/ | - |
| 1.7.2 | /v1.19.16.*\|v1.21.*\|v1.23.*\|v1.25.*/ | - |
| 1.7.1 | /v1.19.16.*\|v1.21.*\|v1.23.*\|v1.25.*/ | Supported Kubernetes 1.25. |
| 1.6.5 | /v1.19.*\|v1.21.*\|v1.23.*/ | - |
| 1.4.5 | /v1.17.*\|v1.19.*\|v1.21.*/ | - |
| 1.4.2 | /v1.15.*\|v1.17.*\|v1.19.*\|v1.21.*/ | - |
| 1.3.3 | /v1.15.*\|v1.17.*\|v1.19.*\|v1.21.*/ | - |
| 1.3.1 | /v1.15.*\|v1.17.*\|v1.19.*/ | - |
| 1.2.5 | /v1.15.*\|v1.17.*\|v1.19.*/ | - |
| 1.2.3 | /v1.15.*\|v1.17.*\|v1.19.*/ | - |