Updated on 2024-07-04 GMT+08:00

How Do I Prevent a Non-GPU or NPU Workload from Being Scheduled to a GPU or NPU Node?

Symptom

If a cluster contains both GPU/NPU nodes and other types of nodes, non-GPU/NPU workloads may be scheduled to the GPU/NPU nodes. In this case, the GPU/NPU resources on those nodes cannot be used effectively.

Possible Causes

Non-GPU/NPU workloads request only CPU and memory resources, which GPU/NPU nodes also provide. The scheduler may therefore place these workloads on GPU/NPU nodes even though the workloads do not request GPU/NPU resources. This can leave the GPU/NPU resources on those nodes idle.

Solution

Add taints to the GPU/NPU nodes, and add matching tolerations only to the GPU/NPU workloads. This prevents non-GPU/NPU workloads from being scheduled to these nodes.

  • GPU/NPU workloads: add tolerations so that they can be scheduled to the GPU/NPU nodes.
  • Non-GPU/NPU workloads: do not configure tolerations. Without a matching toleration, they cannot be scheduled to the GPU/NPU nodes.
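
The rule described above can be illustrated with a small Python model. This is a simplified sketch of how a NoSchedule taint interacts with tolerations, not the scheduler's actual implementation. The class and function names (Taint, Toleration, tolerates, can_schedule) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # for example "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every taint effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    # Operator "Equal": the key matched above, so the value must match too.
    return toleration.value == taint.value


def can_schedule(node_taints, pod_tolerations) -> bool:
    """A pod fits a node only if every NoSchedule taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint.effect == "NoSchedule"
    )


# Taint used in this FAQ: accelerator=true:NoSchedule
gpu_node_taints = [Taint("accelerator", "true", "NoSchedule")]
gpu_pod_tolerations = [Toleration("accelerator", "Equal", "true", "NoSchedule")]

print(can_schedule(gpu_node_taints, gpu_pod_tolerations))  # True: GPU/NPU workload
print(can_schedule(gpu_node_taints, []))                   # False: non-GPU/NPU workload
print(can_schedule([], []))                                # True: untainted node
```

Note that a toleration only permits scheduling to a tainted node. It does not by itself force the workload onto that node.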

The procedure is as follows:

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. In the navigation pane, choose Nodes. Click the Nodes tab, select a GPU/NPU node, and click Labels and Taints above the list.
  3. Click Add Operation under Batch Operation and add a taint to the node.

    Select Taint. Enter the key and value and select the taint effect. The following example shows how to add the accelerator=true:NoSchedule taint to the GPU or NPU nodes.

    Figure 1 Adding a taint

  4. When creating a GPU/NPU workload, manually add a toleration that matches the taint in the Advanced Settings area.

    Figure 2 Adding a toleration

  5. When creating a non-GPU/NPU workload, do not add a toleration for this taint. The workload will then not be scheduled to the GPU/NPU nodes.
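
For workloads defined in a manifest rather than on the console, the toleration from step 4 corresponds to the tolerations field of the pod specification. The following Python sketch builds such a manifest for the accelerator=true:NoSchedule taint and prints it as JSON. The pod name gpu-demo, the image example.com/gpu-demo:latest, and the helper functions are hypothetical placeholders, and the manifest omits any GPU/NPU resource requests that a real workload would need.

```python
import json

# The taint from the example in step 3 (accelerator=true:NoSchedule).
ACCELERATOR_TAINT = {"key": "accelerator", "value": "true", "effect": "NoSchedule"}


def toleration_for(taint: dict) -> dict:
    """Build a toleration that matches the given taint exactly."""
    return {
        "key": taint["key"],
        "operator": "Equal",
        "value": taint["value"],
        "effect": taint["effect"],
    }


def gpu_pod_manifest(name: str, image: str) -> dict:
    """Return a minimal pod manifest that tolerates the accelerator taint."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": name, "image": image}],
            "tolerations": [toleration_for(ACCELERATOR_TAINT)],
        },
    }


# Hypothetical name and image, used only for illustration.
manifest = gpu_pod_manifest("gpu-demo", "example.com/gpu-demo:latest")
print(json.dumps(manifest, indent=2))
```

A non-GPU/NPU workload would use the same structure without the tolerations entry.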
