What Should I Do If Pod Scheduling Fails?

Fault Locating

If a pod stays in the Pending state and its events report a scheduling failure, use the event message to locate the cause. For details about how to view events, see "How Do I Use Events to Fix Abnormal Workloads?"
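
The scheduling events can also be read from the command line. The following is a minimal sketch, assuming kubectl is already configured for the cluster; the function name failed_scheduling_events and the pod name my-pod are hypothetical, and FailedScheduling is the reason the scheduler records on these events.

```shell
# List the FailedScheduling events recorded for one pod.
# Usage: failed_scheduling_events <pod-name> [namespace]
failed_scheduling_events() {
  local pod="$1"
  local ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=${pod},reason=FailedScheduling"
}
```

For example, failed_scheduling_events my-pod lists the events for my-pod in the default namespace. The same messages also appear in the Events section of kubectl describe pod.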

Troubleshooting Process

Determine the cause based on the event information, as listed in Table 1.

Table 1 Pod scheduling failure

| Event Information | Cause and Solution |
| --- | --- |
| no nodes available to schedule pods. | No node is available in the cluster. See Check Item 1: Whether a Node Is Available in the Cluster. |
| 0/2 nodes are available: 2 Insufficient cpu. | Node resources (CPU and memory) are insufficient. See Check Item 2: Whether Node Resources (CPU and Memory) Are Sufficient. |
| 0/2 nodes are available: 2 Insufficient memory. | Node resources (CPU and memory) are insufficient. See Check Item 2: Whether Node Resources (CPU and Memory) Are Sufficient. |
| 0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) didn't match pod affinity rules, 1 node(s) didn't match pod affinity/anti-affinity. | The node and pod affinity configurations are mutually exclusive. No node meets the pod requirements. See Check Item 3: Affinity and Anti-Affinity Configuration of the Workload. |
| 0/2 nodes are available: 2 node(s) had volume node affinity conflict. | The EVS volume mounted to the pod and the node are not in the same AZ. See Check Item 4: Whether the Workload's Volume and Node Reside in the Same AZ. |
| 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. | Taints exist on the node, but the pod cannot tolerate these taints. See Check Item 5: Taint Toleration of Pods. |
| 0/7 nodes are available: 7 Insufficient ephemeral-storage. | The ephemeral storage space of the node is insufficient. See Check Item 6: Ephemeral Volume Usage. |
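
The lookup in Table 1 can be sketched as a small shell helper. This is only an illustration: the function name check_item_for_event is hypothetical, it matches on substrings of the event messages listed above, and it reports only the first matching cause when a message lists several.

```shell
# Map a scheduler event message to the matching check item in Table 1.
# Usage: check_item_for_event "<event message>"
check_item_for_event() {
  case "$1" in
    *"no nodes available"*)                        echo "Check Item 1" ;;
    *"Insufficient cpu"*|*"Insufficient memory"*)  echo "Check Item 2" ;;
    *"didn't match"*)                              echo "Check Item 3" ;;
    *"volume node affinity conflict"*)             echo "Check Item 4" ;;
    *"taints that the pod didn't tolerate"*)       echo "Check Item 5" ;;
    *"Insufficient ephemeral-storage"*)            echo "Check Item 6" ;;
    *)                                             echo "unknown" ;;
  esac
}
```

For example, passing "0/2 nodes are available: 2 Insufficient cpu." prints Check Item 2.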

Check Item 1: Whether a Node Is Available in the Cluster

Log in to the CCE console and check whether the node status is Available. Alternatively, run the following command to check whether the node status is Ready:

$ kubectl get node
NAME           STATUS   ROLES    AGE   VERSION
192.168.0.37   Ready    <none>   21d   v1.19.10-r1.0.0-source-121-gb9675686c54267
192.168.0.71   Ready    <none>   21d   v1.19.10-r1.0.0-source-121-gb9675686c54267

If the status of all nodes is NotReady, no node is available in the cluster.
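
In a larger cluster it can help to filter the node list. The sketch below assumes kubectl is configured for the cluster; the function name unschedulable_nodes is hypothetical. It prints every node whose STATUS column is anything other than exactly Ready, which also catches cordoned nodes shown as Ready,SchedulingDisabled.

```shell
# Print the name and status of every node that is not plainly "Ready".
unschedulable_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1, $2 }'
}
```

If the function prints as many nodes as the cluster contains, the scheduler has no node to place pods on.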

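Solution

Add a node to the cluster, or restore the NotReady nodes so that at least one node is in the Ready state.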

Check Item 2: Whether Node Resources (CPU and Memory) Are Sufficient

The scheduler places a pod on a node only if the pod's CPU and memory requests fit within the node's remaining allocatable resources, that is, the node's allocatable amount minus the requests of the pods already scheduled onto it. If no node has enough remaining CPU or memory, scheduling fails.
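
To see how much of a node is already requested, the node description can be trimmed to its "Allocated resources" section. This is a minimal sketch, assuming kubectl is configured for the cluster; the function name node_allocated_resources is hypothetical.

```shell
# Print the "Allocated resources" section of a node description, which
# summarizes the requests and limits of the pods already on the node.
# Usage: node_allocated_resources <node-name>
node_allocated_resources() {
  kubectl describe node "$1" |
    awk '/^Allocated resources:/ { show = 1 } /^Events:/ { show = 0 } show'
}
```

Compare the requested amounts with the Allocatable section of the same description and with the requests in the pod spec.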

Solution

Add nodes to the cluster. Scale-out is the common solution to insufficient resources.

Check Item 3: Affinity and Anti-Affinity Configuration of the Workload

Inappropriate affinity policies will cause pod scheduling to fail.

Example:

An anti-affinity relationship is established between workload 1 and workload 2. Workload 1 is deployed on node 1 while workload 2 is deployed on node 2.

If you then try to deploy workload 3 on node 1 with an affinity relationship to workload 2, a conflict occurs because workload 2 runs on node 2, and the deployment fails.

0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) didn't match pod affinity rules, 1 node(s) didn't match pod affinity/anti-affinity.

  • node selector indicates that the node affinity is not met.
  • pod affinity rules indicates that the pod affinity is not met.
  • pod affinity/anti-affinity indicates that the pod affinity/anti-affinity is not met.

Solution

  • When adding workload-workload affinity and workload-node affinity policies, ensure that the two types of policies do not conflict with each other. Otherwise, workload deployment will fail.
  • If the workload has a node affinity policy, make sure that supportContainer in the label of the affinity node is set to true. Otherwise, pods cannot be scheduled onto the affinity node and the following event is generated:
    No nodes are available that match all of the following predicates: MatchNode Selector, NodeNotSupportsContainer

    If supportContainer is set to false, the scheduling fails. The following figure shows the error information.
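
To compare the rules with the cluster, print the node labels next to the pod's own constraints. This is a sketch, assuming kubectl is configured for the cluster; the function name show_placement_rules is hypothetical, and the exact formatting of the jsonpath output depends on the kubectl version.

```shell
# Show node labels and the scheduling constraints of one pod.
# Usage: show_placement_rules <pod-name> [namespace]
show_placement_rules() {
  local pod="$1"
  local ns="${2:-default}"
  # Labels on every node; nodeSelector and node affinity terms must match these.
  kubectl get nodes --show-labels
  # nodeSelector and affinity/anti-affinity rules taken from the pod spec.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.affinity}{"\n"}'
}
```

A rule that names a label value no node carries, or a pod affinity and anti-affinity pair that no single node can satisfy, explains the event.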

Check Item 4: Whether the Workload's Volume and Node Reside in the Same AZ

0/2 nodes are available: 2 node(s) had volume node affinity conflict.

An affinity conflict occurs between volumes and nodes. As a result, the scheduling fails. This is because EVS disks cannot be attached to nodes across AZs. For example, if the EVS volume is located in AZ 1 and the node is located in AZ 2, scheduling fails.

The EVS volume created on CCE has affinity settings by default, as shown below.

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvc-c29bfac7-efa3-40e6-b8d6-229d8a5372ac
spec:
  ...
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
                - cn-east-3a
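
To check for an AZ mismatch, the zone a volume is pinned to can be compared with the zone label of each node. This is a minimal sketch, assuming kubectl is configured for the cluster; the function names pv_zones and node_zones are hypothetical, and the label key is the one shown in the PersistentVolume above (other clusters may use a different zone label).

```shell
# Print the zones a PersistentVolume is pinned to by its node affinity.
# Usage: pv_zones <pv-name>
pv_zones() {
  kubectl get pv "$1" \
    -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[*].matchExpressions[*].values[*]}{"\n"}'
}

# List the nodes with their zone label shown as an extra column.
node_zones() {
  kubectl get nodes -L failure-domain.beta.kubernetes.io/zone
}
```

If no node reports the zone printed by pv_zones, the pod that mounts the volume cannot be scheduled.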

Solution

In the AZ where the workload's node resides, create a volume. Alternatively, create an identical workload and select an automatically assigned cloud storage volume.

Check Item 5: Taint Toleration of Pods

Check the taints on the node. If the following information is displayed, taints exist on the node:

$ kubectl describe node 192.168.0.37
Name:               192.168.0.37
...
Taints:             key1=value1:NoSchedule
...

To schedule the pod to the node, use either of the following methods:

  • Delete the taint from the node.
  • Add a toleration for the taint to the pod specification. For details, see Taints and Tolerations.
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule" 
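
The first method, deleting the taint, can be sketched with kubectl as follows. The function names list_node_taints and remove_node_taint are hypothetical; the taint key1=value1:NoSchedule is the one from the example above.

```shell
# List every node together with the keys of its taints.
list_node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Remove one taint from a node; the trailing "-" tells kubectl to delete it.
# Usage: remove_node_taint <node-name> <key=value:effect>
remove_node_taint() {
  kubectl taint nodes "$1" "${2}-"
}
```

For example, remove_node_taint 192.168.0.37 key1=value1:NoSchedule removes the taint shown above. Remove a taint only if it was not set deliberately to keep workloads off the node.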

Check Item 6: Ephemeral Volume Usage

Check whether the pod sets an ephemeral-storage request or limit. If the ephemeral storage requested by the application exceeds the allocatable ephemeral storage of the node, the pod cannot be scheduled. To solve this problem, reduce the requested size of the ephemeral volume or expand the disk capacity of the node.

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  volumes:
    - name: ephemeral
      emptyDir: {}
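
To compare the request with what a node can offer, the ephemeral-storage lines can be pulled from the node description. This is a minimal sketch, assuming kubectl is configured for the cluster; the function name node_ephemeral_storage is hypothetical.

```shell
# Print the ephemeral-storage lines of a node description: capacity,
# allocatable, and the amount already requested by pods on the node.
# Usage: node_ephemeral_storage <node-name>
node_ephemeral_storage() {
  kubectl describe node "$1" | grep 'ephemeral-storage'
}
```

If the 2Gi request in the example above is larger than the allocatable amount minus what is already requested, the pod does not fit on that node.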