What Should I Do If a Pod Fails to Be Evicted?

Updated on 2024-11-13 GMT+08:00

Principle of Eviction

When a node is abnormal, Kubernetes will evict pods on the node to ensure workload availability.

In Kubernetes, both kube-controller-manager and kubelet can evict pods.

  • Eviction implemented by kube-controller-manager

    kube-controller-manager consists of multiple controllers, and eviction is implemented by the node controller. The node controller periodically checks the status of all nodes. If a node stays in the NotReady state for a period of time, all pods on the node are evicted.

    kube-controller-manager supports the following startup parameters (a configuration sketch follows the kubelet parameters below):

    • pod-eviction-timeout: indicates how long a node can remain down before pods on it are evicted. The default value is 5 minutes.
    • node-eviction-rate: indicates the number of nodes whose pods are evicted per second. The default value is 0.1, meaning that pods are evicted from at most one node every 10 seconds.
    • secondary-node-eviction-rate: indicates the reduced eviction rate that is used when a large number of nodes in the cluster are down. The default value is 0.01.
    • unhealthy-zone-threshold: specifies the threshold for an AZ to be considered unhealthy. The default value is 0.55, meaning that if more than 55% of the nodes in an AZ are faulty, the AZ is considered unhealthy.
    • large-cluster-size-threshold: specifies the threshold for a cluster to be considered large. The default value is 50; a cluster with more nodes than this threshold is considered large. If more than 55% of the nodes in a large cluster are faulty, the eviction rate is reduced to secondary-node-eviction-rate. In a small cluster, the eviction rate is reduced to 0, which means pods running on the nodes in the cluster are not evicted at all.
  • Eviction implemented by kubelet

    When the resources of a node are about to be used up, kubelet executes its eviction policy based on pod priority, resource usage, and resource requests. If pods have the same priority, the pod whose resource usage exceeds its requests by the largest amount is evicted first.

    kube-controller-manager evicts all pods on a faulty node, whereas kubelet evicts only some pods on its node. kubelet periodically checks the memory and disk resources of the node. If the resources are insufficient, it evicts some pods based on the priority. For details about the pod eviction priority, see Pod selection for kubelet eviction.

    There are soft eviction thresholds and hard eviction thresholds.

    • Soft eviction thresholds: Each threshold is paired with a grace period. kubelet starts reclaiming node resources only after the threshold has been exceeded for the entire grace period. If the resource usage reaches a threshold but falls back below it before the grace period elapses, kubelet does not evict any pods on the node.
      You can configure soft eviction thresholds using the following parameters:
      • eviction-soft: indicates a soft eviction threshold. If an eviction signal on a node reaches the threshold, for example, memory.available<1.5Gi, kubelet does not evict pods immediately but waits for the grace period configured by eviction-soft-grace-period. If the threshold is still exceeded when the grace period elapses, kubelet evicts some pods on the node.
      • eviction-soft-grace-period: indicates the eviction grace period, that is, how long kubelet waits between a soft eviction threshold being met and the pods being terminated. The default grace period is 90 seconds.
      • eviction-max-pod-grace-period: indicates the maximum allowed grace period to use when terminating pods in response to a soft eviction threshold being met.
    • Hard eviction thresholds: Pods are immediately evicted once these thresholds are reached.

      You can configure hard eviction thresholds using the following parameters:

      eviction-hard: indicates a hard eviction threshold. When an eviction signal on a node reaches the threshold, for example, memory.available<1Gi (the available memory on the node is less than 1 GiB), a pod eviction is triggered immediately.

      kubelet supports the following default hard eviction thresholds:

      • memory.available<100Mi
      • nodefs.available<10%
      • imagefs.available<15%
      • nodefs.inodesFree<5% (for Linux nodes)

    kubelet also supports the following parameters (see the configuration sketch after this list):

    • eviction-pressure-transition-period: indicates the period for which kubelet has to wait before transitioning out of an eviction pressure condition. The default value is 5 minutes. Once a node condition such as MemoryPressure or DiskPressure is set, it is kept for at least this period even if the eviction signal falls back below the threshold. This prevents mistaken eviction decisions when a node oscillates above and below a soft eviction threshold.
    • eviction-minimum-reclaim: indicates the minimum amount of resources that must be reclaimed in each eviction. This parameter prevents kubelet from evicting pods repeatedly when each eviction reclaims only a small amount of resources.
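For reference, the kube-controller-manager parameters above are passed as startup flags. A minimal sketch with the default values described above (these are upstream Kubernetes flag names; the actual values used by CCE may differ):

kube-controller-manager \
  --pod-eviction-timeout=5m0s \
  --node-eviction-rate=0.1 \
  --secondary-node-eviction-rate=0.01 \
  --unhealthy-zone-threshold=0.55 \
  --large-cluster-size-threshold=50

The kubelet parameters map to fields in the kubelet configuration file. A minimal sketch, assuming a node where you control the kubelet configuration (the threshold values are illustrative, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionSoft:                              # soft thresholds (eviction-soft)
  memory.available: "1.5Gi"
evictionSoftGracePeriod:                   # grace period per signal (eviction-soft-grace-period)
  memory.available: "90s"
evictionMaxPodGracePeriod: 60              # cap on pod termination grace period, in seconds
evictionHard:                              # hard thresholds (eviction-hard)
  memory.available: "100Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
evictionPressureTransitionPeriod: "5m0s"   # eviction-pressure-transition-period
evictionMinimumReclaim:                    # eviction-minimum-reclaim
  memory.available: "500Mi"
  nodefs.available: "1Gi"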

Fault Locating

If pods are not evicted when a node is faulty, perform the following steps to locate the fault:

Run the following command and check whether many pods are in the Evicted state:

kubectl get pods

Eviction records are written to the kubelet logs of the node. You can run the following command to search for them:

cat /var/log/cce/kubernetes/kubelet.log | grep -i Evicted -C3
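An illustrative excerpt of the kubectl get pods output (the pod names are hypothetical; what matters is the STATUS column):

NAME                      READY   STATUS    RESTARTS   AGE
nginx-6b4d9c8f4b-abcde    0/1     Evicted   0          5m
nginx-6b4d9c8f4b-fghij    0/1     Evicted   0          5m
nginx-6b4d9c8f4b-klmno    1/1     Running   0          3m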

Check Item 1: Whether the Node Is Under Resource Pressure

If a node suffers resource pressure, kubelet changes the node status and adds taints to the node. Check whether the corresponding taint exists on the node, for example (here the node carries the memory-pressure taint):

$ kubectl describe node 192.168.0.37
Name:               192.168.0.37
...
Taints:             node.kubernetes.io/memory-pressure:NoSchedule
...
Table 1 Statuses of nodes with resource pressure and solutions

  • MemoryPressure
    • Taint: node.kubernetes.io/memory-pressure
    • Eviction signal: memory.available
    • Description: The available memory on the node has reached the eviction threshold.
    • Solution: Scale out the node specifications. For details, see How Do I Change the Node Specifications in a CCE Cluster?
  • DiskPressure
    • Taint: node.kubernetes.io/disk-pressure
    • Eviction signal: nodefs.available, nodefs.inodesFree, imagefs.available, or imagefs.inodesFree
    • Description: The available disk space or inodes on the root file system or image file system of the node have reached the eviction threshold.
    • Solution: Expand the storage space of the node. For details, see Expanding the Storage Space.
  • PIDPressure
    • Taint: node.kubernetes.io/pid-pressure
    • Eviction signal: pid.available
    • Description: The number of available process identifiers on the node has fallen below the eviction threshold.
    • Solution: Increase the upper limit of PIDs on the node. For details, see Changing Process ID Limits (kernel.pid_max).

Check Item 2: Whether Tolerations Have Been Configured for the Workload

Use kubectl, or locate the row containing the target workload and choose More > Edit YAML in the Operation column, to check whether tolerations are configured for the workload. For details, see Taints and Tolerations. A sketch of typical tolerations follows.
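For reference, a minimal sketch of tolerations in a pod template. These standard Kubernetes tolerations let a pod stay on a NotReady or unreachable node for 300 seconds before it is evicted (the tolerationSeconds values are illustrative; tune them to your availability requirements):

tolerations:
- key: node.kubernetes.io/not-ready      # taint added by the node controller when the node turns NotReady
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300                 # how long the pod may remain after the taint appears
- key: node.kubernetes.io/unreachable    # taint added when the node becomes unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300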

Check Item 3: Whether the Conditions for Stopping Pod Eviction Are Met

In a cluster that runs fewer than 50 worker nodes (the default large-cluster-size-threshold), if faulty nodes account for at least 55% of the total nodes (the default unhealthy-zone-threshold), pod eviction is suspended. In this case, Kubernetes does not attempt to evict the workloads on the faulty nodes. For details, see Rate limits on eviction.
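A quick sketch for checking whether this condition applies to your cluster (it counts nodes whose STATUS column is not exactly Ready):

# Total number of nodes
kubectl get nodes --no-headers | wc -l
# Number of nodes that are not in the Ready state
kubectl get nodes --no-headers | awk '$2 != "Ready"' | wc -l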

Check Item 4: Whether the Resources Requested by the Pod Match Its Actual Usage

An evicted pod is frequently scheduled back to the original node.

Possible Causes

Pods on a node are evicted based on the node's actual resource usage, whereas evicted pods are rescheduled based on the resources requested (allocated) on candidate nodes. Because eviction and scheduling follow different rules, an evicted pod may be scheduled back to the original node.

Solution

Properly allocate resources to each container so that the requested resources match the actual usage, as shown in the sketch below.
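A minimal sketch of container resource settings (the container name, image, and values are illustrative; set requests close to the container's real usage so that scheduling decisions reflect the load that triggered the eviction):

containers:
- name: app                    # hypothetical container name
  image: nginx:alpine          # illustrative image
  resources:
    requests:                  # what the scheduler reserves on the node
      cpu: 500m
      memory: 1Gi
    limits:                    # hard caps enforced at runtime
      cpu: "1"
      memory: 2Gi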

Check Item 5: Whether the Workload Pod Fails Continuously and Is Redeployed

A workload pod fails and is being redeployed constantly.

Analysis

After a pod is evicted and scheduled to a new node, if pods on that node are also being evicted, the pod will be evicted again. As a result, a pod may be evicted repeatedly.

If a pod is evicted by kube-controller-manager, it stays in the Terminating state. The pod is automatically deleted only after the node where it ran is restored. If the node has been deleted or cannot be restored for other reasons, you can forcibly delete the pod (see the command sketch below).

If a pod is evicted by kubelet, it stays in the Evicted state. Such a pod is kept only for subsequent fault locating and can be deleted directly.
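For reference, a pod stuck in the Terminating state on an unrecoverable node can be forcibly deleted as follows (the pod and namespace names are placeholders; forcible deletion removes the API object without waiting for confirmation from the kubelet, so use it only when the node cannot be restored):

kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force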

Solution

Run the following command to delete the evicted pods:

kubectl get pods -n <namespace> | grep Evicted | awk '{print $1}' | xargs kubectl delete pod -n <namespace>

In the preceding command, <namespace> indicates the namespace name. Configure it based on your requirements.
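Alternatively, because evicted pods are in the Failed phase, you can delete them with a field selector (a sketch; note that this also deletes failed pods that were not evicted):

kubectl delete pods -n <namespace> --field-selector=status.phase=Failed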

Submitting a Service Ticket

If the problem persists, submit a service ticket.
