What Should I Do If Container Startup Fails?

Updated on 2024-12-04 GMT+08:00

Fault Locating

On the details page of a workload, if an event is displayed indicating that the container fails to be started, perform the following steps to locate the fault:

  1. Log in to the node where the abnormal workload is located.
  2. Obtain the ID of the container that exited abnormally in the workload pod.

    docker ps -a | grep $podName

  3. View the logs of the corresponding container.

    docker logs $containerID

    Rectify the fault of the workload based on logs.

  4. Check the error logs.

    cat /var/log/messages | grep $containerID  | grep oom

    Check whether the system OOM is triggered based on the logs.
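
Step 4 can be wrapped in a small helper so that the same filter is reused for any container. The following is a minimal sketch, not part of the product tooling: the function name find_oom_lines and the sample log lines are hypothetical, and the match is made case-insensitive on the assumption that both "OOM" and "oom" spellings may appear in the log.

```shell
# find_oom_lines LOG_FILE CONTAINER_ID
# Prints the lines of LOG_FILE that mention CONTAINER_ID and contain "oom"
# (case-insensitive).
find_oom_lines() {
    grep -- "$2" "$1" | grep -i 'oom'
}

# Demo on a throwaway file with synthetic lines (not real system output).
demo_log=$(mktemp)
printf '%s\n' \
    'line-1 96feb0a425d6 oom-kill event (synthetic sample)' \
    'line-2 96feb0a425d6 started (synthetic sample)' \
    'line-3 0123456789ab OOM event (synthetic sample)' > "$demo_log"
find_oom_lines "$demo_log" 96feb0a425d6   # prints only line-1
rm -f "$demo_log"
```

On a node, the call would look like find_oom_lines /var/log/messages $containerID.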

Troubleshooting Process

Determine the cause based on the event information, as listed in Table 1.

Table 1 Container startup failure

| Log or Event | Cause and Solution |
| --- | --- |
| The log contains exit(0). | No process exists in the container. Check whether the container is running properly. See Check Item 1: Whether There Are Processes that Keep Running in the Container (Exit Code: 0). |
| Event information: Liveness probe failed: Get http... The log contains exit(137). | Health check fails. See Check Item 2: Whether Health Check Fails to Be Performed (Exit Code: 137). |
| Event information: Thin Pool has 15991 free data blocks which are less than minimum required 16383 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior | The disk space is insufficient. Clear the disk space. See Check Item 3: Whether the Container Disk Space Is Insufficient. |
| The keyword OOM exists in the log. | The memory is insufficient. See Check Item 4: Whether the Upper Limit of Container Resources Has Been Reached and Check Item 5: Whether the Resource Limits Are Improperly Configured for the Container. |
| Address already in use | A conflict occurs between container ports in the pod. See Check Item 6: Whether the Container Ports in the Same Pod Conflict with Each Other. |
| Error: failed to start container "filebeat": Error response from daemon: OCI runtime create failed: container_linux.go:330: starting container process caused "process_linux.go:381: container init caused \"setenv: invalid argument\"": unknown | A secret is mounted to the workload, and the value of the secret is not encoded using Base64. See Check Item 7: Whether the Value of the Secret Mounted to the Workload Meets Requirements. |

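In addition to the preceding causes, check the following:
  • Check Item 8: Whether the Container Startup Command Is Correctly Configured
  • Check Item 9: Whether the User Service Has a Bug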

Figure 1 Troubleshooting process of the container restart failure
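
The lookup in Table 1 can be sketched as a shell function that maps a log or event message to the check item to read next. The function name check_item_for is hypothetical, and the patterns are simple substring matches taken from the table, so treat the result as a hint rather than a diagnosis.

```shell
# check_item_for MESSAGE
# Prints the check item(s) whose log or event pattern in Table 1 matches
# MESSAGE. Messages that match nothing fall through to Check Items 8 and 9.
check_item_for() {
    case "$1" in
        *"exit(0)"*)                             echo "Check Item 1" ;;
        *"Liveness probe failed"*|*"exit(137)"*) echo "Check Item 2" ;;
        *"Thin Pool has"*)                       echo "Check Item 3" ;;
        *OOM*|*oom*)                             echo "Check Item 4 and Check Item 5" ;;
        *"Address already in use"*)              echo "Check Item 6" ;;
        *"setenv: invalid argument"*)            echo "Check Item 7" ;;
        *)                                       echo "Check Item 8 and Check Item 9" ;;
    esac
}

check_item_for "Liveness probe failed: Get http..."
check_item_for "Address already in use"
```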

Check Item 1: Whether There Are Processes that Keep Running in the Container (Exit Code: 0)

  1. Log in to the node where the abnormal workload is located.
  2. View the container status.

    docker ps -a | grep $podName

    Example:

    If no running process exists in the container, the status code Exited (0) is displayed.
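
To read the exit code without scanning the whole row by eye, the status text can be parsed. This is a minimal sketch: the function name exit_code_from_status is hypothetical, and it assumes the status text has the form "Exited (N) ..." as described above.

```shell
# exit_code_from_status STATUS_TEXT
# Prints the numeric exit code found in a status such as
# "Exited (137) 2 minutes ago". Prints nothing if the text has no exit code.
exit_code_from_status() {
    printf '%s\n' "$1" | sed -n 's/.*Exited (\([0-9][0-9]*\)).*/\1/p'
}

exit_code_from_status "Exited (0) 5 minutes ago"     # prints 0
exit_code_from_status "Exited (137) 2 minutes ago"   # prints 137
```

An exit code of 0 points to this check item, and 137 points to Check Item 2.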

Check Item 2: Whether Health Check Fails to Be Performed (Exit Code: 137)

A health check configured for a workload probes the service periodically. If the check detects an exception, the pod reports an event and the container fails to restart.

If a liveness probe is configured for the workload and the number of consecutive probe failures reaches the threshold, the container in the pod is restarted. If the Kubernetes events on the workload details page contain Liveness probe failed: Get http..., the health check has failed.

Solution

Click the workload name to go to the workload details page and click the Containers tab. Then select Health Check and check whether the policy is appropriate and whether the service is running properly.
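
One way to judge whether a policy is appropriate is to compare the service startup time with the time the probe allows before a restart. The sketch below is a rough estimate only. It assumes the standard Kubernetes probe fields initialDelaySeconds, periodSeconds, and failureThreshold, ignores probe timeouts and scheduling jitter, and uses a hypothetical function name.

```shell
# seconds_until_restart INITIAL_DELAY PERIOD FAILURE_THRESHOLD
# Rough estimate, in seconds, of how long a container that never becomes
# healthy keeps running before the liveness probe causes a restart.
seconds_until_restart() {
    echo $(( $1 + $2 * $3 ))
}

# initialDelaySeconds=10, periodSeconds=10, failureThreshold=3
seconds_until_restart 10 10 3   # prints 40
```

If the service needs longer than this estimate to start, the probe is likely too strict and the initial delay or threshold should be raised.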

Check Item 3: Whether the Container Disk Space Is Insufficient

The following message refers to the thin pool disk that is allocated from the Docker disk selected during node creation. You can run the lvs command as user root to view the current disk usage.

Thin Pool has 15991 free data blocks which are less than minimum required 16383 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
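
The two numbers in the message show how far the thin pool is below the minimum. A minimal sketch that extracts them and prints the shortfall, assuming the message wording shown above (the function name thin_pool_shortfall is hypothetical):

```shell
# thin_pool_shortfall MESSAGE
# Prints how many more free data blocks the thin pool needs to reach the
# minimum. Returns 1 if the message does not have the expected wording.
thin_pool_shortfall() {
    tp_free=$(printf '%s\n' "$1" | sed -n 's/.*has \([0-9][0-9]*\) free data blocks.*/\1/p')
    tp_min=$(printf '%s\n' "$1" | sed -n 's/.*minimum required \([0-9][0-9]*\) free data blocks.*/\1/p')
    [ -n "$tp_free" ] && [ -n "$tp_min" ] || return 1
    echo $(( tp_min - tp_free ))
}

thin_pool_shortfall "Thin Pool has 15991 free data blocks which are less than minimum required 16383 free data blocks."   # prints 392
```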

Solution

Solution 1: Clearing images

Perform the following operations to clear unused images:
  • Nodes that use containerd
    1. Obtain local images on the node.
      crictl images -v
    2. Delete unneeded images by image ID.
      crictl rmi $imageID
  • Nodes that use Docker
    1. Obtain local images on the node.
      docker images
    2. Delete unneeded images by image ID.
      docker rmi $imageID
NOTE:

Do not delete system images such as the cce-pause image. Otherwise, pods may fail to be created.
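
To avoid deleting a pause image by accident, the image list can be filtered before any IDs are passed to the delete command. The sketch below assumes the column layout of docker images (repository in column 1, image ID in column 3) and assumes that matching the word "pause" in the repository name is enough to spot the pause image. It does not recognize other system images, so review the output before deleting anything. The function name and the sample rows are hypothetical.

```shell
# deletable_image_ids
# Reads "docker images" output on stdin and prints the image ID column,
# skipping the header row and any repository whose name contains "pause".
deletable_image_ids() {
    awk 'NR > 1 && $1 !~ /pause/ { print $3 }'
}

# Demo with synthetic rows (not real node output).
printf '%s\n' \
    'REPOSITORY TAG IMAGE ID CREATED SIZE' \
    'example/app v1 aaaaaaaaaaaa 2 days ago 100MB' \
    'cce-pause 3.1 bbbbbbbbbbbb 2 months ago 1MB' | deletable_image_ids   # prints aaaaaaaaaaaa
```

On a Docker node, the input would come from docker images piped into deletable_image_ids.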

Solution 2: Expanding the disk capacity

To expand the disk capacity, perform the following steps:

  1. Expand the capacity of the data disk on the EVS console.
  2. Log in to the CCE console and click the cluster. In the navigation pane, choose Nodes. Click More > Sync Server Data in the row containing the target node.
  3. Log in to the target node.
  4. Run the lsblk command to check the block device information of the node.

    How the data disk is partitioned depends on the container storage rootfs:

    • Overlayfs: No independent thin pool is allocated. Image data is stored in the dockersys disk.
      # lsblk
      NAME                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
      vda                   8:0    0   50G  0 disk 
      └─vda1                8:1    0   50G  0 part /
      vdb                   8:16   0  200G  0 disk 
      ├─vgpaas-dockersys  253:0    0   90G  0 lvm  /var/lib/docker               # Space used by the container engine
      └─vgpaas-kubernetes 253:1    0   10G  0 lvm  /mnt/paas/kubernetes/kubelet  # Space used by Kubernetes

      Run the following commands on the node to add the new disk capacity to the dockersys disk:

      pvresize /dev/vdb 
      lvextend -l+100%FREE -n vgpaas/dockersys
      resize2fs /dev/vgpaas/dockersys
    • Devicemapper: A thin pool is allocated to store image data.
      # lsblk
      NAME                                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
      vda                                   8:0    0   50G  0 disk 
      └─vda1                                8:1    0   50G  0 part /
      vdb                                   8:16   0  200G  0 disk 
      ├─vgpaas-dockersys                  253:0    0   18G  0 lvm  /var/lib/docker    
      ├─vgpaas-thinpool_tmeta             253:1    0    3G  0 lvm                   
      │ └─vgpaas-thinpool                 253:3    0   67G  0 lvm                   # Space used by thinpool
      │   ...
      ├─vgpaas-thinpool_tdata             253:2    0   67G  0 lvm  
      │ └─vgpaas-thinpool                 253:3    0   67G  0 lvm  
      │   ...
      └─vgpaas-kubernetes                 253:4    0   10G  0 lvm  /mnt/paas/kubernetes/kubelet
      • Run the following commands on the node to add the new disk capacity to the thinpool disk:
        pvresize /dev/vdb 
        lvextend -l+100%FREE -n vgpaas/thinpool
      • Run the following commands on the node to add the new disk capacity to the dockersys disk:
        pvresize /dev/vdb 
        lvextend -l+100%FREE -n vgpaas/dockersys
        resize2fs /dev/vgpaas/dockersys
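
Which set of commands applies depends on whether the node has a thin pool. A minimal sketch that decides this from lsblk output, assuming the volume names shown in the sample output above (the function name rootfs_layout is hypothetical):

```shell
# rootfs_layout
# Reads "lsblk" output on stdin and prints "devicemapper" if a
# vgpaas-thinpool volume is listed, otherwise "overlayfs".
rootfs_layout() {
    if grep -q 'vgpaas-thinpool'; then
        echo devicemapper
    else
        echo overlayfs
    fi
}

# Demo with lines shaped like the sample output above.
printf '%s\n' 'vdb 8:16 0 200G 0 disk' 'vgpaas-dockersys 253:0 0 90G 0 lvm /var/lib/docker' | rootfs_layout   # prints overlayfs
```

On a node, the input would come from lsblk piped into rootfs_layout.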

Check Item 4: Whether the Upper Limit of Container Resources Has Been Reached

If the upper limit of container resources has been reached, OOM will be displayed in the event details as well as in the log:

cat /var/log/messages | grep 96feb0a425d6 | grep oom

If the memory actually used by the container exceeds the upper limit configured when the workload was created, the system OOM is triggered and the container exits unexpectedly.

Check Item 5: Whether the Resource Limits Are Improperly Configured for the Container

If the resource limits set for the container during workload creation are lower than what the container needs, the container fails to be restarted.
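
A quick way to sanity-check a memory limit is to compare it with the amount the service is known to need. The sketch below handles only plain byte counts and the Ki, Mi, and Gi suffixes. The function names and the example values are hypothetical.

```shell
# to_bytes QUANTITY
# Converts a memory quantity with a Ki, Mi, or Gi suffix (or a plain byte
# count) to bytes. Other suffixes are not handled in this sketch.
to_bytes() {
    case "$1" in
        *Ki) echo $(( ${1%Ki} * 1024 )) ;;
        *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
        *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
        *)   echo "$1" ;;
    esac
}

# limit_is_sufficient LIMIT REQUIRED
# Succeeds if LIMIT is at least REQUIRED.
limit_is_sufficient() {
    [ "$(to_bytes "$1")" -ge "$(to_bytes "$2")" ]
}

if limit_is_sufficient 512Mi 1Gi; then
    echo "limit is sufficient"
else
    echo "limit is too low"    # this branch runs: 512Mi is less than 1Gi
fi
```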

Check Item 6: Whether the Container Ports in the Same Pod Conflict with Each Other

  1. Log in to the node where the abnormal workload is located.
  2. Obtain the ID of the container that exited abnormally in the workload pod.

    docker ps -a | grep $podName

  3. View the logs of the corresponding container.

    docker logs $containerID

    Rectify the fault of the workload based on logs. As shown in the following figure, container ports in the same pod conflict. As a result, the container fails to be started.

    Figure 2 Container restart failure due to a container port conflict

Solution

Re-create the workload and set a port number that is not used by any other container in the same pod.
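
Before re-creating the workload, the planned ports can be checked for duplicates. A minimal sketch, assuming you list the port that each container in the pod listens on, one per line (the function name duplicate_ports is hypothetical):

```shell
# duplicate_ports
# Reads one port number per line on stdin and prints each port that
# appears more than once.
duplicate_ports() {
    sort -n | uniq -d
}

printf '%s\n' 80 8080 80 | duplicate_ports   # prints 80
```

An empty result means no two containers in the list share a port.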

Check Item 7: Whether the Value of the Secret Mounted to the Workload Meets Requirements

Information similar to the following is displayed in the event:

Error: failed to start container "filebeat": Error response from daemon: OCI runtime create failed: container_linux.go:330: starting container process caused "process_linux.go:381: container init caused \"setenv: invalid argument\"": unknown

The root cause is that a secret is mounted to the workload, but the value of the secret is not encoded using Base64.

Solution

Create a secret on the console. The value of the secret is automatically encoded using Base64.

If you use YAML to create a secret, you need to manually encode its value using Base64.

    echo -n "Content to be encoded" | base64
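
As a worked example, the following sketch encodes a sample value and decodes it again to confirm the round trip. The value admin is only an illustration, and the -d option assumes the GNU coreutils base64 command.

```shell
# Encode a value. printf '%s' is used so that no trailing newline is encoded.
encoded=$(printf '%s' 'admin' | base64)
echo "$encoded"                         # prints YWRtaW4=

# Decode it again to confirm that the original value is recovered.
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"                         # prints admin
```

A trailing newline in the input changes the encoded result, which is why the command above uses echo -n.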

Check Item 8: Whether the Container Startup Command Is Correctly Configured

The error messages are as follows:

Solution

Click the workload name to go to the workload details page and click the Containers tab. Choose Lifecycle, click Startup Command, and ensure that the command is correct.

Check Item 9: Whether the User Service Has a Bug

Check whether the workload startup command is correctly executed or whether the workload has a bug.

  1. Log in to the node where the abnormal workload is located.
  2. Obtain the ID of the container that exited abnormally in the workload pod.

    docker ps -a | grep $podName

  3. View the logs of the corresponding container.

    docker logs $containerID

    Note: In the preceding command, containerID indicates the ID of the container that has exited.

    Figure 3 Incorrect startup command of the container

    As shown in the figure above, the container fails to be started due to an incorrect startup command. For other errors, rectify the bugs based on the logs.

Solution

Create a new workload and configure a correct startup command.
