
What Should I Do If Container Startup Fails?

Fault Locating

If the workload details page shows an event indicating that a container failed to start, perform the following steps to locate the fault:

Log in to the node where the abnormal workload is located.
Obtain the ID of the container that exited abnormally in the workload pod.
```
# Replace $podName with the name of the abnormal pod
docker ps -a | grep "$podName"
```
View the logs of that container.
```
docker logs "$containerID"
```
Rectify the workload fault based on the logs.
Check the system logs for OOM records of the container.
```
grep "$containerID" /var/log/messages | grep oom
```
Determine from the output whether a system OOM was triggered.
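The first two locating steps can be chained on the node. The following is a minimal sketch, assuming a POSIX shell and the Docker CLI; the function name exited_container_ids is hypothetical. It reads docker ps output printed with a custom --format template (fields separated by |) and keeps only the exited containers whose name contains the pod name, so each printed ID can be passed straight to docker logs.

```shell
# Hypothetical helper: read "ID|STATUS|NAMES" lines on stdin and print the IDs
# of exited containers whose name contains the given pod name.
# Usage on the node (assumes $podName is set):
#   docker ps -a --format '{{.ID}}|{{.Status}}|{{.Names}}' | exited_container_ids "$podName"
exited_container_ids() {
  pod=$1
  while IFS='|' read -r id status names; do
    # Skip containers that do not belong to the pod
    case "$names" in *"$pod"*) ;; *) continue ;; esac
    # Docker reports stopped containers as "Exited (N) ... ago"
    case "$status" in Exited*) echo "$id" ;; esac
  done
}
```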

Troubleshooting Process

Determine the cause based on the event information, as listed in Table 1.

**Table 1** Container startup failure

| Log or Event | Cause and Solution |
| --- | --- |
| The log contains exit(0). | No process is running in the container. Check whether the container runs properly. See Check Item 1: Whether There Are Processes that Keep Running in the Container (Exit Code: 0). |
| Event information: Liveness probe failed: Get http... The log contains exit(137). | The health check failed. See Check Item 2: Whether Health Check Fails to Be Performed (Exit Code: 137). |
| Event information: Thin Pool has 15991 free data blocks which is less than minimum required 16383 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior | The disk space is insufficient. Free up disk space. See Check Item 3: Whether the Container Disk Space Is Insufficient. |
| The log contains the keyword OOM. | The memory is insufficient. See Check Item 4: Whether the Upper Limit of Container Resources Has Been Reached and Check Item 5: Whether the Resource Limits Are Improperly Set for the Container. |
| Address already in use | Container ports in the same pod conflict. See Check Item 6: Whether the Container Ports in the Same Pod Conflict with Each Other. |
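Table 1 is essentially a keyword lookup. The following minimal sketch encodes it as a shell case statement; the function name classify_failure and its output labels are hypothetical, the keywords are taken from the table, and anything that matches none of them falls through to the three other causes listed below.

```shell
# Hypothetical helper: map a log line or event message to a check item in Table 1.
# Matching is a plain substring test; earlier patterns take precedence.
classify_failure() {
  case "$1" in
    *"Liveness probe failed"*|*"exit(137)"*) echo "Check Item 2: health check failure" ;;
    *"exit(0)"*) echo "Check Item 1: no long-running process" ;;
    *"Thin Pool has"*) echo "Check Item 3: insufficient disk space" ;;
    *OOM*|*oom*) echo "Check Items 4 and 5: insufficient memory" ;;
    *"Address already in use"*) echo "Check Item 6: port conflict" ;;
    *) echo "Check Items 7 to 9: startup command, Java probe, or service bug" ;;
  esac
}
```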

In addition to the preceding causes, there are three other possible causes:

Check Item 7: Whether the Container Startup Command Is Correctly Configured
Check Item 8: Whether the Java Probe Version Is latest
Check Item 9: Whether the User Service Has a Bug

Figure 1 Troubleshooting process

Check Item 1: Whether There Are Processes that Keep Running in the Container (Exit Code: 0)

Log in to the node where the abnormal workload is located.
View the container status.
```
docker ps -a | grep "$podName"
```

If no process keeps running in the container, the container status is Exited (0).
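To read the exit code without scanning the STATUS column by eye, the status text can be parsed. The following minimal sketch assumes Docker's usual status format Exited (N) ... ago; the function name exit_code_from_status is hypothetical. It prints nothing for containers that are still running.

```shell
# Hypothetical helper: print the exit code embedded in a docker ps STATUS value.
# Usage on the node (assumes $podName is set):
#   docker ps -a --filter "name=$podName" --format '{{.Status}}' |
#     while read -r s; do exit_code_from_status "$s"; done
exit_code_from_status() {
  # Keep only the digits between "Exited (" and ")"; print nothing otherwise
  printf '%s\n' "$1" | sed -n 's/^Exited (\([0-9]*\)).*/\1/p'
}
```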

Check Item 2: Whether Health Check Fails to Be Performed (Exit Code: 137)

The health check configured for a workload periodically probes the service. If an exception is detected, the pod reports an event and the pod fails to restart.

If a liveness probe is configured for the workload and the number of consecutive health check failures exceeds the threshold, the containers in the pod are restarted. If the Kubernetes events on the workload details page contain Liveness probe failed: Get http..., the health check has failed.

Solution

On the workload details page, choose Upgrade > Advanced Settings > Health Check to check whether the health check policy is properly set and whether services are normal.
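Exit code 137 follows the common convention that a code above 128 means the process was terminated by signal (code - 128): 137 - 128 = 9, which is SIGKILL. A container that is killed after repeated liveness probe failures, or by the OOM killer, therefore often reports 137. The following minimal sketch, with the hypothetical function name describe_exit_code, applies that convention to any exit code.

```shell
# Hypothetical helper: explain a container exit code using the shell convention
# that codes above 128 mean "terminated by signal (code - 128)".
describe_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited normally (main process finished)"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error (exit code $code)"
  fi
}
```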

Check Item 3: Whether the Container Disk Space Is Insufficient

The following message refers to the thin pool allocated from the Docker data disk selected during node creation. To view the current disk usage, run the lvs command as the root user.

Thin Pool has 15991 free data blocks which is less than minimum required 16383 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior


Solution

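Release used disk space. For example, the following command forcibly deletes all local images whose entry in docker images contains myhuaweicloud:

```
# Force-delete (-f) every image ID whose docker images line contains "myhuaweicloud"
docker rmi -f $(docker images | grep myhuaweicloud | awk '{print $3}')
```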

Expand the disk capacity. For details about how to expand the data disk capacity of a node, see Node Data Disk (Dedicated for Docker).
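The message itself states how far the pool is below the threshold. The following minimal sketch (hypothetical function name thin_pool_shortfall) extracts the two numbers and prints how many data blocks must be freed; for the message above that is 16383 - 15991 = 392 blocks. The size of one block depends on the thin pool configuration, which lvs can help confirm.

```shell
# Hypothetical helper: parse the devicemapper thin pool message and print the
# number of data blocks missing to reach the required minimum.
thin_pool_shortfall() {
  printf '%s\n' "$1" |
    sed -n 's/.*has \([0-9]*\) free data blocks.*minimum required \([0-9]*\) free data blocks.*/\1 \2/p' |
    while read -r free required; do
      echo $((required - free))
    done
}
```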

Check Item 4: Whether the Upper Limit of Container Resources Has Been Reached

If the upper limit of container resources has been reached, OOM is displayed in both the event details and the log. For example, to search the system log for OOM records of container 96feb0a425d6:

```
grep 96feb0a425d6 /var/log/messages | grep oom
```


If, after the workload is created, a container uses more resources than its configured upper limit, the system OOM is triggered and the container exits unexpectedly.
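The log search above is case-sensitive, while Table 1 refers to the upper-case keyword OOM. The following minimal sketch (hypothetical function name find_oom_events) matches the container ID literally and the OOM keyword case-insensitively, so both spellings are caught; the log file path defaults to /var/log/messages as in this article.

```shell
# Hypothetical helper: print log lines that mention both the container ID and OOM.
# Usage: find_oom_events <containerID> [logfile]   (default: /var/log/messages)
find_oom_events() {
  # -F: treat the container ID as a fixed string; -i: match oom, OOM, Oom, ...
  grep -F -- "$1" "${2:-/var/log/messages}" | grep -i oom
}
```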

Check Item 5: Whether the Resource Limits Are Improperly Set for the Container

If the resource limits set for the container during workload creation are lower than what the application needs, the container fails to restart.

Solution

Modify the container specifications of the workload. For details, see How Do I Set the Upper and Lower Limits of CPU and Memory Resources for a Container?.
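Before changing the specifications, it can help to compare the proposed memory limit with the peak usage you have observed for the application. The following is a minimal sketch under stated assumptions: quantities are whole numbers with the Kubernetes binary suffixes Ki, Mi, or Gi, and the function names to_mib and limit_covers_peak are hypothetical. How you measure the peak is up to your own monitoring.

```shell
# Hypothetical helper: convert an integer memory quantity with a Ki/Mi/Gi suffix
# into whole MiB (Ki values are rounded down). Other formats are rejected.
to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *) echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# Hypothetical check: succeed only if the limit is at least the observed peak.
# Usage: limit_covers_peak <limit> <peak>, for example limit_covers_peak 1Gi 768Mi
limit_covers_peak() {
  limit=$(to_mib "$1") || return 1
  peak=$(to_mib "$2") || return 1
  [ "$limit" -ge "$peak" ]
}
```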

Check Item 6: Whether the Container Ports in the Same Pod Conflict with Each Other

Log in to the node where the abnormal workload is located.
Obtain the ID of the container that exited abnormally in the workload pod.
```
docker ps -a | grep "$podName"
```
View the logs of that container.
```
docker logs "$containerID"
```

Rectify the fault based on the logs. In the example shown in the following figure, container ports in the same pod conflict, so the container fails to start.

Figure 2 Container restart failure due to a container port conflict

Solution

Re-create the workload and set a port number that is not used by any other container in the same pod.
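Containers in the same pod share one network namespace, so two containers listening on the same port produce Address already in use. The following minimal sketch (hypothetical function name duplicate_ports) prints every port that appears more than once in a pod's list of container ports.

```shell
# Hypothetical helper: print each port number that occurs more than once.
# Usage, assuming kubectl access to the cluster:
#   duplicate_ports $(kubectl get pod "$podName" -o jsonpath='{.spec.containers[*].ports[*].containerPort}')
duplicate_ports() {
  # One port per line, sorted numerically, then keep only repeated values
  printf '%s\n' "$@" | sort -n | uniq -d
}
```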

Check Item 7: Whether the Container Startup Command Is Correctly Configured

The error messages are as follows:


Solution

Log in to the CCE console. On the workload details page, choose Upgrade > Advanced Settings > Lifecycle to check whether the startup command is correctly configured.
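A frequent cause of this error is a startup command whose executable does not exist in the image. The following minimal sketch (hypothetical function name command_resolves) checks whether the first word of a command resolves with the POSIX command -v built-in. The result is only meaningful inside the container image, for example through docker run --rm --entrypoint sh IMAGE -c 'command -v java', where IMAGE and java are placeholders.

```shell
# Hypothetical helper: succeed if the first word of a startup command resolves
# to a command or executable file in the current environment.
command_resolves() {
  cmd=${1%% *}    # strip everything after the first space
  command -v "$cmd" >/dev/null 2>&1
}
```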

Check Item 8: Whether the Java Probe Version Is latest

The following figure shows the details of the Kubernetes event "Created container init-pinpoint".


Solution

When creating a Deployment or StatefulSet, select the newest numbered Java probe version (for example, 1.0.36) instead of the version named latest in the APM Settings area on the Advanced Settings page.
If the Java probe version latest was selected during workload creation, click the workload name in the workload list. On the workload details page that is displayed, click the Workload O&M tab, click Edit under APM Settings, and select the newest numbered Java probe version (for example, 1.0.36).
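The rule above amounts to "use an explicit numeric version, never the floating name latest". If you script workload creation, a check such as the following minimal sketch (hypothetical function name is_pinned_version) can enforce it before the setting is applied.

```shell
# Hypothetical helper: succeed only for dotted numeric versions such as 1.0.36.
is_pinned_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)*$'
}
```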

Check Item 9: Whether the User Service Has a Bug

Check whether the workload startup command is correctly executed or whether the workload has a bug.

Log in to the node where the abnormal workload is located.
Obtain the ID of the container that exited abnormally in the workload pod.
```
docker ps -a | grep "$podName"
```
View the logs of that container.
```
docker logs "$containerID"
```
Note: In the preceding command, containerID indicates the ID of the container that has exited.

Figure 3 Incorrect startup command of the container

In the preceding figure, the container fails to start because the startup command is incorrect. For other errors, fix the bugs based on the logs.
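Several well-known runtime messages point to a bad startup command: executable file not found in $PATH, exec format error, No such file or directory, and Permission denied. Depending on how the command fails, they may appear in docker logs output or in the pod events. The following minimal sketch (hypothetical function name startup_command_errors) filters log text on stdin for these messages.

```shell
# Hypothetical helper: print log lines that contain typical startup-command errors.
# Usage: docker logs "$containerID" 2>&1 | startup_command_errors
startup_command_errors() {
  grep -iE 'executable file not found in \$PATH|exec format error|no such file or directory|permission denied'
}
```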

Solution

Create a new workload and configure a correct startup command.

