What Do I Do If a Containerized Application Fails to Be Started on an Edge Node?
Symptom
A containerized application cannot be started on an edge node.
Fault Locating
The possible causes are listed in descending order of likelihood. Check them in that order to locate the cause of the problem quickly.
| Possible Cause | Solution |
|---|---|
| The containerized application fails to be delivered to the edge node. | For details, see What Do I Do If an Application Fails to Be Delivered to an Edge Node? |
| The containerized application is incorrectly configured. | For details, see Containerized Application Is Incorrectly Configured. |
| The container image cannot be pulled to the edge node. | For details, see What Do I Do If a Container Image Fails to Be Pulled? |
Containerized Application Is Incorrectly Configured
- Log in to the edge node.
- Run the following command to check whether the container is running (replace Application name with the name of your application):
sudo docker ps | grep "Application name"
Run the following command to check whether the container has exited abnormally:
sudo docker ps -a | grep "Application name"
Run the preceding two commands repeatedly to check whether the container keeps restarting.
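The repeated check described above can be sketched as a small helper. This is only an illustration: `status_kind` and `snapshot` are hypothetical helper names, `APP_NAME` is a placeholder for your application name, and the sketch assumes the Docker CLI on the edge node supports the standard `--filter` and `--format` options of `docker ps`.

```shell
# DOCKER can be overridden, for example to drop sudo when you are already root.
DOCKER="${DOCKER:-sudo docker}"

# Map a Status string from "docker ps -a" to a coarse state.
status_kind() {
  case "$1" in
    Up*)         echo running ;;
    Restarting*) echo restarting ;;
    Exited*)     echo exited ;;
    *)           echo other ;;
  esac
}

# Print one line per matching container: ID, coarse state, full status text.
# Exited containers are included because of "ps -a".
snapshot() {
  $DOCKER ps -a --filter "name=$1" --format '{{.ID}}|{{.Status}}' |
    while IFS='|' read -r id status; do
      printf '%s %s %s\n' "$id" "$(status_kind "$status")" "$status"
    done
}

# Usage (APP_NAME is a placeholder): take five snapshots, 10 seconds apart.
# for i in 1 2 3 4 5; do date; snapshot "APP_NAME"; sleep 10; done
```

If successive snapshots keep showing new container IDs, or a running container whose uptime never grows beyond a few seconds, the container is probably being restarted repeatedly.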
- If neither command returns your application, go to the step "Check whether the application is successfully delivered."
- If your container restarts repeatedly, run the following command to query logs:
ID=$(sudo docker ps -a | grep "Application name" | awk '{print $1}' | head -n 1)
sudo docker logs "$ID"
The application logs are displayed, based on which you can locate the cause of repeated container restarts. The possible causes are as follows:
- Image errors
The image is faulty or does not match the system (for example, it was built for a different CPU architecture than the edge node). You can perform the following operations to verify the image on the edge node:
- Startup parameter errors
- Directory mounting errors
If the image needs to access a special directory on the edge node, ensure that the directory has been mounted during the delivery.
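One way to confirm which directories were actually mounted is to inspect the container on the edge node. The following is a minimal sketch, assuming the standard `docker inspect` Go-template output; `list_mounts` and `has_mount` are hypothetical helper names, and `/data` in the usage line is an example path, not a value from your configuration.

```shell
# DOCKER can be overridden, for example to drop sudo when you are already root.
DOCKER="${DOCKER:-sudo docker}"

# Print "<host path> -> <container path>" for each mount of container $1.
list_mounts() {
  $DOCKER inspect --format '{{range .Mounts}}{{println .Source "->" .Destination}}{{end}}' "$1"
}

# Succeed if container path $2 is a mount target in container $1.
has_mount() {
  list_mounts "$1" |
    awk -F' -> ' -v want="$2" '$2 == want { found = 1 } END { exit !found }'
}

# Usage ($ID is the container ID; /data is an example path):
# list_mounts "$ID"
# has_mount "$ID" /data || echo "/data is not mounted in the container"
```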
- NPU issues
If your application needs to use NPU resources, ensure that you have selected the NPU resources when delivering the application.
NPU resources may also be occupied by applications that were not delivered by IEF, leaving too few for your application. IEF cannot identify the NPU usage of non-IEF applications, so check manually that sufficient NPU resources are available.
- Resource issues
Ensure that the CPU and memory Limit values set when the application is delivered are sufficient. (If the resources that the container tries to use exceed the Limit value, the container is killed repeatedly.) To verify this, set Limit to a larger value.
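Before raising the limit, you can check whether Docker recorded an out-of-memory kill for the exited container. This is a minimal sketch covering memory only, assuming the standard `docker inspect` fields `.State.OOMKilled`, `.State.ExitCode`, and `.HostConfig.Memory`; `kill_info` and `explain_kill` are hypothetical helper names.

```shell
# DOCKER can be overridden, for example to drop sudo when you are already root.
DOCKER="${DOCKER:-sudo docker}"

# Print "<OOMKilled> <exit code> <memory limit in bytes>" for container $1.
kill_info() {
  $DOCKER inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}} {{.HostConfig.Memory}}' "$1"
}

# Interpret the three fields printed by kill_info.
# Exit code 137 is 128 + 9, meaning the process was terminated by SIGKILL.
explain_kill() {
  if [ "$1" = "true" ]; then
    echo "memory limit exceeded"
  elif [ "$2" = "137" ]; then
    echo "killed by SIGKILL, no OOM recorded"
  else
    echo "no resource kill recorded"
  fi
}

# Usage ($ID is the ID of the exited container):
# explain_kill $(kill_info "$ID")
```

A result of "memory limit exceeded" suggests that the memory Limit is too low for the application. A memory limit of 0 in the `kill_info` output means that no limit is set on the container.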
- Health check issues
If you have configured the health check, ensure that the health check mode is correctly configured. If the health check mode is incorrectly configured, the health check will fail and the container will be restarted repeatedly.
Log in to the IEF console, choose Edge Applications > Containerized Applications, and click the name of your application. On the details page that is displayed, click the Upgrade tab, and choose Health Check under Advanced Settings to check whether the liveness probe and readiness probe of your application are correctly configured.
To verify this problem, you can update the application without configuring the health check and check whether the application restarts repeatedly.
- Health check interval issues
Check how long it takes for the application to start properly and how long it takes for the system to return health check results.
Figure 2 Health check configurations
The health check delay indicates the interval between the time when the application is delivered and the time when a health check is started. If the interval is too short, a health check may start before the application is ready. In this case, the application fails the health check continuously and the container is restarted repeatedly, resulting in a vicious cycle.
The health check timeout indicates the interval between the time when the health check is started and the time when a response is returned. If no response is returned within the interval, the health check is counted as failed. If the configured health check timeout period is shorter than the time required for the interface to return the result, the health check fails continuously and the application is restarted repeatedly. (This problem may occur when the edge node performance is poor or the service volume on the application is large.)
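To compare the configured timeout with the real response time, you can time the health check interface from the edge node. The following is a rough sketch, assuming an HTTP health check that is reachable from the node and that `curl` is installed; `probe_response_time` and `fits_within` are hypothetical helper names, and the URL and the 2-second timeout in the usage lines are placeholder values.

```shell
# Succeed if the measured value $1 (seconds) does not exceed the configured value $2 (seconds).
# awk is used because the measured time is a decimal number.
fits_within() {
  awk -v measured="$1" -v configured="$2" \
    'BEGIN { exit !(measured + 0 <= configured + 0) }'
}

# Print how long the health check URL $1 takes to respond, in seconds.
# Gives up after $2 seconds (default 30).
probe_response_time() {
  curl -s -o /dev/null -w '%{time_total}' --max-time "${2:-30}" "$1"
}

# Usage (the URL and the 2-second timeout are placeholders):
# t=$(probe_response_time "http://127.0.0.1:8080/healthz")
# fits_within "$t" 2 || echo "response time ${t}s exceeds the configured timeout"
```

The same comparison applies to the health check delay: measure how long the application takes to become ready after the container starts, and check that it fits within the configured delay.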
- Check whether the application is successfully delivered.
- Run the following command to switch to the root user:
sudo su
- Query application logs.
If logs are displayed, the application has been successfully delivered. The likely cause is that the container image failed to be pulled. Locate the fault by referring to What Do I Do If a Container Image Fails to Be Pulled?
If no log is displayed, submit a service ticket.