Liveness Probe
Overview
Kubernetes is self-healing: it detects when a container crashes and restarts it. However, some failures do not crash the process. For example, a Java program with a memory leak may stop working normally while the JVM process keeps running. For such cases, Kubernetes provides the liveness probe mechanism, which checks whether the container responds normally and restarts the container if it does not.
A liveness probe should be defined for each pod. Otherwise, Kubernetes can only tell whether the container process is still running, not whether the application inside it is working properly.
CCI supports the following detection mechanisms:
- HTTP GET: An HTTP GET request is sent to the container. If the probe receives 2xx or 3xx, the container is healthy.
You need to configure the following annotation for the pod to make timeoutSeconds take effect:
cci.io/httpget-probe-timeout-enable: "true"
For details, see the example in Advanced Configuration of Liveness Probe.
- Exec: The probe runs a command in the container and checks the exit status code. If the exit status code is 0, the container is healthy.
HTTP GET
HTTP GET is the most common detection method. The probe sends an HTTP GET request to the container. If the probe receives a 2xx or 3xx status code, the container is healthy. The method is defined as follows:
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:           # liveness probe
      httpGet:               # HTTP GET definition
        path: /healthz
        port: 8080
Create a pod.
$ kubectl create -f liveness-http.yaml -n $namespace_name
pod/liveness-http created
In this pod definition, the probe sends an HTTP GET request to the /healthz path on port 8080 of the container. The program in this image returns the status code 500 for the fifth request, so the probe fails and Kubernetes restarts the container.
View pod details.
$ kubectl describe po liveness-http -n $namespace_name
Name:               liveness-http
......
Containers:
  container-0:
    ......
    State:          Running
      Started:      Mon, 12 Nov 2018 22:57:28 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 12 Nov 2018 22:55:40 +0800
      Finished:     Mon, 12 Nov 2018 22:57:27 +0800
    Ready:          True
    Restart Count:  1
    Liveness:       http-get http://:8080/ delay=0s timeout=1s period=10s #success=1 #failure=3
    ......
Events:
  Type    Reason     Age                 From               Message
  ----    ------     ----                ----               -------
  Normal  Scheduled  3m5s                default-scheduler  Successfully assigned default/pod-liveness to node2
  Normal  Pulling    74s (x2 over 3m4s)  kubelet, node2     pulling image "pod-liveness"
  Normal  Killing    74s                 kubelet, node2     Killing container with id docker://container-0:Container failed liveness probe.. Container will be killed and recreated.
As shown, the pod is in the Running state, the Last State is Terminated, and the Restart Count is 1, indicating that the container has been restarted once. The Killing event also records the reason: "Killing container with id docker://container-0:Container failed liveness probe.. Container will be killed and recreated."
After the container is killed, a new container is created.
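The HTTP GET check described in this section can be approximated locally. The following is a minimal sketch in Python, not the kubelet's implementation: FlakyHandler is a hypothetical stand-in for the program in the example image (it is assumed to return 200 for the first four requests and 500 afterwards), and http_get_probe is an illustrative helper that treats any 2xx or 3xx response received within the timeout as healthy.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FlakyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in app: 200 for the first four requests, 500 afterwards."""

    request_count = 0

    def do_GET(self):
        FlakyHandler.request_count += 1
        status = 200 if FlakyHandler.request_count < 5 else 500
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def http_get_probe(url, timeout_seconds=1.0):
    """Return True if the URL answers with a 2xx or 3xx status within the timeout."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 400
    except OSError:
        # HTTPError (4xx/5xx), URLError, timeouts, and refused connections
        # are all subclasses of OSError.
        return False


server = HTTPServer(("127.0.0.1", 0), FlakyHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/healthz"

results = [http_get_probe(url) for _ in range(6)]
server.shutdown()
server.server_close()
print(results)  # [True, True, True, True, False, False]
```

In this sketch a single failed request is only reported as unhealthy. Whether the container is actually restarted depends on the failure threshold of the probe.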
Exec
The Exec method executes a specific command. The probe runs the command in the container and checks its exit status code. If the exit status code is 0, the container is healthy. The method is defined as follows:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:           # liveness probe
      exec:                  # Exec definition
        command:
        - cat
        - /tmp/healthy
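The probe runs the cat /tmp/healthy command in the container. If the command is executed successfully and 0 is returned, the container is healthy. In this example, the container deletes /tmp/healthy 30 seconds after it starts, so from then on the command fails and the probe reports the container as unhealthy.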
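The exit-code rule can be tried outside a cluster. The sketch below is an illustration in Python under stated assumptions, not how the container runtime executes probes: exec_probe is a hypothetical helper, a temporary file stands in for /tmp/healthy, and the cat command is assumed to be available on the local machine.

```python
import os
import subprocess
import tempfile


def exec_probe(command, timeout_seconds=1.0):
    """Run the command and report healthy only if it exits with status 0 in time."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # too slow, or the command could not be started
    return completed.returncode == 0


# Mirror the pod definition: the file exists at first and is removed later.
with tempfile.TemporaryDirectory() as workdir:
    healthy_file = os.path.join(workdir, "healthy")
    open(healthy_file, "w").close()                    # like: touch /tmp/healthy
    before_removal = exec_probe(["cat", healthy_file])
    os.remove(healthy_file)                            # like: rm -rf /tmp/healthy
    after_removal = exec_probe(["cat", healthy_file])

print(before_removal, after_removal)  # True False
```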
Advanced Configuration of Liveness Probe
In the output of the kubectl describe po liveness-http command, the following information is displayed:
Liveness: http-get http://:8080/ delay=0s timeout=1s period=10s #success=1 #failure=3
This line indicates the parameter configuration of the liveness probe. The meanings of the parameters are as follows:
- delay=0s indicates that the probe starts immediately after the container is started.
- timeout=1s indicates that the container must respond to the probe within 1s. Otherwise, the detection fails.
- period=10s indicates that the detection is performed every 10s.
- #success=1 indicates that one successful detection is enough for the container to be considered healthy.
- #failure=3 indicates that the container will be restarted after three consecutive detection failures.
These are the default values used when the probe is created. You can also configure the parameters manually, as follows:
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  annotations:
    cci.io/httpget-probe-timeout-enable: "true"   # Makes timeoutSeconds take effect for HTTP GET probes.
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    livenessProbe:
      httpGet:
        path: /
        port: 8080
      initialDelaySeconds: 10    # The first probe is performed 10s after the container is started.
      timeoutSeconds: 2          # The container must respond to the probe within 2s, or the detection fails.
      periodSeconds: 30          # The probe is performed every 30s.
      successThreshold: 1        # The container is considered healthy as long as the probe succeeds once.
      failureThreshold: 3        # The container will be restarted after three consecutive detection failures.
Generally, the value of initialDelaySeconds should be greater than 0. In most cases, although the container is started successfully, it takes a while for the application to be ready, and the application can return a success response only after it is ready. If detection starts too early, the probe may fail frequently.
In addition, you can set failureThreshold so that the probe is retried several times before the container is restarted. This way, the health check program itself does not need to implement a retry loop.
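To see how these parameters interact, the sketch below estimates when a restart would be triggered for a given sequence of probe results. It is a simplified model in Python, not the kubelet's scheduling logic: first_restart_time is a hypothetical helper, probes are assumed to run exactly every periodSeconds starting at initialDelaySeconds, and the time spent waiting for each response (up to timeoutSeconds) is ignored.

```python
def first_restart_time(probe_results, initial_delay=0, period=10, failure_threshold=3):
    """Return the time in seconds after container start of the probe that
    triggers a restart, or None if no restart is triggered.

    probe_results is a sequence of booleans, one per probe, in order.
    """
    consecutive_failures = 0
    for index, succeeded in enumerate(probe_results):
        probe_time = initial_delay + index * period
        if succeeded:
            consecutive_failures = 0  # any success resets the failure count
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return probe_time
    return None


# Default parameters: four successes, then the application stops responding.
print(first_restart_time([True] * 4 + [False] * 3))                      # 60
# Manual configuration from this section: the application never becomes healthy.
print(first_restart_time([False] * 3, initial_delay=10, period=30))      # 70
# An intermittent failure does not reach three consecutive failures.
print(first_restart_time([False, False, True, False, False]))            # None
```

Under this model, a larger periodSeconds or failureThreshold delays the restart of a container that has really failed, so the values are a trade-off between tolerance of brief hiccups and recovery time.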
Configuring an Effective Liveness Probe
- What should a liveness probe detect?
A liveness probe should check whether all the key parts of an application are healthy and should use a dedicated URL, such as /health. When /health is accessed, the application performs the check and returns the result. Note that the URL must not require authentication. Otherwise, the probe will fail every time and the container will be restarted repeatedly.
In addition, the probe should check only the application itself, not the external services that the application depends on. For example, if the frontend web server cannot connect to the database, the web server should not be considered unhealthy, because restarting it would not restore the database connection.
- A liveness probe must be lightweight.
A liveness probe must not occupy too many resources or take too much time. Otherwise, the health check itself wastes resources. For example, for Java applications, the HTTP GET method is recommended. If the Exec method is used and the command is a Java program, a new JVM is started for every probe, which occupies too many resources.
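The guidelines above can be sketched as a small health endpoint. The example uses Python for brevity and is only an illustration under stated assumptions: app_state and its worker_alive flag are hypothetical placeholders for whatever internal checks a real application would run, and the handler deliberately requires no authentication and makes no calls to databases or other external services.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical internal state; a real application would inspect its own components.
app_state = {"worker_alive": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        # Cheap, in-process check only: no authentication, no external dependencies.
        healthy = app_state["worker_alive"]
        self.send_response(200 if healthy else 503)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_status(url):
    """Return the HTTP status code of a GET request, including error statuses."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=1) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/health"

status_when_healthy = fetch_status(url)
app_state["worker_alive"] = False        # simulate an internal failure
status_when_broken = fetch_status(url)
server.shutdown()
server.server_close()
print(status_when_healthy, status_when_broken)  # 200 503
```

Because the handler only reads a value that is already in memory, each probe stays lightweight, which matches the second guideline.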