Liveness Probe

Overview

Kubernetes is self-healing: it detects when a container crashes and restarts it. Some failures, however, do not crash the process. For example, a Java program with a memory leak may stop working correctly while the JVM process keeps running. For such cases, Kubernetes provides the liveness probe mechanism, which checks whether the container responds normally and restarts the container if it does not.

A liveness probe should be defined for each pod. Without one, Kubernetes can only tell whether the container process is running, not whether the application inside it is working properly.

CCI supports the following detection mechanisms:

HTTP GET: The probe sends an HTTP GET request to the container. If the response status code is 2xx or 3xx, the container is healthy.

For timeoutSeconds to take effect on an HTTP GET probe, add the following annotation to the pod:

cci.io/httpget-probe-timeout-enable: "true"

For details, see the example in Advanced Configuration of Liveness Probe.
Exec: The probe runs a command in the container and checks the exit status code. If the exit status code is 0, the container is healthy.

HTTP GET

HTTP GET is the most common detection method. The probe sends an HTTP GET request to the container. If the response status code is 2xx or 3xx, the container is healthy. The following example defines an HTTP GET probe:

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:           # liveness probe
      httpGet:               # HTTP GET definition
        path: /healthz
        port: 8080

Create a pod.

$ kubectl create -f liveness-http.yaml -n $namespace_name
pod/liveness-http created

As defined above, the probe sends an HTTP GET request to /healthz on port 8080 of the container. The program in this image returns status code 500 for the fifth request, after which Kubernetes restarts the container.

View pod details.

$ kubectl describe po liveness-http -n $namespace_name
Name:         liveness-http
......
Containers:
  container-0:
    ......
    State:          Running
      Started:      Mon, 12 Nov 2018 22:57:28 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 12 Nov 2018 22:55:40 +0800
      Finished:     Mon, 12 Nov 2018 22:57:27 +0800
    Ready:          True
    Restart Count:  1
    Liveness:       http-get http://:8080/ delay=0s timeout=1s period=10s #success=1 #failure=3
......
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  3m5s                default-scheduler  Successfully assigned default/pod-liveness to node2
  Normal   Pulling    74s (x2 over 3m4s)  kubelet, node2     pulling image "pod-liveness"
  Normal   Killing    74s                 kubelet, node2     Killing container with id docker://container-0:Container failed liveness probe.. Container will be killed and recreated.

As shown, the container is in the Running state, its Last State is Terminated, and the Restart Count is 1, indicating that the container has been restarted once. The events also include the following message: "Killing container with id docker://container-0:Container failed liveness probe.. Container will be killed and recreated."

After the container is killed, a new container is created.

Exec

The Exec method runs a specific command. The probe executes the command in the container and checks its exit status code. If the exit status code is 0, the container is healthy. The following example defines an Exec probe:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:           # liveness probe
      exec:                  # Exec definition
        command:
        - cat
        - /tmp/healthy

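The probe runs the cat /tmp/healthy command in the container. If the command succeeds and returns 0, the container is healthy. In this example, the container deletes /tmp/healthy 30 seconds after it starts, so the probe then begins to fail and Kubernetes restarts the container.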
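In standard Kubernetes, the probe command is executed directly rather than through a shell, so a check that relies on shell features such as && or pipes has to call the shell explicitly. The following is a minimal sketch of that pattern, assuming the same busybox container as in the example above. The pod name liveness-exec-shell and the combined test -f / test -r check are illustrative choices, not part of the original example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-shell      # hypothetical pod name
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:                 # run the check through a shell so that && works
        - /bin/sh
        - -c
        - test -f /tmp/healthy && test -r /tmp/healthy   # exit code 0 only if both tests pass
```

The probe still reports health through the exit status code of the last command, so the container is treated as unhealthy as soon as either test fails.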

Advanced Configuration of Liveness Probe

The output of the kubectl describe po liveness-http command includes the following line:

Liveness: http-get http://:8080/ delay=0s timeout=1s period=10s #success=1 #failure=3

This line indicates the parameter configuration of the liveness probe. The meanings of the parameters are as follows:

delay=0s indicates that the probe starts immediately after the container is started.
timeout=1s indicates that the container must respond to the probe within 1s. Otherwise, the detection fails.
period=10s indicates that the detection is performed every 10s.
#success=1 indicates that the container is considered healthy after one successful detection.
#failure=3 indicates that the container will be restarted after three consecutive detection failures.

These are the default values set when the probe is created. You can also configure the parameters explicitly:

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  annotations:
    cci.io/httpget-probe-timeout-enable: "true"   # Required for timeoutSeconds to take effect.
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    livenessProbe:
      httpGet:
        path: /
        port: 8080
      initialDelaySeconds: 10    # Wait 10s after the container starts before the first probe.
      timeoutSeconds: 2          # The container must respond to the probe within 2s, or the detection fails.
      periodSeconds: 30          # The probe is performed every 30s.
      successThreshold: 1        # The container is considered healthy as long as the probe succeeds once.
      failureThreshold: 3        # The container will be restarted after three consecutive detection failures.

Generally, set initialDelaySeconds to a value greater than 0. In most cases, the application needs some time to become ready after the container starts, and it can return a success response only after it is ready. If the probe starts too early, it may fail frequently and the container may be restarted before the application finishes starting.

In addition, you can set failureThreshold so that the probe is retried several times before the container is restarted. This way, you do not have to implement retry logic in the health check program itself.
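To see how these parameters interact, the fragment below repeats the probe settings from the example above and works out, in comments, roughly how long an unresponsive application can go unnoticed. The figures are approximations that assume standard Kubernetes probe scheduling and ignore scheduling jitter.

```yaml
livenessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 10   # no probe is sent during the first 10s after container start
  timeoutSeconds: 2         # each probe waits at most 2s for a response
  periodSeconds: 30         # one probe every 30s
  failureThreshold: 3       # restart after 3 consecutive failures
  # Approximate time from the application hanging to the restart decision:
  #   shortest: the hang occurs just before a probe, so the failures are 30s apart:
  #             (failureThreshold - 1) * periodSeconds = 2 * 30 = 60s
  #   longest:  the hang occurs just after a successful probe and every probe times out:
  #             failureThreshold * periodSeconds + timeoutSeconds = 3 * 30 + 2 = 92s
```

Lowering periodSeconds or failureThreshold shortens this window, at the cost of more frequent probes and a higher chance of restarting a container that is only temporarily slow.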

Configuring an Effective Liveness Probe

What should a liveness probe detect?
A liveness probe should check whether all key parts of the application are healthy, using a dedicated URL such as /health. When /health is accessed, the application runs the check and returns the result. The URL must not require authentication. Otherwise, the probe will always fail and the container will be restarted repeatedly.

In addition, the probe should check only the application itself, not its external dependencies. For example, if a frontend web server cannot connect to the database, the web server should not be considered unhealthy, because restarting the web server does not fix the database.
A liveness probe must be lightweight.
A liveness probe must not consume too many resources or take too long. Otherwise, the health check itself wastes resources. For example, the HTTP GET method is recommended for Java applications. An Exec probe that runs a Java command starts a new JVM for every check, which consumes too many resources.
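The recommendations above can be combined into a single manifest. The following is a minimal sketch for a Java application, assuming the application serves an unauthenticated /health endpoint on port 8080. The pod name, container name, image, and the 30s start-up allowance are hypothetical placeholders to replace with your own values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: java-app                       # hypothetical pod name
  annotations:
    cci.io/httpget-probe-timeout-enable: "true"   # lets timeoutSeconds take effect on CCI
spec:
  containers:
  - name: app                          # hypothetical container name
    image: example/java-app:1.0        # hypothetical image
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:                         # HTTP GET avoids starting a new JVM for each check
        path: /health                  # dedicated endpoint that requires no authentication
        port: 8080
      initialDelaySeconds: 30          # assumed time the application needs to start
      timeoutSeconds: 2
      periodSeconds: 10
      failureThreshold: 3
```

The /health handler itself should check only the application's internal state and return quickly, so that a slow or unavailable database does not cause the container to be restarted.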