Liveness Probes

Overview of Liveness Probes

Kubernetes provides self-healing for applications: if a container's process crashes, the container is automatically restarted. However, this mechanism does not cover failures in which the process stays alive but stops working, such as deadlocks. For example, a Java program with a memory leak may become unresponsive while the JVM process continues running. To handle such scenarios, Kubernetes uses liveness probes. These probes check whether containers are still responding and determine whether they need to be restarted, which makes them an effective health check mechanism.

A liveness probe should be defined for each container in a pod so that Kubernetes can keep track of whether the application is healthy.

Kubernetes supports the following detection methods:

HTTP GET: The kubelet sends an HTTP GET request to the container. If the application returns a 2xx or 3xx status code, the container is considered healthy.
TCP Socket: The kubelet attempts to establish a TCP connection to the target container on a specified port. If the connection is successful, the container is considered healthy. Otherwise, it is considered unhealthy.
Exec: The kubelet executes a command inside the container. If the command exits with a status code of 0, the container is considered healthy. If it exits with a non-zero status code, the container is considered unhealthy.
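
The three detection methods can be approximated in a few lines of Python to show what each one actually tests. This is an illustrative sketch only, not the kubelet's implementation, and the function names are hypothetical:

```python
import socket
import subprocess
from typing import Sequence


def http_status_healthy(status_code: int) -> bool:
    """HTTP GET: any 2xx or 3xx status code counts as healthy."""
    return 200 <= status_code < 400


def tcp_healthy(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP Socket: healthy if a connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_healthy(command: Sequence[str], timeout: float = 1.0) -> bool:
    """Exec: healthy if the command exits with status code 0."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False
```

In each case the result is a plain pass or fail. The probe does not interpret response bodies or command output.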

In addition to liveness probes, Kubernetes provides readiness probes, which check whether a pod is ready to receive traffic. For details, see Readiness Probes.

HTTP GET

HTTP GET is the most common detection method. In this mode, the kubelet sends an HTTP GET request to the target container. If the application returns a 2xx or 3xx status code, the container is considered healthy. The following shows an example:

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: nginx:alpine
    livenessProbe:           # A liveness probe
      httpGet:               # HTTP GET definition
        path: /
        port: 80
  imagePullSecrets: 
  - name: default-secret

Save the manifest as liveness-http.yaml and create the pod.

$ kubectl create -f liveness-http.yaml
pod/liveness-http created

The kubelet periodically sends an HTTP GET request to port 80 on the container. If the probe fails repeatedly (three consecutive times by default), Kubernetes restarts the container.

View details of the pod.

$ kubectl describe po liveness-http
Name:               liveness-http
......
Containers:
  liveness:
    ......
    State:          Running
      Started:      Mon, 03 Aug 2020 03:08:55 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vssmw (ro)
......

The output shows that the container is in the Running state and that its Restart Count is 0, which means the container has not been restarted. If Restart Count is greater than 0, the container has been restarted that many times.

TCP Socket

The kubelet attempts to establish a TCP connection to the target container on a specified port. If the connection is successful, the container is considered healthy. Otherwise, it is considered unhealthy. The following shows an example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-tcp
spec:
  containers:
  - name: liveness
    image: nginx:alpine
    livenessProbe:           # A liveness probe
      tcpSocket:
        port: 80
  imagePullSecrets: 
  - name: default-secret

Exec

The kubelet executes a command inside the target container. If the command exits with a status code of 0, the container is considered healthy. If it exits with a non-zero status code, the container is considered unhealthy. The following shows an example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: nginx:alpine
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:           # A liveness probe
      exec:                  # Exec definition
        command:
        - cat
        - /tmp/healthy
  imagePullSecrets: 
  - name: default-secret

According to the Exec definition, the kubelet runs cat /tmp/healthy inside the container. If the command exits with a status code of 0, the container is considered healthy. For the first 30 seconds, the /tmp/healthy file exists, so cat /tmp/healthy succeeds. After 30 seconds, the file is deleted and the command starts returning a non-zero status code. Once the probe has failed three consecutive times (the default failure threshold), the kubelet considers the container unhealthy and restarts it.
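
The timing of this example can be traced with a small simulation. The sketch below assumes an idealized schedule (the first probe runs at 0 seconds, probes are evenly spaced, and default probe settings apply). A real kubelet will not match these timestamps exactly, and the function name is hypothetical:

```python
def simulate_exec_probe(file_lifetime=30, period=10, failure_threshold=3,
                        horizon=120):
    """Return the probe time (in seconds) at which a restart is triggered.

    Idealized model: probes run at t = 0, period, 2 * period, ... and
    /tmp/healthy exists only while t < file_lifetime.
    Returns None if no restart is triggered within the horizon.
    """
    consecutive_failures = 0
    for t in range(0, horizon + 1, period):
        file_exists = t < file_lifetime   # stands in for `cat /tmp/healthy`
        if file_exists:
            consecutive_failures = 0      # a success resets the count
        else:
            consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            return t
    return None


print(simulate_exec_probe())  # 50 in this idealized model
```

In this model the probes at 30, 40, and 50 seconds fail, so the restart is triggered at about 50 seconds. Because the container's startup command recreates /tmp/healthy, the same cycle repeats after every restart.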

Advanced Settings of a Liveness Probe

The describe command for liveness-http returns the following information:

Liveness: http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3

Liveness probe parameters are as follows:

delay=0s (initialDelaySeconds): The health check starts immediately after the container starts.
timeout=1s (timeoutSeconds): The container must respond within one second. Otherwise, the probe is recorded as failed.
period=10s (periodSeconds): The probe checks the container every 10 seconds.
#success=1 (successThreshold): After a failure, a single successful probe is enough for the container to be considered healthy again.
#failure=3 (failureThreshold): The container is restarted after three consecutive probe failures.
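
To make the correspondence between the describe output and the manifest fields explicit, the summary line can be parsed into field names. This is a minimal sketch: the helper name is hypothetical, and the parsing assumes the summary format shown above:

```python
import re

# Abbreviations used in the describe output, mapped to manifest field names.
FIELD_NAMES = {
    "delay": "initialDelaySeconds",
    "timeout": "timeoutSeconds",
    "period": "periodSeconds",
    "#success": "successThreshold",
    "#failure": "failureThreshold",
}


def parse_probe_summary(line: str) -> dict:
    """Extract probe settings from a describe-style summary line."""
    settings = {}
    # Matches pairs such as "delay=0s" or "#failure=3".
    for key, value in re.findall(r"(#?\w+)=(\d+)s?", line):
        if key in FIELD_NAMES:
            settings[FIELD_NAMES[key]] = int(value)
    return settings


summary = ("http-get http://:80/ delay=0s timeout=1s period=10s "
           "#success=1 #failure=3")
print(parse_probe_summary(summary))
```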

With these values, the liveness probe starts as soon as the container starts and runs every 10 seconds. A probe that gets no response within one second is recorded as a failure, and three consecutive failures cause the container to be restarted.

These are the default values that apply when the probe definition does not set them. You can customize them as needed, for example:

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: nginx:alpine
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10    # The health check starts 10 seconds after the container starts.
      timeoutSeconds: 2          # The container must respond within 2 seconds. Otherwise, it is considered unhealthy.
      periodSeconds: 30          # The probe checks the container every 30 seconds.
      successThreshold: 1        # The container is considered healthy if it succeeds once.
      failureThreshold: 3        # The container is considered unhealthy after three consecutive failures.

Typically, the initialDelaySeconds value should be greater than 0 because it takes time for the application to become ready after the container starts. If the probe is initiated before the application is ready, it may fail, and repeated failures lead to an unnecessary restart.

Additionally, the failureThreshold value can be greater than 1. This allows the kubelet to retry the probe multiple times before considering the container unhealthy, rather than restarting the container immediately after the first failed probe.
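
The effect of these settings on timing can be estimated with simple arithmetic. The sketch below is a rough approximation under idealized assumptions: the first probe runs exactly at initialDelaySeconds, probes are evenly spaced, and scheduling jitter is ignored. Both function names are hypothetical:

```python
def startup_budget(initial_delay: int, period: int,
                   failure_threshold: int) -> int:
    """Seconds after container start at which the last tolerated probe runs.

    If the application is still not answering by then, every probe so far
    has failed and the container is restarted before it becomes ready.
    """
    return initial_delay + (failure_threshold - 1) * period


def worst_case_detection(period: int, failure_threshold: int,
                         timeout: int) -> int:
    """Rough upper bound, in seconds, from a hang to the restart decision."""
    return failure_threshold * period + timeout


# Default settings: delay=0s timeout=1s period=10s #failure=3
print(startup_budget(0, 10, 3))         # 20
print(worst_case_detection(10, 3, 1))   # 31

# Customized settings from the example manifest above
print(startup_budget(10, 30, 3))        # 70
print(worst_case_detection(30, 3, 2))   # 92
```

The numbers show the trade-off: longer periods and higher thresholds give a slow-starting application more time, but a hung container also stays in service longer before it is restarted.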

Configuring a Liveness Probe

What to check
An effective liveness probe should check all key parts of an application through a dedicated URL, such as /health. A request to this URL runs the application's health check and returns the result. The URL must not require authentication. Otherwise, every probe fails and the container is restarted repeatedly.

Additionally, a probe should not check parts with external dependencies. For example, if a frontend web server cannot access a database, the web server should not be considered unhealthy due to the connection failure, because restarting the web server does not fix the database connection.

Keep it lightweight
A liveness probe must not consume too many resources or hold certain resources for too long, as this could lead to resource shortages and affect service performance. For example, the HTTP GET method is recommended for Java applications. An Exec probe that launches a Java program has to start a new JVM for each check, which might consume excessive resources.
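
As an illustration of these guidelines, the following minimal sketch serves a dedicated /health URL using only the Python standard library. The app_state flag, the start_health_server helper, and port 8080 are assumptions made for this example, and a real application would derive the flag from its own internal checks:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process health flag. External dependencies such as a
# database are deliberately not part of it.
app_state = {"internal_ok": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # No authentication and no expensive work: read one flag and answer.
        healthy = app_state["internal_ok"]
        body = json.dumps({"healthy": healthy}).encode()
        # 200 is treated as healthy; 503 is outside 2xx/3xx, so the probe fails.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep periodic probe requests out of the log


def start_health_server(port: int = 8080, host: str = "0.0.0.0") -> HTTPServer:
    """Serve /health on a background thread and return the server object."""
    server = HTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A matching probe would set path to /health and port to the port passed to start_health_server. The check runs inside the existing process, so no new process is started for each probe.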
