Updated on 2024-04-18 GMT+08:00

Health Check

ELB periodically sends requests to backend servers to check whether they can process requests. This process is called a health check.

If a backend server is detected as unhealthy, the load balancer stops routing requests to it. After the backend server recovers, the load balancer resumes routing requests to it.

If backend servers have to handle a large number of requests, frequent health checks may overload them and cause them to respond slowly. To address this problem, you can prolong the health check interval or use TCP or UDP instead of HTTP. You can also disable health checks. If you disable health checks, requests may be routed to unhealthy servers, and service interruptions may occur.

Health Check Protocol

You can configure health checks when configuring backend server groups. Generally, you can use the default setting or select a different health check protocol as you need.

To modify health check settings, see Modifying Health Check Settings.

Select a health check protocol that matches the backend protocol as described in Table 1.

Table 1 The backend protocol and health check protocols (dedicated load balancers)

Backend Protocol    Health Check Protocol
TCP                 TCP, HTTP, or HTTPS
UDP                 UDP
QUIC                UDP
HTTP                TCP, HTTP, or HTTPS
HTTPS               TCP, HTTP, or HTTPS

TCP Health Check

For TCP, HTTP, and HTTPS backend protocols, you can use TCP to initiate three-way handshakes to obtain the statuses of backend servers.

Figure 1 TCP health check

The TCP health check process is as follows:

  1. The load balancer sends a TCP SYN packet to the backend server (in the format of {Private IP address}:{Health check port}).
  2. The backend server returns a SYN-ACK packet.
    • If the load balancer does not receive the SYN-ACK packet within the timeout duration, it declares that the backend server is unhealthy and sends an RST packet to the backend server to terminate the TCP connection.
    • If the load balancer receives the SYN-ACK packet from the backend server within the timeout duration, it sends an ACK packet to the backend server and declares that the backend server is healthy. After that, the load balancer sends an RST packet to the backend server to terminate the TCP connection.
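
The handshake-then-reset sequence above can be approximated from any client. The following is a minimal sketch, not ELB's implementation: the function name `tcp_health_check` and its parameters are hypothetical, and the `SO_LINGER` structure layout shown assumes Linux or macOS.

```python
import socket
import struct


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP three-way handshake with host:port completes in time."""
    try:
        # create_connection() performs the SYN / SYN-ACK / ACK exchange.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Timeout, connection refused, or host unreachable: treat as unhealthy.
        return False
    try:
        # SO_LINGER enabled with a linger time of 0 makes close() send an RST
        # instead of the normal FIN sequence (struct layout assumes Linux/macOS).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    finally:
        sock.close()
    return True
```

Calling `tcp_health_check("127.0.0.1", 8080)` against a local listener is a quick way to see how a backend application reacts to a connection that is reset immediately after the handshake.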

After a successful TCP three-way handshake, an RST packet will be sent to close the TCP connection. The application on the backend server may consider this packet a connection error and reply with a message, for example, "Connection reset by peer". To avoid this issue, take either of the following actions:

UDP Health Check

For the UDP backend protocol, ELB sends ICMP and UDP probe packets to backend servers to check their health.

Figure 2 UDP health check

The UDP health check process is as follows:

  1. The load balancer sends an ICMP Echo Request packet to the backend server.
    • If the load balancer does not receive an ICMP Echo Reply packet within the health check timeout duration, the backend server is declared unhealthy.
    • If the load balancer receives an ICMP Echo Reply packet within the timeout period, it sends a UDP probe packet to the backend server.
  2. If the load balancer does not receive an ICMP Port Unreachable error in response to the UDP probe packet within the health check timeout duration, it declares the backend server healthy. If the load balancer receives an ICMP Port Unreachable error, it declares the backend server unhealthy.
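
The two-stage decision above can be written as a small pure function. This sketch only models the decision logic; it does not send ICMP or UDP packets (raw ICMP sockets usually require elevated privileges). The names `Status` and `udp_health_status` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


def udp_health_status(echo_reply_received: bool,
                      port_unreachable_received: bool) -> Status:
    """Map the two probe outcomes to a health status.

    echo_reply_received: an ICMP Echo Reply arrived within the timeout.
    port_unreachable_received: an ICMP Port Unreachable error arrived
        within the timeout after the UDP probe was sent.
    """
    # Step 1: without an Echo Reply, the server is unhealthy and no UDP probe is sent.
    if not echo_reply_received:
        return Status.UNHEALTHY
    # Step 2: Port Unreachable means nothing is listening on the UDP port.
    if port_unreachable_received:
        return Status.UNHEALTHY
    # Silence after the UDP probe is treated as healthy.
    return Status.HEALTHY
```

Note that in this model a healthy result is inferred from the absence of an error, so a backend that blocks ICMP Echo Requests would be reported as unhealthy.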

HTTP Health Check

You can also configure HTTP health checks to obtain server statuses through HTTP GET requests if you select TCP, HTTP, or HTTPS as the backend protocol. Figure 3 shows how an HTTP health check works.

Figure 3 HTTP health check

The HTTP health check process is as follows:

  1. The load balancer sends an HTTP GET request to the backend server (in the format of {Private IP address}:{Health check port}/{Health check path}). (You can specify a domain name when configuring a health check.)
  2. The backend server returns an HTTP status code to ELB.
    • If the load balancer receives the status code within the health check timeout duration, it compares the status code with the preset one. If the status codes are the same, the backend server is declared healthy.
    • If the load balancer does not receive any response from the backend server within the health check timeout duration, it declares the backend server unhealthy.
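
The HTTP check described above can be sketched with Python's standard `http.client` module. This is an illustration under assumptions, not ELB's implementation: the function name `http_health_check`, the `expected_codes` tuple standing in for the preset status code, and the optional `domain` parameter sent as the `Host` header are all hypothetical.

```python
import http.client


def http_health_check(host: str, port: int, path: str = "/",
                      expected_codes: tuple = (200,),
                      timeout: float = 2.0,
                      domain: str = "") -> bool:
    """Send GET {host}:{port}{path} and compare the status code with the preset ones."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # If a domain name is configured, send it as the Host header.
        headers = {"Host": domain} if domain else {}
        conn.request("GET", path, headers=headers)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # No response within the timeout, connection refused, or a malformed reply.
        return False
    finally:
        conn.close()
    return status in expected_codes
```

A dedicated lightweight path such as `/health` (a hypothetical example) keeps the cost of each probe low on the backend server.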

HTTPS Health Check

For TCP, HTTP, and HTTPS backend protocols, you can use HTTPS to establish an SSL connection through a TLS handshake and obtain the statuses of backend servers. Figure 4 shows how an HTTPS health check works.

Figure 4 HTTPS health check

The HTTPS health check process is as follows:

  1. The load balancer sends a Client Hello packet to establish an SSL connection with the backend server.
  2. After receiving the Server Hello packet from the backend server, the load balancer sends an encrypted HTTP GET request to the backend server (in the format of {Private IP address}:{Health check port}/{Health check path}). (You can specify a domain name when configuring a health check.)
  3. The backend server returns an HTTP status code to the load balancer.
    • If the load balancer receives the status code within the health check timeout duration, it compares the status code with the preset one. If the status codes are the same, the backend server is declared healthy.
    • If the load balancer does not receive any response from the backend server within the health check timeout duration, it declares the backend server unhealthy.
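
The HTTPS variant differs from a plain HTTP probe only in that the GET request is sent after a TLS handshake. The sketch below uses the standard `ssl` and `http.client` modules. The function name `https_health_check` and the `expected_codes` parameter are hypothetical, and skipping certificate verification is an assumption made for the sake of the example; the source does not state whether ELB validates backend certificates.

```python
import http.client
import ssl


def https_health_check(host: str, port: int, path: str = "/",
                       expected_codes: tuple = (200,),
                       timeout: float = 2.0) -> bool:
    """Perform a TLS handshake, send an encrypted GET, and compare the status code."""
    # Assumption: the probe does not validate the backend certificate.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    try:
        # request() triggers the TCP connection and the Client Hello / Server Hello exchange.
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections, and TLS handshake failures.
        return False
    finally:
        conn.close()
    return status in expected_codes
```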

Health Check Time Window

Health checks greatly improve service availability. However, if health checks are too frequent, service availability will be compromised. To limit this impact, ELB changes the status of a backend server to healthy or unhealthy only after several consecutive health checks return the same result.

The health check time window is determined by the factors in Table 2:

Table 2 Factors affecting the health check time window

Factor                    Description
Check Interval            How often health checks are performed.
Timeout Duration          How long the load balancer waits for the response from the backend server.
Health Check Threshold    The number of consecutive successful or failed health checks required for determining whether the backend server is healthy or unhealthy.
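
The health check threshold can be pictured as a pair of counters that flip a server's status only after enough consecutive results of the same kind. The following is a minimal sketch of that idea; the class name `HealthTracker` and its default thresholds are hypothetical, and the source does not describe how ELB tracks results internally.

```python
class HealthTracker:
    """Track consecutive check results and flip status at the configured thresholds."""

    def __init__(self, healthy_threshold: int = 3,
                 unhealthy_threshold: int = 3, healthy: bool = True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._successes = 0
        self._failures = 0

    def record(self, success: bool) -> bool:
        """Record one health check result and return the current status."""
        if success:
            self._successes += 1
            self._failures = 0  # a success breaks any failure streak
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0  # a failure breaks any success streak
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

With both thresholds set to 3, a single failed check between successful ones never changes the status, which is what keeps a briefly slow server in rotation.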

You can use the following formulas to calculate the health check time window:

  • Time window for a backend server to be detected healthy = Timeout duration × Healthy threshold + Interval × (Healthy threshold – 1)
  • Time window for a backend server to be detected unhealthy = Timeout duration × Unhealthy threshold + Interval × (Unhealthy threshold – 1)

As shown in Figure 5, if the health check interval is 4s, the health check timeout duration is 2s, and the unhealthy threshold is 3, the time window for a backend server to be considered unhealthy is calculated as follows: 2 × 3 + 4 × (3 – 1) = 14s.

Figure 5 Health check timeout duration
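
Both formulas have the same shape, so one helper covers them. This is a small sketch for checking a configuration before applying it; the function name `detection_window` and its parameter names are hypothetical.

```python
def detection_window(timeout_s: float, interval_s: float, threshold: int) -> float:
    """Time window, in seconds, before a status change is declared.

    Implements: timeout x threshold + interval x (threshold - 1).
    Pass the healthy threshold or the unhealthy threshold as needed.
    """
    return timeout_s * threshold + interval_s * (threshold - 1)


# Example from the text: timeout 2s, interval 4s, unhealthy threshold 3.
print(detection_window(timeout_s=2, interval_s=4, threshold=3))  # 14
```

Lengthening the interval to reduce probe load also lengthens this window, so the two settings are best tuned together.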

Rectifying an Unhealthy Backend Server

If a backend server is detected unhealthy, see How Do I Troubleshoot an Unhealthy Backend Server?