Updated on 2024-09-20 GMT+08:00

Health Check

ELB periodically sends requests to backend servers to check whether they can process requests. This process is called a health check.

If a backend server is detected as unhealthy, the load balancer stops routing requests to it. After the backend server recovers, the load balancer resumes routing requests to it.

If backend servers have to handle a large number of requests, frequent health checks may overload them and cause them to respond slowly. To address this problem, you can increase the health check interval or use TCP or UDP instead of HTTP. You can also disable health checks. If you disable health checks, requests may be routed to unhealthy servers, and service interruptions may occur.

Health Check Protocol

You can configure health checks when configuring backend server groups. Generally, you can use the default setting or select a different health check protocol as you need.

If you want to modify health check settings, see details in Configuring a Health Check.

Select a health check protocol that matches the backend protocol as described in Table 1.

Table 1 Backend protocols and supported health check protocols (dedicated load balancers)

Backend Protocol | Health Check Protocol
TCP              | TCP, HTTP, or HTTPS
UDP              | UDP
QUIC             | UDP
TLS              | TCP, HTTP, HTTPS, TLS, or gRPC
HTTP             | TCP, HTTP, HTTPS, TLS, or gRPC
HTTPS            | TCP, HTTP, HTTPS, TLS, or gRPC
gRPC             | TCP, HTTP, HTTPS, TLS, or gRPC

TLS and gRPC are available in certain regions. You can see which regions support them on the console.
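
The combinations in Table 1 can be captured in a small lookup for validating a configuration before it is applied. This is a minimal sketch in Python: the names ALLOWED_HEALTH_CHECKS and is_valid_combination are hypothetical and are not part of any ELB API or SDK.

```python
# Health check protocols allowed for each backend protocol, transcribed from Table 1
# (dedicated load balancers). These names are illustrative, not ELB API identifiers.
_HTTP_FAMILY = {"TCP", "HTTP", "HTTPS", "TLS", "gRPC"}

ALLOWED_HEALTH_CHECKS = {
    "TCP": {"TCP", "HTTP", "HTTPS"},
    "UDP": {"UDP"},
    "QUIC": {"UDP"},
    "TLS": set(_HTTP_FAMILY),
    "HTTP": set(_HTTP_FAMILY),
    "HTTPS": set(_HTTP_FAMILY),
    "gRPC": set(_HTTP_FAMILY),
}


def is_valid_combination(backend_protocol: str, health_check_protocol: str) -> bool:
    """Return True if Table 1 lists the health check protocol for the backend protocol."""
    return health_check_protocol in ALLOWED_HEALTH_CHECKS.get(backend_protocol, set())
```

For example, is_valid_combination("QUIC", "UDP") is accepted, while is_valid_combination("UDP", "TCP") is rejected. The lookup does not model regional availability of TLS and gRPC.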

Health Check Source IP Address

A dedicated load balancer uses the IP addresses in its backend subnet to send requests to backend servers and verify their health status. To perform health checks, ensure that the security group rules of the backend servers allow access from the backend subnet where the load balancer works. For details, see Security Group and Network ACL Rules.

TCP Health Check

For TCP, HTTP, and HTTPS backend protocols, you can use TCP to initiate three-way handshakes to obtain the statuses of backend servers.

Figure 1 TCP health check

The TCP health check process is as follows:

  1. The load balancer sends a TCP SYN packet to the backend server (in the format of {Private IP address}:{Health check port}).
  2. The backend server returns a SYN-ACK packet.
    • If the load balancer does not receive the SYN-ACK packet within the timeout duration, it declares that the backend server is unhealthy and sends an RST packet to the backend server to terminate the TCP connection.
    • If the load balancer receives the SYN-ACK packet from the backend server within the timeout duration, it sends an ACK packet to the backend server and declares that the backend server is healthy. After that, the load balancer sends an RST packet to the backend server to terminate the TCP connection.
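
The handshake-based check above can be approximated from any client for testing purposes. The following Python sketch is not ELB's implementation: the function name tcp_health_check and the 2-second default timeout are assumptions, and the SO_LINGER packing shown is the Linux layout.

```python
import socket
import struct


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP three-way handshake with host:port completes in time."""
    try:
        # create_connection() returns only after SYN, SYN-ACK, and ACK have been exchanged.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
    try:
        # Linger enabled with a zero timeout makes close() send an RST instead of a FIN,
        # mirroring how the load balancer terminates the probe connection.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    finally:
        sock.close()
    return True
```

Calling tcp_health_check("192.0.2.10", 80) against a placeholder address like this one returns False if no SYN-ACK arrives within the timeout.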

After a successful TCP three-way handshake, an RST packet will be sent to close the TCP connection. The application on the backend server may consider this packet a connection error and reply with a message, for example, "Connection reset by peer". To avoid this issue, take either of the following actions:

UDP Health Check

For the UDP backend protocol, ELB sends ICMP and UDP probe packets to backend servers to check their health.

Figure 2 UDP health check

The UDP health check process is as follows:

  1. The load balancer sends an ICMP Echo Request packet and a UDP probe packet to the backend server.
  2. If the load balancer receives an ICMP Echo Reply packet and does not receive an ICMP Port Unreachable error within the health check timeout duration, it considers the backend server healthy. If the load balancer receives an ICMP Port Unreachable error, it considers the backend server unhealthy.

If there is a large number of concurrent requests, the health check result may be different from the actual health of the backend server.

If the backend server runs Linux, it may limit the rate of ICMP packets as a defense against ping flood attacks. In this case, even if there is a service exception, ELB will not receive the error message "port XX unreachable", and the server will still be considered healthy. As a result, the health check result differs from the actual health of the backend server.
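
The UDP decision rule, and the reason ICMP rate limiting can mask a failure, can be illustrated with a small Python function. This is a sketch of the logic only: the name udp_health_verdict is hypothetical, and treating "no Echo Reply and no Port Unreachable" as unhealthy is an assumption, because the text above does not state that case explicitly.

```python
def udp_health_verdict(echo_reply_received: bool, port_unreachable_received: bool) -> bool:
    """Return True (healthy) or False (unhealthy) from what arrived before the timeout."""
    if port_unreachable_received:
        # An ICMP Port Unreachable error means nothing is listening on the UDP port.
        return False
    # Assumption: without an ICMP Echo Reply the server is not considered healthy.
    return echo_reply_received
```

If the backend server suppresses the Port Unreachable error because of ICMP rate limiting, the inputs become (True, False) and the verdict is healthy even though the UDP service is down.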

HTTP Health Check

You can also configure HTTP health checks to obtain server statuses through HTTP GET requests if you select TCP, HTTP, or HTTPS as the backend protocol. Figure 3 shows how an HTTP health check works.

Figure 3 HTTP health check

The HTTP health check process is as follows:

  1. The load balancer sends an HTTP GET request to the backend server (in the format of {Private IP address}:{Health check port}/{Health check path}). (You can specify a domain name when configuring a health check.)
  2. The backend server returns an HTTP status code to ELB.
    • If the load balancer receives the status code within the health check timeout duration, it compares the status code with the preset one. If the status codes are the same, the backend server is declared healthy.
    • If the load balancer does not receive any response from the backend server within the health check timeout duration, it declares the backend server unhealthy.
  • If HTTP health check is selected for the TCP listener of a dedicated load balancer, the load balancer uses HTTP/1.0 to send requests to backend servers. HTTP/1.0 uses short-lived connections, so the load balancer does not process the HTTP response until it receives the TCP disconnection packet. Ensure that the backend server closes the TCP connection immediately after sending the response. Otherwise, the health check may fail.
  • In an HTTP health check, the User-Agent header identifies that the requests are sent for health checks. The value of User-Agent may be adjusted based on service requirements. So, it is not recommended to rely on this header for verification or judgment.
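
To reproduce an HTTP health check from a test client, a sketch along the following lines may help. It uses only the Python standard library; the function name http_health_check, its parameters, and the default expected status code of 200 are assumptions for illustration. Python's http.client sends HTTP/1.1 requests, so this does not reproduce the HTTP/1.0 behavior described for TCP listeners.

```python
import http.client
from typing import Optional


def http_health_check(host: str, port: int, path: str = "/", expected_status: int = 200,
                      timeout: float = 2.0, domain: Optional[str] = None) -> bool:
    """Send GET {host}:{port}{path} and compare the status code with the expected one."""
    # An optional domain name is sent as the Host header.
    headers = {"Host": domain} if domain else {}
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # No usable response within the timeout: treat the server as unhealthy.
        return False
    finally:
        conn.close()
    return status == expected_status
```

A server that answers with any status code other than the expected one is reported as unhealthy, in the same way as a server that does not answer at all.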

HTTPS Health Check

For TCP, HTTP, and HTTPS backend protocols, you can use HTTPS to establish an SSL connection through a TLS handshake and obtain the statuses of backend servers. Figure 4 shows how an HTTPS health check works.

Figure 4 HTTPS health check

The HTTPS health check process is as follows:

  1. The load balancer sends a Client Hello packet to establish an SSL connection with the backend server.
  2. After receiving the Server Hello packet from the backend server, the load balancer sends an encrypted HTTP GET request to the backend server (in the format of {Private IP address}:{Health check port}/{Health check path}). (You can specify a domain name when configuring a health check.)
  3. The backend server returns an HTTP status code to the load balancer.
    • If the load balancer receives the status code within the health check timeout duration, it compares the status code with the preset one. If the status codes are the same, the backend server is declared healthy.
    • If the load balancer does not receive any response from the backend server within the health check timeout duration, it declares the backend server unhealthy.

In an HTTPS health check, the User-Agent header identifies that the requests are sent for health checks. The value of User-Agent may be adjusted based on service requirements. So, it is not recommended to rely on this header for verification or judgment.

TLS Health Check

For the TLS, HTTP, and HTTPS backend protocols, you can use TLS health checks: the load balancer establishes a TCP connection and then sends a Client Hello packet to the backend server to check whether the server is healthy.

Figure 5 TLS health check

The TLS health check process is as follows:

  1. The load balancer sends a TCP SYN packet to the backend server (in the format of {Private IP address}:{Health check port}).
    • If the load balancer does not receive the SYN-ACK packet within the health check timeout duration, the backend server is declared unhealthy.
    • If the load balancer receives a SYN-ACK packet within the timeout duration, it sends a Client Hello packet to the backend server. The TLS versions include TLSv1.0, TLSv1.1, TLSv1.2, and TLSv1.3.
  2. If the load balancer receives the Server Hello packet within the timeout duration, it declares the backend server healthy. Otherwise, it declares the backend server unhealthy.
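
A TLS health check can be approximated with Python's ssl module, as sketched below. The function name tls_health_check is hypothetical. Two differences from the process above are worth noting: this sketch waits for the full TLS handshake to complete, which is stricter than receiving the Server Hello packet, and the TLS versions actually offered depend on the local Python and OpenSSL build.

```python
import socket
import ssl


def tls_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection and a TLS handshake both succeed in time."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # A health probe only tests reachability, so certificate validation is disabled.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        # Step 1: TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Step 2: wrap_socket() sends the Client Hello and completes the handshake.
            with context.wrap_socket(sock):
                return True
    except OSError:
        # ssl.SSLError and timeouts are subclasses of OSError.
        return False
```

A backend server that accepts the TCP connection but never answers the Client Hello is reported as unhealthy once the timeout expires.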

gRPC Health Check

Figure 6 gRPC health check

The gRPC health check process is as follows:

  1. The load balancer sends an HTTP POST or GET request to the backend server (in the format of {Private IP address}:{Health check port}/{Health check path}). (You can specify a domain name when configuring a health check.)
  2. The backend server returns a status code to the load balancer.
  3. The load balancer receives the value of grpc-status in the HTTP/2 header as the returned gRPC status code.
    • If the load balancer receives the status code within the health check timeout duration, it compares the status code with the preset one. If the status codes are the same, the backend server is declared healthy.
    • If the load balancer does not receive any response from the backend server within the health check timeout duration, it declares the backend server unhealthy.
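
The comparison in step 3 can be sketched as a small Python function that takes the response headers already received by a client. The name grpc_health_verdict is hypothetical, and using 0 (the gRPC OK status) as the default preset value is an assumption; the actual preset is whatever you configure for the health check.

```python
from typing import Mapping, Optional


def grpc_health_verdict(response_headers: Optional[Mapping[str, str]],
                        expected_status: int = 0) -> bool:
    """Compare the grpc-status value of a response with the preset status code."""
    if response_headers is None:
        # None models "no response within the health check timeout duration".
        return False
    raw_status = response_headers.get("grpc-status")
    if raw_status is None:
        return False
    try:
        return int(raw_status) == expected_status
    except ValueError:
        # A non-numeric grpc-status cannot match the preset code.
        return False
```

For example, a response carrying grpc-status 12 (UNIMPLEMENTED) is healthy only if the preset status code is also 12.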

Health Check Time Window

Health checks greatly improve service availability. However, overly frequent health checks can compromise service availability. To limit this impact, ELB declares a backend server healthy or unhealthy only after several consecutive health checks return the same result.

The health check time window is determined by the factors in Table 2.

Table 2 Factors affecting the health check time window

Factor                 | Description
Check Interval         | How often health checks are performed.
Timeout Duration       | How long the load balancer waits for the response from the backend server.
Health Check Threshold | The number of consecutive successful or failed health checks required for determining whether the backend server is healthy or unhealthy.
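
The effect of the health check threshold can be illustrated with a small state tracker. This is a Python sketch of the "consecutive results" rule only, not ELB's implementation: the class name HealthTracker, the default thresholds of 3, and the initial healthy state are assumptions.

```python
class HealthTracker:
    """Flip the reported status only after enough consecutive identical results."""

    def __init__(self, healthy_threshold: int = 3, unhealthy_threshold: int = 3,
                 healthy: bool = True) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._successes = 0  # consecutive successful checks
        self._failures = 0   # consecutive failed checks

    def record(self, success: bool) -> bool:
        """Record one health check result and return the current status."""
        if success:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

With an unhealthy threshold of 3, two failed checks followed by a successful one leave the server healthy, because the failure counter is reset before it reaches the threshold.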

Use the following formulas to calculate the health check time window:

  • Time window for a backend server to be detected healthy = Timeout duration × Healthy threshold + Interval × (Healthy threshold – 1)
  • Time window for a backend server to be detected unhealthy = Timeout duration × Unhealthy threshold + Interval × (Unhealthy threshold – 1)

As shown in Figure 7, if the health check interval is 4s, the health check timeout duration is 2s, and the unhealthy threshold is 3, the time window for a backend server to be considered unhealthy is calculated as follows: 2 × 3 + 4 × (3 – 1) = 14s.

Figure 7 Health check timeout duration
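
The time window formulas can be written as a short Python helper for checking a planned configuration. The function name detection_window and its parameter names are hypothetical. The same function covers both formulas, because they differ only in which threshold is passed in.

```python
def detection_window(timeout_s: float, interval_s: float, threshold: int) -> float:
    """Time window in seconds: timeout x threshold + interval x (threshold - 1)."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return timeout_s * threshold + interval_s * (threshold - 1)


# Example from the text: 2s timeout, 4s interval, unhealthy threshold of 3.
unhealthy_window = detection_window(timeout_s=2, interval_s=4, threshold=3)  # 14
```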

Rectifying an Unhealthy Backend Server

If a backend server is detected unhealthy, see How Do I Troubleshoot an Unhealthy Backend Server?