Updated on 2024-09-20 GMT+08:00

Health Check

ELB periodically sends requests to backend servers to check whether they can process requests. This process is called a health check.

If a backend server is detected unhealthy, the load balancer will stop routing requests to it. After the backend server recovers, the load balancer will resume routing requests to it.

If backend servers have to handle a large number of requests, frequent health checks may overload them and cause them to respond slowly. To address this problem, you can increase the health check interval or use TCP or UDP instead of HTTP. You can also disable health checks. If you do, requests may be routed to unhealthy servers, and service interruptions may occur.

Health Check Protocol

You can configure health checks when configuring backend server groups. Generally, you can use the default settings or select a different health check protocol as needed.

To modify health check settings, see Enabling or Disabling Health Check.

Select a health check protocol that matches the backend protocol as described in Table 1.

Table 1 The backend protocol and health check protocols (shared load balancers)

| Backend Protocol | Health Check Protocol |
|------------------|-----------------------|
| TCP              | TCP or HTTP           |
| UDP              | UDP                   |
| HTTP             | TCP or HTTP           |
| HTTPS            | TCP or HTTP           |
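The pairing rules in Table 1 can be captured as a small lookup for validating a configuration before it is applied. This is a minimal sketch: the names `ALLOWED_HEALTH_CHECKS` and `is_valid_pair` are hypothetical and are not part of any ELB API.

```python
# Hypothetical lookup built from Table 1 (shared load balancers).
ALLOWED_HEALTH_CHECKS = {
    "TCP": {"TCP", "HTTP"},
    "UDP": {"UDP"},
    "HTTP": {"TCP", "HTTP"},
    "HTTPS": {"TCP", "HTTP"},
}


def is_valid_pair(backend_protocol: str, health_check_protocol: str) -> bool:
    """Return True if the health check protocol is listed for the backend protocol."""
    allowed = ALLOWED_HEALTH_CHECKS.get(backend_protocol.upper(), set())
    return health_check_protocol.upper() in allowed
```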

Health Check Source IP Address

A shared load balancer uses an IP address in 100.125.0.0/16 to send requests to backend servers and verify their health status. To perform health checks, ensure that the security group rules of the backend server allow access from 100.125.0.0/16. For details, see Security Group and Network ACL Rules.
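To confirm that a set of allowed source ranges covers the health check source range, a check along the following lines may help. This is a sketch only: the function names are hypothetical, the rule list is assumed to be plain CIDR strings, and it does not query any real security group.

```python
import ipaddress
from typing import Iterable

# Source range stated above for shared load balancer health checks.
HEALTH_CHECK_SOURCE = ipaddress.ip_network("100.125.0.0/16")


def is_health_check_source(ip: str) -> bool:
    """Return True if ip falls inside the health check source range."""
    return ipaddress.ip_address(ip) in HEALTH_CHECK_SOURCE


def rules_allow_health_checks(allowed_cidrs: Iterable[str]) -> bool:
    """Return True if any allowed IPv4 CIDR fully covers the source range."""
    for cidr in allowed_cidrs:
        network = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError across IP versions, so skip IPv6 rules.
        if network.version == 4 and HEALTH_CHECK_SOURCE.subnet_of(network):
            return True
    return False
```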

TCP Health Check

For TCP, HTTP, and HTTPS backend protocols, you can use TCP to initiate three-way handshakes to obtain the statuses of backend servers.

Figure 1 TCP health check

The TCP health check process is as follows:

  1. The load balancer sends a TCP SYN packet to the backend server (in the format of {Private IP address}:{Health check port}).
  2. The backend server returns a SYN-ACK packet.
    • If the load balancer does not receive the SYN-ACK packet within the timeout duration, it declares that the backend server is unhealthy and sends an RST packet to the backend server to terminate the TCP connection.
    • If the load balancer receives the SYN-ACK packet from the backend server within the timeout duration, it sends an ACK packet to the backend server and declares that the backend server is healthy. After that, the load balancer sends an RST packet to the backend server to terminate the TCP connection.
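The steps above can be approximated from a client with an ordinary socket, which is useful for reproducing what a backend server sees. This is a minimal sketch, not ELB's implementation: the function name and default timeout are assumptions, and the `SO_LINGER` struct layout shown applies to POSIX systems.

```python
import socket
import struct


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
    try:
        # Linger enabled with a zero timeout makes close() send RST rather
        # than FIN, mimicking the reset described above (POSIX layout).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
    finally:
        sock.close()
    return True
```

Running this against a test server is one way to see how the application logs the reset that follows a successful handshake.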

After a successful TCP three-way handshake, an RST packet will be sent to close the TCP connection. The application on the backend server may consider this packet a connection error and reply with a message, for example, "Connection reset by peer". To avoid this issue, take either of the following actions:

UDP Health Check

For the UDP backend protocol, ELB sends ICMP and UDP probe packets to backend servers to check their health.

Figure 2 UDP health check

The UDP health check process is as follows:

  1. The load balancer sends an ICMP Echo Request packet to the backend server.
    • If the load balancer does not receive an ICMP Echo Reply packet within the health check timeout duration, the backend server is declared unhealthy.
    • If the load balancer receives an ICMP Echo Reply packet within the timeout period, it sends a UDP probe packet to the backend server.
  2. If the load balancer does not receive an ICMP Port Unreachable error within the health check timeout duration, it declares the backend server healthy. If the load balancer receives an ICMP Port Unreachable error, it declares the backend server unhealthy.
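The decision logic of the two steps above can be written out as a small function. This sketch models only the verdict, not the packet exchange (sending ICMP usually needs raw sockets and elevated privileges). The function and parameter names are hypothetical.

```python
def udp_health_verdict(echo_reply_received: bool,
                       port_unreachable_received: bool = False) -> bool:
    """Return True (healthy) or False (unhealthy) for one UDP health check."""
    # Step 1: no ICMP Echo Reply within the timeout means unhealthy,
    # and the UDP probe is never sent.
    if not echo_reply_received:
        return False
    # Step 2: silence after the UDP probe counts as healthy;
    # an ICMP Port Unreachable error counts as unhealthy.
    return not port_unreachable_received
```

Note that in this scheme a healthy verdict is inferred from the absence of an error, so a backend that silently drops UDP packets would still pass step 2.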

HTTP Health Check

You can also configure HTTP health checks to obtain server statuses through HTTP GET requests if you select TCP, HTTP, or HTTPS as the backend protocol. Figure 3 shows how an HTTP health check works.

Figure 3 HTTP health check

The HTTP health check process is as follows:

  1. The load balancer sends an HTTP GET request to the backend server (in the format of {Private IP address}:{Health check port}/{Health check path}). You can specify a domain name when configuring a health check.
  2. The backend server returns an HTTP status code to ELB.
    • If the load balancer receives the status code within the health check timeout duration, it compares the status code with the preset one. If the status codes are the same, the backend server is declared healthy.
    • If the load balancer does not receive any response from the backend server within the health check timeout duration, it declares the backend server is unhealthy.
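A comparable probe can be sketched with the standard library to test a health check path by hand. The function name, parameters, and defaults below are assumptions. Treating a mismatched status code as unhealthy is also an assumption, since the steps above only state the matching case.

```python
import http.client
from typing import Optional


def http_health_check(host: str, port: int, path: str = "/",
                      expected_status: int = 200, timeout: float = 2.0,
                      domain: Optional[str] = None) -> bool:
    """Send one HTTP GET and compare the status code with the expected one."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # An explicit Host header stands in for the optional domain name.
        headers = {"Host": domain} if domain else {}
        conn.request("GET", path, headers=headers)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # No usable response within the timeout: unhealthy.
        return False
    finally:
        conn.close()
    # Assumption: any status other than the expected one is unhealthy.
    return status == expected_status
```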

In an HTTP health check, the User-Agent header identifies requests as health check requests. Because the value of User-Agent may be adjusted based on service requirements, do not rely on this header for verification or decision-making.

Health Check Time Window

Health checks greatly improve service availability. However, overly frequent health checks can themselves compromise service availability. To limit this impact, ELB declares a backend server healthy or unhealthy only after several consecutive health checks.

The health check time window is determined by the factors in Table 2.

Table 2 Factors affecting the health check time window

| Factor                 | Description |
|------------------------|-------------|
| Check Interval         | How often health checks are performed. |
| Timeout Duration       | How long the load balancer waits for the response from the backend server. |
| Health Check Threshold | The number of consecutive successful or failed health checks required for determining whether the backend server is healthy or unhealthy. |
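How a threshold of consecutive results drives a status change can be illustrated with a small state tracker. This is a sketch under stated assumptions: the class name, the default thresholds, and the initial healthy state are hypothetical, and ELB's internal bookkeeping may differ.

```python
class HealthTracker:
    """Flip the health status only after enough consecutive contrary results."""

    def __init__(self, healthy_threshold: int = 3,
                 unhealthy_threshold: int = 3, healthy: bool = True) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the status

    def record(self, success: bool) -> bool:
        """Record one health check result and return the current status."""
        if success == self.healthy:
            # Result agrees with the current status, so the streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = (self.unhealthy_threshold if self.healthy
                     else self.healthy_threshold)
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```

With an unhealthy threshold of 3, two failures followed by a success leave the server healthy, because the failures were not consecutive enough to reach the threshold.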

Use the following formulas to calculate the health check time window:

  • Time window for a backend server to be detected healthy = Timeout duration x Healthy threshold + Interval x (Healthy threshold – 1)
  • Time window for a backend server to be detected unhealthy = Timeout duration x Unhealthy threshold + Interval x (Unhealthy threshold – 1)

As shown in Figure 4, if the health check interval is 4s, the health check timeout duration is 2s, and the unhealthy threshold is 3, the time window for a backend server to be considered unhealthy is calculated as follows: 2 x 3 + 4 x (3 – 1) = 14s.

Figure 4 Health check timeout duration
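The time window formula can be expressed as a short helper for trying out different settings. The function and parameter names are hypothetical. The same expression serves both cases, depending on whether the healthy or the unhealthy threshold is passed in.

```python
def detection_window(timeout_s: float, interval_s: float, threshold: int) -> float:
    """Return the time window in seconds for a status change to be detected."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    # Each of the `threshold` checks may take up to the timeout duration,
    # and consecutive checks are separated by the interval.
    return timeout_s * threshold + interval_s * (threshold - 1)


# Worked example from the text: timeout 2s, interval 4s, unhealthy threshold 3.
print(detection_window(timeout_s=2, interval_s=4, threshold=3))  # prints 14
```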

Rectifying an Unhealthy Backend Server

If a backend server is detected unhealthy, see How Do I Troubleshoot an Unhealthy Backend Server?