Updated on 2023-07-14 GMT+08:00

Configuring Health Check

A health check periodically checks the health status of a running component, based on the settings you configure.

ServiceStage provides the following health check methods:

  • Component Liveness Probe: checks whether an application component is still alive, similar to using the ps command to check whether a process exists. If the liveness check fails, the cluster restarts the application component. If the check succeeds, no action is taken.
  • Component Service Probe: checks whether an application component is ready to process user requests. Some applications take a long time to start before they can provide services, for example because they need to load data from disk or wait for an external module to start. During this time the application process exists but cannot yet provide services, which is the scenario this check is designed for. If the readiness check fails, the cluster blocks all requests sent to the application component. If the readiness check succeeds, the application component can be accessed.
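The difference between the two probes lies in what the cluster does with the result. A minimal sketch of that mapping follows; the Action names and both functions are hypothetical and only restate the behavior described above, not ServiceStage internals.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical names for the cluster reactions described above."""
    NONE = "no action"
    RESTART = "restart the component"
    BLOCK_REQUESTS = "block requests to the component"
    ALLOW_REQUESTS = "allow requests to the component"


def liveness_action(check_passed: bool) -> Action:
    # A failed liveness check leads to a restart; a successful one changes nothing.
    return Action.NONE if check_passed else Action.RESTART


def readiness_action(check_passed: bool) -> Action:
    # A failed readiness check blocks requests; a successful one lets them through.
    return Action.ALLOW_REQUESTS if check_passed else Action.BLOCK_REQUESTS
```

A component that is still loading data would pass the liveness check (its process exists) but fail the readiness check, so it is kept running and simply receives no requests yet.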

Health Check Modes

  • HTTP request-based check

    This health check mode is applicable to application components that provide HTTP/HTTPS services. The cluster periodically sends an HTTP/HTTPS GET request to such application components. If the return code of the HTTP/HTTPS response is within 200–399, the check is successful. Otherwise, the check fails. In this health check mode, you must specify an application listening port and an HTTP/HTTPS request path.

    For example, if the application component provides the HTTP service, the port number is 80, the HTTP check path is /health-check, and the host address is containerIP, the cluster periodically initiates the following request to the application:

    GET http://containerIP:80/health-check

    If the host address is not set, the instance IP address is used by default.

  • TCP port-based check

    This health check mode is applicable to application components that provide a TCP communication service. The cluster periodically attempts to establish a TCP connection to the application component. If the connection is established, the check is successful. Otherwise, the check fails. In this health check mode, you must specify an application listening port. For example, if you have an Nginx application component with service port 80 and you configure a TCP port-based check on port 80, the cluster periodically establishes a TCP connection to port 80 of the application component.

  • CLI-based check

    In this mode, you must specify a command that can be executed in the application component. The cluster periodically executes the command in the application component. If the command returns 0 (that is, it exits with status 0), the health check is successful. Otherwise, the health check fails.

    The CLI mode can be used to replace the following modes:

    • TCP port-based check: Write a program script to connect to an application component port. If the connection is successful, the script returns 0. Otherwise, the script returns –1.
    • HTTP request-based check: Write a program script to run the wget command for an application component.

      wget http://127.0.0.1:80/health-check

      Check the return code of the response. If the return code is within 200–399, the script returns 0. Otherwise, the script returns –1.

      • Put the program to be executed in the application component image so that the program can be executed.
      • If the command to be executed is a shell script, specify the script interpreter together with the script instead of the script alone. For example, if the script is /data/scripts/health_check.sh, you must specify sh /data/scripts/health_check.sh as the command. This is required because the cluster does not run programs in a terminal environment when executing them in an application component.
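The three check modes can be sketched side by side to make their success criteria concrete. This is an illustrative approximation using only the Python standard library, not the cluster's actual implementation: the function names, the 1-second default timeout argument, the decision not to follow redirects, and the reading of "returns 0" as the command's exit status are all assumptions.

```python
import http.client
import socket
import subprocess
import urllib.error
import urllib.request
from typing import Sequence


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report 3xx responses as-is instead of following them (assumption)."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def is_success_code(status: int) -> bool:
    """Return codes 200-399 count as a successful HTTP check."""
    return 200 <= status <= 399


def http_check(host: str, port: int, path: str, timeout: float = 1.0) -> bool:
    """HTTP request-based check: send a GET request and inspect the return code."""
    # The empty ProxyHandler keeps the request from going through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), _NoRedirect)
    try:
        with opener.open(f"http://{host}:{port}{path}", timeout=timeout) as resp:
            return is_success_code(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for unfollowed 3xx, 4xx, and 5xx responses.
        return is_success_code(err.code)
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return False


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP port-based check: succeeds if a connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def cli_check(command: Sequence[str], timeout: float = 1.0) -> bool:
    """CLI-based check: succeeds if the command exits with status 0."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The program is missing, is not executable, or ran too long.
        return False
    return result.returncode == 0
```

For the examples in this section, the calls would look like http_check("127.0.0.1", 80, "/health-check"), tcp_check("127.0.0.1", 80), and cli_check(["sh", "/data/scripts/health_check.sh"]). Passing the command as a list with the interpreter first mirrors the requirement above: no shell is involved, so a script path on its own is not interpreted as a shell script.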

Common Parameter Description

Table 1 Common parameter description

| Parameter | Description |
| --- | --- |
| Latency (s) | Check delay time. Unit: second. Set this parameter according to the normal startup time of services. For example, if this parameter is set to 30, the health check will be started 30 seconds after the application starts. The time is reserved for containerized services to start. |
| Timeout Period (s) | Timeout duration. Unit: second. If the time exceeds this value, the health check fails. For example, setting this parameter to 10 indicates that the health check timeout period is 10s. If the parameter is left blank or set to 0, the default timeout time is 1s. |
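How the two parameters interact with a check can be sketched as follows. This is a hedged illustration only: effective_timeout and delayed_check are hypothetical helper names, and the check is modeled as any callable that takes a timeout in seconds and returns True on success.

```python
import time
from typing import Callable, Optional


def effective_timeout(timeout_s: Optional[float]) -> float:
    """A blank (None) or 0 timeout falls back to the 1-second default."""
    return float(timeout_s) if timeout_s else 1.0


def delayed_check(
    check: Callable[[float], bool],
    latency_s: float,
    timeout_s: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait for the configured latency, then run the check with its timeout."""
    # The latency gives the containerized service time to start.
    sleep(latency_s)
    return check(effective_timeout(timeout_s))
```

With Latency set to 30 and Timeout Period left blank, the check in this sketch runs 30 seconds after startup and is given 1 second to complete.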

Procedure

  1. Choose Advanced Settings > O&M Monitoring.
  2. Click Health Check, and set health check parameters based on service requirements.

    For details about common parameters, see Table 1.