Updated on 2022-11-14 GMT+08:00

Hitless Upgrade

To achieve a hitless upgrade, the following problems need to be solved:

  1. Service interruption while a service is being stopped. When a service is stopped, it may still be processing requests, and new requests may continue to arrive.
  2. Access failures caused by stale instance addresses. In the microservice architecture, service discovery is usually performed through the service center, and the client caches instance addresses. When a service is stopped, clients may not learn in time that the instance is offline and may continue to send requests to the stale address.
  3. Rolling upgrades. The old version can be stopped only after the new version is ready.
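
To make the first problem concrete, the following minimal sketch shows one way a stopping service could reject new requests while letting in-flight requests finish. `RequestGate` and its methods are hypothetical names chosen for illustration. This is not the Java chassis implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Java chassis: tracks in-flight requests
// so that a stopping service can reject new work and drain existing work.
final class RequestGate {
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    /** Returns true if the request is admitted, false if the service is stopping. */
    boolean tryEnter() {
        if (shuttingDown.get()) {
            return false;
        }
        inFlight.incrementAndGet();
        // Re-check to close the race with a concurrent beginShutdown().
        if (shuttingDown.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Must be called once for every request admitted by tryEnter(). */
    void exit() {
        inFlight.decrementAndGet();
    }

    /** From this point on, tryEnter() rejects new requests. */
    void beginShutdown() {
        shuttingDown.set(true);
    }

    /** Polls until no request is in flight or the timeout elapses; returns true if drained. */
    boolean awaitDrained(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

A request handler would wrap its work in `tryEnter()` and `exit()`, and the shutdown path would call `beginShutdown()` followed by `awaitDrained(...)` before the process exits.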

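Achieving a hitless upgrade requires several measures working together, for example, rolling upgrades. Because at least one instance must keep serving requests while another is being replaced, you are advised to ensure that at least two instances are available. Java chassis supports hitless upgrades through the following mechanisms: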

  1. Graceful shutdown: When the service is stopped, it waits for in-flight requests to complete and rejects new requests.

    Graceful shutdown is provided by default. Before a process exits, certain cleanup actions are performed, including waiting for in-flight requests to complete, rejecting new requests that are not yet in the processing queue, and invoking the registry center API to deregister the process. In addition, a Java chassis process can change the instance status to DOWN and wait for a period of time before exiting, which gives clients time to stop using the instance. Configure the wait time as follows:

    servicecomb:
      boot:
        turnDown:
          # Time to wait, in seconds, after the instance status is changed to DOWN. The default value is 0, indicating no waiting.
          waitInSeconds: 30
  2. Retry: If the client fails to connect to the server or the server rejects the request, a different server instance needs to be selected for the retry.

    Enable the retry policy.

    servicecomb:
      loadbalance:
        retryEnabled: true # Whether to enable the retry policy.
        retryOnNext: 1  # Number of retries on an instance other than the one that failed (which instance is selected depends on the load balancing policy).
        retryOnSame: 0  # Number of retries on the failed instance.
  3. Isolation: Service instances that fail to process requests a specified number of consecutive times are isolated.

    Enable the instance isolation policy.

    servicecomb:
      loadbalance:
        isolation:
          enabled: true
          enableRequestThreshold: 5 # Minimum number of requests (successful plus failed) processed by the instance in a statistical period for the isolation policy to take effect.
          singleTestTime: 60000 # Time (in milliseconds) after which the system attempts to access the isolated instance again. If the access is successful, isolation is canceled. Otherwise, isolation continues.
          continuousFailureThreshold: 2 # Condition for isolating an instance: requests to the instance fail two consecutive times.
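
The interaction between the retry and isolation settings above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Java chassis implementation: `IsolationTracker` and its methods are hypothetical names, instances are tried in list order rather than by a load balancing policy, `enableRequestThreshold` and the statistical period are omitted, the clock is passed in as `nowMillis` to keep the example deterministic, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of retry plus isolation; not Java chassis code.
final class IsolationTracker {
    private final int continuousFailureThreshold; // for example, 2
    private final long singleTestTimeMillis;      // for example, 60000
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();
    private final Map<String, Long> isolatedAt = new HashMap<>();

    IsolationTracker(int continuousFailureThreshold, long singleTestTimeMillis) {
        this.continuousFailureThreshold = continuousFailureThreshold;
        this.singleTestTimeMillis = singleTestTimeMillis;
    }

    void recordSuccess(String instance) {
        consecutiveFailures.remove(instance);
        isolatedAt.remove(instance); // a successful call cancels isolation
    }

    void recordFailure(String instance, long nowMillis) {
        int failures = consecutiveFailures.merge(instance, 1, Integer::sum);
        if (failures >= continuousFailureThreshold) {
            // Isolate the instance, or restart the wait after a failed probe.
            isolatedAt.put(instance, nowMillis);
        }
    }

    /** An instance is usable if it is not isolated or its probe time has arrived. */
    boolean isAvailable(String instance, long nowMillis) {
        Long since = isolatedAt.get(instance);
        return since == null || nowMillis - since >= singleTestTimeMillis;
    }

    /**
     * Tries up to 1 + retryOnNext distinct available instances, and each one
     * up to 1 + retryOnSame times. Returns the instance that served the call,
     * or null if every attempt failed.
     */
    String invoke(List<String> instances, int retryOnSame, int retryOnNext,
                  Predicate<String> call, long nowMillis) {
        List<String> candidates = new ArrayList<>();
        for (String instance : instances) {
            if (isAvailable(instance, nowMillis)) {
                candidates.add(instance);
            }
        }
        int maxInstances = Math.min(candidates.size(), 1 + retryOnNext);
        for (int i = 0; i < maxInstances; i++) {
            String instance = candidates.get(i);
            for (int attempt = 0; attempt <= retryOnSame; attempt++) {
                if (call.test(instance)) {
                    recordSuccess(instance);
                    return instance;
                }
                recordFailure(instance, nowMillis);
            }
        }
        return null;
    }
}
```

With `retryOnSame = 0`, `retryOnNext = 1`, and a threshold of 2, an instance that is being stopped fails one call, the retry lands on a healthy instance, and after the second consecutive failure the stopped instance is skipped until the probe time elapses.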