
Upgrading Pods Without Interrupting Services

Application Scenarios

In a Kubernetes cluster, an application can be deployed using a Deployment and exposed externally through a LoadBalancer Service. When the application is updated or upgraded, the Deployment creates new pods that gradually replace the old ones. During this process, services may be interrupted.

Solution

To prevent an application upgrade from interrupting services, configure Deployments and Services as follows:

  • In a Deployment, upgrade pods in the Rolling upgrade mode. In this mode, pods are updated in batches rather than all at once, so you can control the upgrade pace and the number of pods replaced concurrently to keep services available. For example, configure the maxSurge and maxUnavailable parameters to control how many new pods are created and how many old pods are deleted at the same time, ensuring that there are always pods able to serve requests during the upgrade. (See the sample manifest after this list.)
  • A LoadBalancer Service supports two types of service affinity:
    • Cluster-level service affinity (externalTrafficPolicy: Cluster): If no pod of the workload runs on the node that receives a request, the request is forwarded to a pod on another node. The source IP address may be lost during this cross-node forwarding.
    • Node-level service affinity (externalTrafficPolicy: Local): Requests are forwarded only to the nodes where the pods reside, so there is no cross-node forwarding and the source IP address is preserved. However, if the nodes where the pods run change during a rolling upgrade, the ELB backend servers change accordingly, which may interrupt services. In this case, add node affinity policies so that at least one pod keeps running properly on each ELB backend node throughout the upgrade.
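
The settings above map to standard Kubernetes fields. The following manifest is a minimal sketch rather than a full CCE configuration: the workload name, image, port, and labels are placeholders, and the ELB-related annotations that CCE requires on a LoadBalancer Service are omitted. It shows a Deployment that uses a rolling upgrade and a Service with cluster-level affinity; set externalTrafficPolicy to Local for node-level affinity.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx                     # placeholder workload name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate           # replace pods gradually instead of all at once
    rollingUpdate:
      maxSurge: 25%               # extra new pods allowed during the upgrade
      maxUnavailable: 25%         # old pods that may be unavailable at the same time
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: container-1
        image: nginx:latest       # placeholder image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx                     # add the ELB annotations required by CCE here
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster  # cluster-level affinity; use Local to preserve the source IP
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80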

The following table lists the solution for ensuring service continuity during a pod upgrade.

Scenario | Service | Deployment
--- | --- | ---
The source IP address does not need to be preserved. | Select the Cluster-level service affinity. | Select Rolling upgrade for Upgrade Mode, configure a graceful termination, and enable the Liveness probe and Readiness probe.
The source IP address needs to be preserved. | Select the Node-level service affinity. | Select Rolling upgrade for Upgrade Mode, configure a graceful termination, enable the Liveness probe and Readiness probe, and add node affinity policies. (Ensure that there is at least one pod running on each node during the update.)

Procedure

In this example, the workload has 200 replicas and is exposed through a LoadBalancer Service. The rolling upgrade of a workload associated with LoadBalancer or ingress Services may involve multiple Services, so pay close attention to the rolling upgrade parameter settings.

  1. Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Workloads.
  2. In the workload list, click Upgrade in the Operation column of the workload to be upgraded. The Upgrade Workload page is displayed.

    1. Enable the liveness probe and readiness probe. In the Container Settings area, click Health Check and enable Liveness probe and Readiness probe. In this example, TCP is selected for Check Method. Configure the parameters based on your requirements. Parameters such as Period (s), Delay (s), and Timeout (s) must be configured properly: if they are too small for an application that takes a long time to start, the container will be restarted repeatedly.

      In this example, the readiness probe delay is set to 20s to control the interval at which pods are rolled out in batches (see the probe snippet after this procedure).

      Figure 1 Enabling the liveness probe and readiness probe
    2. Configure a rolling upgrade. In the Advanced Settings area, click Upgrade and select Rolling upgrade for Upgrade Mode. This ensures that old pods are gradually replaced with new ones.

      In this example, maxUnavailable and maxSurge are both set to 2% to control the rolling step. Combined with the 20s readiness probe delay, this allows about eight pods to be replaced every 20 seconds (see the strategy snippet after this procedure).

      Figure 2 Configuring a rolling upgrade
    3. Configure a graceful termination.
      1. In the Container Settings area, click Lifecycle and configure pre-stop processing. Set it to the time the Service needs to process all remaining requests, most of which are persistent-connection requests. For example, make the workload sleep for 30s after it receives a deletion request so that it has enough time to process the remaining requests and services keep running properly.
      2. In the Advanced Settings area, click Upgrade and configure Scale-In Time Window (terminationGracePeriodSeconds), which specifies how long the system waits before the container is forcibly stopped. The scale-in time window must be greater than the pre-stop processing time; add about 30s to the pre-stop processing time. For example, if the pre-stop processing takes 30s, set the scale-in time window to 60s (see the graceful termination snippet after this procedure).
      Figure 3 Entering the pre-stop command
    4. Add node affinity policies. Add this kind of policy when Node-level is selected for a Service's service affinity. In the Advanced Settings area, click Scheduling and add node affinity policies, specifying the nodes to which the workload must be scheduled (see the node affinity snippet after this procedure).
      Figure 4 Adding node affinity policies

  3. After the configuration is complete, click Upgrade Workload.

    On the Pods tab, you can see that an old pod is stopped only after a new pod has been created and is running. This ensures that there is always a pod running in the workload.
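
For reference, the health check settings in step 1 correspond to the container's readinessProbe and livenessProbe fields. The snippet below is a minimal sketch that assumes the container listens on TCP port 80; only the 20s delay matches this example, and the other values are illustrative.

containers:
- name: container-1
  image: nginx:latest        # placeholder image
  readinessProbe:            # the pod receives traffic only after this probe succeeds
    tcpSocket:
      port: 80
    initialDelaySeconds: 20  # 20s delay used in this example to pace the rolling batches
    periodSeconds: 10
    timeoutSeconds: 1
  livenessProbe:             # the container is restarted if this probe keeps failing
    tcpSocket:
      port: 80
    initialDelaySeconds: 20
    periodSeconds: 10
    timeoutSeconds: 1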
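
The rolling upgrade parameters in step 2 correspond to the Deployment strategy below. With 200 replicas, 2% resolves to 4 pods, so each batch creates at most 4 new pods while taking down at most 4 old pods (about 8 pods in transition), and the 20s readiness delay spaces the batches roughly 20 seconds apart.

spec:
  replicas: 200
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2%   # at most 4 of the 200 old pods may be unavailable at a time
      maxSurge: 2%         # at most 4 extra new pods may be created at a time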
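
The graceful termination settings in step 3 correspond to a preStop hook and terminationGracePeriodSeconds in the pod spec. This is a minimal sketch that assumes a 30s sleep is enough for the Service to drain its remaining requests; adjust both values to your application.

spec:
  terminationGracePeriodSeconds: 60                # scale-in time window: pre-stop time (30s) plus 30s
  containers:
  - name: container-1
    image: nginx:latest                            # placeholder image
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 30"]   # give in-flight requests 30s to complete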
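
The node affinity policy in step 4 corresponds to nodeAffinity in the pod spec. The snippet below is a minimal sketch that pins pods to specific nodes by the built-in kubernetes.io/hostname label; the node names are placeholders and should be replaced with the actual ELB backend nodes in your cluster.

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname   # built-in node label
            operator: In
            values:                       # placeholder node names
            - 192.168.0.100
            - 192.168.0.101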