Updated on 2025-08-19 GMT+08:00

Upgrading and Rolling Back a Workload

After a workload is created, you can upgrade it and roll it back. The flexible upgrade and rollback mechanism enables smooth version transitions without interrupting services. If an issue arises during an upgrade, you can quickly restore to the previous stable state. The upgrade and rollback mechanism applies to multiple workloads:

  • Upgrade: applies to Deployments, StatefulSets, and DaemonSets.
  • Rollback: applies to Deployments.

Upgrading a Workload

CCE offers two upgrade modes for Deployments, StatefulSets, and DaemonSets to meet their release requirements in different service scenarios.

  • Rolling upgrade (RollingUpdate): The default upgrade mode, in which CCE launches new-version pods, verifies their readiness, and then terminates old-version pods. This ensures continuous service availability. This mode is ideal for scenarios where service interruption is unacceptable.
  • Replace upgrade (Recreate): CCE terminates all old-version pods and then creates new-version pods simultaneously. This will interrupt services temporarily. This mode is suitable for stateless applications that can tolerate downtime, such as development and test environments, and backend batch processing tasks.

You can specify an upgrade mode when creating a workload. To do so, perform the following operations.

  1. Log in to the CCE console and click the cluster name to access the cluster console.
  2. In the navigation pane, choose Workloads. In the upper right corner of the displayed page, click Create Workload.
  3. In the Upgrade area under Advanced Settings, select an upgrade mode. For details, see Figure 1 and Table 1. The common parameters for rolling upgrades and replacement upgrades serve similar purposes, though their implementations differ.

    Figure 1 Upgrade Mode

    Table 1 Workload upgrade modes

    Parameter

    Description

    Constraint

    Max. Unavailable Pods (maxUnavailable)

    The maximum number or percentage of pods that can be unavailable during a rolling upgrade. This also sets the limit for how many running pods can be below the expected number. The default value is 25%. During an upgrade, the percentage is converted into an absolute number and rounded down.

    For example, if spec.replicas is set to 2, no pods (2 x 0.25 = 0.5, rounded down to 0) can be unavailable. Therefore, during an upgrade, there will always be at least two pods running (2 desired - 0 unavailable). Each old pod is deleted only after a new one is created, ensuring that at least two pods are always running until all pods are updated.

    This parameter is only available for Deployments and DaemonSets.

    Max. Surge (maxSurge)

    The maximum number or percentage of pods that can exist above the desired number of pods during a rolling upgrade. This parameter determines the maximum number of new pods that can be created at a time to replace old pods. The default value is 25%. During an upgrade, the percentage is converted into an absolute number and rounded up.

    For example, if spec.replicas is set to 2, a maximum of one pod (2 x 0.25 = 0.5, rounded up to 1) can be created at a time by default. Therefore, during an upgrade, up to 3 pods can exist (2 desired + 1 surge).

    This parameter is only available for Deployments and DaemonSets.

    Min. Ready Seconds (minReadySeconds)

    The minimum duration a new pod must remain in the ready state before it is marked as available.

    This ensures a stable observation period, preventing unstable pods from joining the service too early and thereby improving the stability and reliability of the deployment.

    None

    Revision History Limit (revisionHistoryLimit)

    The maximum number of old ReplicaSets retained for rollback. These consume etcd resources and appear in kubectl get rs output.

    Each Deployment's historical configurations are stored in its corresponding ReplicaSet. Deleting an old ReplicaSet makes it impossible to roll back the Deployment to that version. By default, CCE retains the latest 10 old ReplicaSets.

    None

    Max. Upgrade Duration (progressDeadlineSeconds)

    The maximum time, in seconds, that a Deployment can take to make progress before it is considered to be failing. If specifying this parameter, ensure its value is greater than minReadySeconds.

    During a rolling upgrade, if a Deployment does not make progress within the time specified by progressDeadlineSeconds, it is marked with the following conditions: Type=Progressing, Status=False, and Reason=ProgressDeadlineExceeded. The above information indicates that the Deployment upgrade has failed. CCE records the failure event but does not automatically roll back the Deployment to a previous version. The rollback must be manually performed or triggered through an external mechanism.

    This parameter is only available for Deployments.

    Scale-In Time Window (terminationGracePeriodSeconds)

    The maximum time kubelet waits for containers in a pod to exit gracefully when the pod is deleted. The default is 30 seconds. Containers should exit within this period. Otherwise, they will be forcibly terminated with SIGKILL.

    None

  4. Configure other parameters and click Create Workload. The workload status will change to Running later.

The following uses a Deployment as an example to describe how to configure a workload upgrade policy using kubectl.

  1. Use kubectl to access the cluster. For details, see Accessing a Cluster Using kubectl.
  2. Create a file named nginx-deployment.yaml. The name is only an example. You can rename it as needed.

    vi nginx-deployment.yaml
    Below is an example of the file. For details about the Deployment configuration, see the Kubernetes official documentation.
    apiVersion: apps/v1
    kind: Deployment  
    metadata:
      name: nginx   
      namespace: default  
    spec:
      replicas: 2   
      selector:
        matchLabels:   
          app: nginx
      template:  
        metadata:
          labels:  
            app: nginx
        spec:
          containers:
          - image: nginx:latest    
            imagePullPolicy: Always  
            name: nginx  
            resources:  
              requests:  
                cpu: 250m
                memory: 512Mi
              limits:  
                cpu: 250m
                memory: 512Mi
          imagePullSecrets:  
          - name: default-secret  
          terminationGracePeriodSeconds: 30
      strategy:
        type: RollingUpdate # A rolling upgrade. Recreate indicates a replacement upgrade.
        rollingUpdate: # Configure rolling upgrade parameters.
          maxUnavailable: 25%
          maxSurge: 25%
      minReadySeconds: 0
      revisionHistoryLimit: 10
      progressDeadlineSeconds: 600
    Table 2 Workload upgrade modes

    Parameter

    Description

    Constraint

    maxUnavailable

    The maximum number or percentage of pods that can be unavailable during a rolling upgrade. This also sets the limit for how many running pods can be below the expected number. The default value is 25%. During an upgrade, the percentage is converted into an absolute number and rounded down.

    For example, if spec.replicas is set to 2, no pods (2 x 0.25 = 0.5, rounded down to 0) can be unavailable. Therefore, during an upgrade, there will always be at least two pods running (2 desired - 0 unavailable). Each old pod is deleted only after a new one is created, ensuring that at least two pods are always running until all pods are updated.

    This parameter is only available for rolling upgrades.

    maxSurge

    The maximum number or percentage of pods that can exist above the desired number of pods during a rolling upgrade. This parameter determines the maximum number of new pods that can be created at a time to replace old pods. The default value is 25%. During an upgrade, the percentage is converted into an absolute number and rounded up.

    For example, if spec.replicas is set to 2, a maximum of one pod (2 x 0.25 = 0.5, rounded up to 1) can be created at a time by default. Therefore, during an upgrade, up to 3 pods can exist (2 desired + 1 surge).

    This parameter is only available for rolling upgrades.

    minReadySeconds

    The minimum duration a new pod must remain in the ready state before it is marked as available.

    This ensures a stable observation period, preventing unstable pods from joining the service too early and thereby improving the stability and reliability of the deployment.

    None

    revisionHistoryLimit

    The maximum number of old ReplicaSets retained for rollback. These consume etcd resources and appear in kubectl get rs output.

    Each Deployment's historical configurations are stored in its corresponding ReplicaSet. Deleting an old ReplicaSet makes it impossible to roll back the Deployment to that version. By default, CCE retains the latest 10 old ReplicaSets.

    None

    progressDeadlineSeconds

    The maximum time, in seconds, that a Deployment can take to make progress before it is considered to be failing. If specifying this parameter, ensure its value is greater than minReadySeconds.

    During a rolling upgrade, if a Deployment does not make progress within the time specified by progressDeadlineSeconds, it is marked with the following conditions: Type=Progressing, Status=False, and Reason=ProgressDeadlineExceeded. The above information indicates that the Deployment upgrade has failed. CCE records the failure event but does not automatically roll back the Deployment to a previous version. The rollback must be manually performed or triggered through an external mechanism.

    None

    terminationGracePeriodSeconds

    The maximum time kubelet waits for containers in a pod to exit gracefully when the pod is deleted. The default is 30 seconds. Containers should exit within this period. Otherwise, they will be forcibly terminated with SIGKILL.

    None

  3. Create the Deployment.

    kubectl create -f nginx-deployment.yaml

    If information similar to the following is displayed, the Deployment is being created:

    deployment.apps/nginx created

  4. Check the Deployment status.

    kubectl get deployment

    If information similar to the following is displayed, the Deployment has been created:

    NAME           READY     UP-TO-DATE   AVAILABLE   AGE 
    nginx          2/2       2            2           4m5s

  5. Change the image used by the Deployment to nginx:alpine.

    kubectl edit deploy nginx

    Modify the image in spec.containers.image, save the modification, and exit.

  6. Check the ReplicaSets of the Deployment.

    kubectl get rs

    Information similar to the following is displayed:

    NAME               DESIRED   CURRENT   READY     AGE 
    nginx-6f9f58dffd   2         2         2         1m    # New-version ReplicaSet (activated)
    nginx-7f98958cdf   0         0         0         48m   # Old-version ReplicaSet (deactivated)

    During the upgrade, with spec.replicas set to 2 and both maxSurge and maxUnavailable set to 25%, the number of pods changes as follows:

    • maxSurge = 1 (2 x 25% = 0.5, rounded up) -> One extra pod can be added at most.
    • maxUnavailable = 0 (2 x 25% = 0.5, rounded down) -> No unavailable pod is allowed.

    During the upgrade, up to three pods can exist in total, with two always available for services. CCE creates new pods, checks their readiness, and deletes old pods one by one until all are upgraded.

  7. After the upgrade is complete, check the pod statuses:

    kubectl get pods

    In the command output, if all the pods are Running, the Deployment has been upgraded.

    NAME                     READY     STATUS    RESTARTS   AGE
    nginx-6f9f58dffd-tdmqk   1/1       Running   0          1m
    nginx-6f9f58dffd-tesqr   1/1       Running   0          1m

Rolling Back a Workload

If an error occurs during an upgrade, you can roll back the Deployment to its previous version. This is possible because old-version ReplicaSets are retained after each upgrade. A rollback essentially replaces the upgraded ReplicaSet with an old one. You can use revisionHistoryLimit to control the maximum number of retained historical versions (default: 10).

  • Using the console
    1. Log in to the CCE console and click the cluster name to access the cluster console.
    2. In the navigation pane, choose Workloads. In the Operation column of the target workload, choose More > Roll Back.
      Figure 2 Roll Back

  • Using kubectl

    Run the following command to roll back the target Deployment:

    kubectl rollout undo deployment nginx

    If information similar to the following is displayed, the rollback is successful:

    deployment.apps/nginx rolled back

Helpful Links