Before You Start (Cloud Container Engine)

Precautions

Before upgrading a cluster, pay attention to the following points:

Perform an upgrade during off-peak hours to minimize the impact on your services.
Before upgrading a cluster, learn about the features and differences of each cluster version in Kubernetes Release Notes to prevent exceptions due to the use of an incompatible cluster version. For example, check whether any APIs deprecated in the target version are used in the cluster. If they are, calls to those APIs may fail after the upgrade. For details, see Deprecated APIs.

During a cluster upgrade, pay attention to the following points that may affect your services:

During a cluster upgrade, do not perform any operation on the cluster. If you stop, restart, or delete nodes while upgrading the cluster, the upgrade will fail.
Before upgrading a cluster, ensure no high-risk operations are performed in the cluster. Otherwise, the cluster upgrade may fail or the configuration may be lost after the upgrade. Common high-risk operations include modifying cluster node configurations locally and modifying the configurations of the listeners managed by CCE on the ELB console. Instead, modify configurations on the CCE console so that the modifications can be automatically inherited during the upgrade.
During a cluster upgrade, the running workloads will not be interrupted, but access to the API server will be temporarily interrupted.
By default, application scheduling is not restricted during a cluster upgrade. During an upgrade of the following early cluster versions, the node.kubernetes.io/upgrade taint (equivalent to NoSchedule) will be added to the nodes in the cluster and removed after the cluster is upgraded:
- All v1.15 clusters
- All v1.17 clusters
- v1.19 clusters with patch versions earlier than or equal to v1.19.16-r4
- v1.21 clusters with patch versions earlier than or equal to v1.21.7-r0
- v1.23 clusters with patch versions earlier than or equal to v1.23.5-r0
During a cluster upgrade, if an add-on is also upgraded and cluster resources are limited, add-on pods can use resources that would otherwise be allocated to service pods. This may result in the eviction of service pods. After the add-on is upgraded, the evicted service pods will automatically recover.
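
The version rule for the node.kubernetes.io/upgrade taint listed above can be expressed as a small check. This is only a sketch: it assumes version strings always have the form vX.Y.Z-rN as written in this section, and the function names are hypothetical, not part of any CCE tool.

```python
import re

# Minor versions for which every patch version gets the taint during an upgrade.
ALWAYS_TAINTED = {(1, 15), (1, 17)}

# Inclusive upper bounds (patch, revision) copied from the list above.
TAINT_UPPER_BOUNDS = {
    (1, 19): (16, 4),  # v1.19.16-r4
    (1, 21): (7, 0),   # v1.21.7-r0
    (1, 23): (5, 0),   # v1.23.5-r0
}

# Assumed version format: vMAJOR.MINOR.PATCH-rREVISION
_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-r(\d+)$")


def parse_version(version):
    """Split a string such as 'v1.19.16-r4' into four integers."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized cluster version: {version!r}")
    return tuple(int(part) for part in match.groups())


def upgrade_adds_taint(version):
    """Return True if upgrading from this version adds the upgrade taint."""
    major, minor, patch, revision = parse_version(version)
    if (major, minor) in ALWAYS_TAINTED:
        return True
    bound = TAINT_UPPER_BOUNDS.get((major, minor))
    return bound is not None and (patch, revision) <= bound
```

For example, `upgrade_adds_taint("v1.19.16-r4")` is true, while `upgrade_adds_taint("v1.19.16-r5")` is false.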

Notes and Constraints

If an error occurs during a cluster upgrade, the cluster can be rolled back using the backup data. If you perform other operations (for example, modifying cluster specifications) after a successful cluster upgrade, the cluster can no longer be rolled back using the backup data.
If your cluster is upgraded to v1.21.10-r0, v1.23.8-r0, v1.25.3-r0, or a later version, the following add-ons must be upgraded to the target version if these add-ons are installed in the cluster (for details, see Data Disk Shared Between a Container Engine and kubelet Components):
- NPD: v1.18.10 or later
- log-agent: v1.3.0 or later
When clusters using the tunnel network model are upgraded to v1.19.16-r4, v1.21.7-r0, v1.23.5-r0, v1.25.1-r0, or later, the SNAT rule whose destination address is in the container CIDR block but whose source address is not will be removed. If you have configured VPC routes to directly access all pods from outside the cluster, only the pods on the corresponding nodes can be directly accessed after the upgrade.

Newer versions of the NGINX Ingress Controller add-on (see Change History) support graceful exit, a grace period for deleting the ELB backend controller, and hitless upgrades. Upgrading from any of the earlier add-on versions listed in the following table may make services unavailable for a short period of time. If one of these versions is installed in the cluster, perform the upgrade during off-peak hours.

Add-on Version	Version Range
2.1.x	x < 32
2.2.x	x < 41
2.4.x	x < 4
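
The version ranges in the table above can be checked programmatically. The following sketch assumes the add-on version is a plain three-part string such as 2.1.31; the function name is hypothetical.

```python
# Minor line -> first patch number that is NOT in the affected range (from the table above).
AFFECTED_BELOW = {
    (2, 1): 32,  # 2.1.x, x < 32
    (2, 2): 41,  # 2.2.x, x < 41
    (2, 4): 4,   # 2.4.x, x < 4
}


def upgrade_may_interrupt(addon_version):
    """Return True if the add-on version falls in a range listed in the table."""
    major, minor, patch = (int(part) for part in addon_version.split("."))
    limit = AFFECTED_BELOW.get((major, minor))
    return limit is not None and patch < limit
```

With these ranges, 2.2.40 is reported as affected and 2.2.41 is not.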

Upgrading a cluster restarts NetworkManager, which triggers the DHCP client to renew its IP address lease. By default, /etc/resolv.conf is then updated based on the subnet DNS configuration. If you need to change the DNS configuration, do so on the VPC console. For details, see How Do I Change the DNS Server Address of an ECS?
For more details, see Version Differences.

Version Differences

Upgrade Path	Version Difference	Self-Check
v1.23 or v1.25 Upgraded to v1.27	Docker is no longer recommended. Use containerd instead. For details, see Container Engines.	This item has been included in the pre-upgrade check.
v1.21 or v1.19 Upgraded to v1.23	For the NGINX Ingress Controller of an earlier version (community version v0.49 or earlier, or CCE nginx-ingress version v1.x.x), the created ingresses can be managed by the NGINX Ingress Controller even if kubernetes.io/ingress.class: nginx is not set in the ingress annotations. However, for the NGINX Ingress Controller of a later version (community version v1.0.0 or later, or CCE nginx-ingress version v2.x.x), the ingresses created without specifying the Nginx type will not be managed by the NGINX Ingress Controller, and ingress rules will become invalid, which interrupts services.	This item has been included in the pre-upgrade check. You can also perform the self-check by referring to NGINX Ingress Controller.
v1.19 to v1.21	The exec probe timeout bug is fixed in Kubernetes 1.21. Before the fix, exec probes ignored the timeoutSeconds field: a probe ran indefinitely, even beyond its configured deadline, and stopped only when a result was returned. After the upgrade, the field takes effect, and the default value of 1 second is used if the field is not specified. If a probe then runs for longer than its timeout (1 second by default), the application health check may fail and the application may restart frequently.	Before the upgrade, check whether the timeout is properly set for the exec probe.
v1.19 to v1.21	kube-apiserver of CCE v1.19 or later requires that the Subject Alternative Names (SANs) field be configured for the certificate of your webhook server. Otherwise, kube-apiserver fails to call the webhook server after the upgrade, and containers cannot be started properly. Root cause: X.509 CommonName is discarded in Go v1.15. kube-apiserver of CCE v1.19 is compiled using Go v1.15. If your webhook certificate does not have SANs, kube-apiserver does not process the CommonName field of the X.509 certificate as the host name by default. As a result, the authentication fails.	Before the upgrade, check whether the SAN field is configured in the certificate of your webhook server. If you do not have your own webhook server, you can skip this check. If the field is not set, use the SAN field to specify the IP address and domain name supported by the certificate.
v1.15 to v1.19	The control plane of CCE clusters of v1.19 is incompatible with kubelet v1.15. If a node fails to be upgraded, or a node to be upgraded restarts after the master node is successfully upgraded, there is a high probability that the node enters the NotReady status. This is because such a node restarts kubelet, which triggers node registration. In clusters of v1.15, the default registration labels (failure-domain.beta.kubernetes.io/is-baremetal and kubernetes.io/availablezone) are regarded as invalid by clusters of v1.19. The valid labels in clusters of v1.19 are node.kubernetes.io/baremetal and failure-domain.beta.kubernetes.io/zone.	In normal cases, this scenario is not triggered. After the master node is upgraded, do not suspend the upgrade, so that the nodes can be upgraded quickly. If a node fails to be upgraded and cannot be restored, evict applications on the node as soon as possible. Contact technical support and skip the node upgrade. After the upgrade is complete, reset the node.
	In CCE v1.15 and v1.19 clusters, the Docker storage driver file system has been changed from XFS to Ext4. As a result, the package import order may be abnormal in the pods of upgraded Java applications, leading to pod exceptions.	Before the upgrade, check the Docker configuration file /etc/docker/daemon.json on the node. Check whether the value of dm.fs is xfs. If the value is ext4 or the storage driver is Overlay, you can skip the next steps. If the value is xfs, you are advised to deploy applications in a cluster of the new version in advance to test whether the applications are compatible with the new cluster version. { "storage-driver": "devicemapper", "storage-opts": [ "dm.thinpooldev=/dev/mapper/vgpaas-thinpool", "dm.use_deferred_removal=true", "dm.fs=xfs", "dm.use_deferred_deletion=true" ] }
	kube-apiserver of CCE v1.19 or later requires that the Subject Alternative Names (SANs) field be configured for the certificate of your webhook server. Otherwise, kube-apiserver fails to call the webhook server after the upgrade, and containers cannot be started properly. Root cause: X.509 CommonName is discarded in Go v1.15. kube-apiserver of CCE v1.19 is compiled using Go v1.15. If your webhook certificate does not have SANs, kube-apiserver does not process the CommonName field of the X.509 certificate as the host name by default. As a result, the authentication fails.	Before the upgrade, check whether the SAN field is configured in the certificate of your webhook server. If you do not have your own webhook server, you can skip this check. If the field is not set, use the SAN field to specify the IP address and domain name supported by the certificate. NOTICE: To mitigate the impact of version differences on cluster upgrade, CCE performs special processing during the upgrade from v1.15 to v1.19 and still supports certificates without SANs. However, this special processing is not applied to subsequent upgrades. You are advised to rectify your certificate as soon as possible.
	In clusters of v1.17.17 and later, CCE automatically creates pod security policies (PSPs) for you, which restrict the creation of pods with unsafe configurations, for example, pods for which net.core.somaxconn under a sysctl is configured in the security context.	After an upgrade, you can allow insecure system configurations as required. For details, see Configuring a Pod Security Policy.
	If initContainer or Istio is used in the in-place upgrade of a cluster of v1.15, pay attention to the following restrictions: In kubelet v1.16 and later versions, QoS classes are different from those in earlier versions. In kubelet v1.15 and earlier versions, only containers in spec.containers are counted. In kubelet v1.16 and later versions, containers in both spec.containers and spec.initContainers are counted. The QoS class of a pod will change after the upgrade. As a result, the container in the pod restarts.	You are advised to modify the QoS class of the service container before the upgrade to avoid this problem. For details, see Table 1.
v1.13 to v1.15	After a VPC network cluster is upgraded, the master node occupies an extra CIDR block due to the upgrade of network components. If no container CIDR block is available for the new node, the pod scheduled to the node cannot run.	Generally, this problem occurs when the nodes in the cluster are about to fully occupy the container CIDR block. For example, the container CIDR block is 10.0.0.0/16, the number of available IP addresses is 65,536, and the VPC network allocates a CIDR block with the fixed size (using the mask to determine the maximum number of container IP addresses allocated to each node). If the upper limit is 128, the cluster supports a maximum of 512 (65536/128) nodes, including the three master nodes. After the cluster is upgraded, each of the three master nodes occupies one CIDR block. As a result, 506 nodes are supported.
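
The container CIDR arithmetic in the v1.13 to v1.15 row can be reproduced with a few lines. This is a sketch under one assumption that is not stated explicitly in the table: the figure of 506 is read as the total number of per-node blocks minus the three master nodes already counted as nodes and minus the three blocks the masters occupy after the upgrade. The function name is hypothetical.

```python
import ipaddress


def node_capacity(container_cidr, ips_per_node, master_nodes):
    """Return (nodes supported before the upgrade, nodes supported after it)."""
    total_ips = ipaddress.ip_network(container_cidr).num_addresses
    # Each node receives one fixed-size block of container IP addresses.
    total_blocks = total_ips // ips_per_node
    # Before the upgrade, the limit includes the master nodes.
    before = total_blocks
    # Assumption: afterwards, the masters are excluded both as nodes and as block holders.
    after = total_blocks - 2 * master_nodes
    return before, after


print(node_capacity("10.0.0.0/16", 128, 3))
```

For the values in the example (10.0.0.0/16, 128 addresses per node, three master nodes), this yields 512 and 506.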

**Table 1** QoS class changes before and after the upgrade
Init Container (Calculated Based on spec.initContainers)	Service Container (Calculated Based on spec.containers)	Pod (Calculated Based on spec.containers and spec.initContainers)	Impacted or Not
Guaranteed	BestEffort	Burstable	Yes
Guaranteed	Burstable	Burstable	No
Guaranteed	Guaranteed	Guaranteed	No
BestEffort	BestEffort	BestEffort	No
BestEffort	Burstable	Burstable	No
BestEffort	Guaranteed	Burstable	Yes
Burstable	BestEffort	Burstable	Yes
Burstable	Burstable	Burstable	No
Burstable	Guaranteed	Burstable	Yes
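
The rows of Table 1 follow from one rule: a pod is impacted when the class computed from spec.containers alone differs from the class computed from spec.containers and spec.initContainers together. The sketch below models this with a simplified combination rule that works on per-container class names (written with the Kubernetes spelling BestEffort) rather than on actual resource requests and limits; the function names are hypothetical.

```python
def combine_qos(classes):
    """Derive a pod-level QoS class from per-container classes (simplified rule)."""
    unique = set(classes)
    if unique == {"Guaranteed"}:
        return "Guaranteed"
    if unique == {"BestEffort"}:
        return "BestEffort"
    # Any mix of classes, or any Burstable container, makes the pod Burstable.
    return "Burstable"


def impacted_by_upgrade(init_class, service_class):
    """Return True if the pod's QoS class changes after the upgrade."""
    before = combine_qos([service_class])             # kubelet v1.15: spec.containers only
    after = combine_qos([init_class, service_class])  # kubelet v1.16+: both lists
    return before != after
```

For example, `impacted_by_upgrade("Guaranteed", "BestEffort")` is true because the pod changes from BestEffort to Burstable.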

Deprecated APIs

With the evolution of Kubernetes APIs, APIs are periodically reorganized or upgraded, and certain APIs are deprecated and finally deleted. The following tables list the deprecated APIs in each Kubernetes community version. For details about more deprecated APIs, see Deprecated API Migration Guide.

APIs Deprecated in Kubernetes v1.29
No APIs deprecated in Kubernetes v1.28
APIs Deprecated in Kubernetes v1.27
APIs Deprecated in Kubernetes v1.25
APIs Deprecated in Kubernetes v1.22
APIs Deprecated in Kubernetes v1.16

When an API is deprecated, the existing resources are not affected. However, requests to create or edit resources using the deprecated API version will be intercepted and fail.

**Table 2** APIs deprecated in Kubernetes v1.29
Resource	Deprecated API Version	Substitute API Version	Change Description
FlowSchema and PriorityLevelConfiguration	flowcontrol.apiserver.k8s.io/v1beta2	flowcontrol.apiserver.k8s.io/v1 (This API has been available since v1.29.) flowcontrol.apiserver.k8s.io/v1beta3 (This API has been available since v1.26.)	Significant changes in flowcontrol.apiserver.k8s.io/v1: spec.limited.assuredConcurrencyShares of PriorityLevelConfiguration has been renamed spec.limited.nominalConcurrencyShares. The default value is 30 only when it is not specified, and the explicit value 0 does not change to 30. Key changes in flowcontrol.apiserver.k8s.io/v1beta3: spec.limited.assuredConcurrencyShares of PriorityLevelConfiguration has been renamed spec.limited.nominalConcurrencyShares.

**Table 3** APIs deprecated in Kubernetes v1.27
Resource	Deprecated API Version	Substitute API Version	Change Description
CSIStorageCapacity	storage.k8s.io/v1beta1	storage.k8s.io/v1 (This API has been available since v1.24.)	None
FlowSchema and PriorityLevelConfiguration	flowcontrol.apiserver.k8s.io/v1beta1	flowcontrol.apiserver.k8s.io/v1beta3 (This API has been available since v1.26.)	None
HorizontalPodAutoscaler	autoscaling/v2beta2	autoscaling/v2 (This API has been available since v1.23.)	None

**Table 4** APIs deprecated in Kubernetes v1.25
Resource	Deprecated API Version	Substitute API Version	Change Description
CronJob	batch/v1beta1	batch/v1 (This API has been available since v1.21.)	None
EndpointSlice	discovery.k8s.io/v1beta1	discovery.k8s.io/v1 (This API has been available since v1.21.)	Pay attention to the following changes: In each endpoint, the topology["kubernetes.io/hostname"] field has been deprecated. Replace it with the nodeName field. In each endpoint, the topology["kubernetes.io/zone"] field has been deprecated. Replace it with the zone field. The topology field is replaced with deprecatedTopology and cannot be written in v1.
Event	events.k8s.io/v1beta1	events.k8s.io/v1 (This API has been available since v1.19.)	Pay attention to the following changes: The type field can only be set to Normal or Warning. The involvedObject field is renamed regarding. The action, reason, reportingController, and reportingInstance fields are mandatory for creating a new events.k8s.io/v1 event. Use eventTime instead of the deprecated firstTimestamp field (this field has been renamed deprecatedFirstTimestamp and is not allowed to appear in the new events.k8s.io/v1 event object). Use series.lastObservedTime instead of the deprecated lastTimestamp field (this field has been renamed deprecatedLastTimestamp and is not allowed to appear in the new events.k8s.io/v1 event object). Use series.count instead of the deprecated count field (this field has been renamed deprecatedCount and is not allowed to appear in the new events.k8s.io/v1 event object). Use reportingController instead of the deprecated source.component field (this field has been renamed deprecatedSource.component and is not allowed to appear in the new events.k8s.io/v1 event object). Use reportingInstance instead of the deprecated source.host field (this field has been renamed deprecatedSource.host and is not allowed to appear in the new events.k8s.io/v1 event object).
HorizontalPodAutoscaler	autoscaling/v2beta1	autoscaling/v2 (This API has been available since v1.23.)	None
PodDisruptionBudget	policy/v1beta1	policy/v1 (This API has been available since v1.21.)	If spec.selector is set to empty ({}) in PodDisruptionBudget of policy/v1, all pods in the namespace are selected. (In policy/v1beta1, an empty spec.selector means that no pod will be selected.) If spec.selector is not specified, no pods are selected in either API version.
PodSecurityPolicy	policy/v1beta1	None	Since v1.25, the PodSecurityPolicy resource no longer provides APIs of the policy/v1beta1 version, and the PodSecurityPolicy admission controller is deleted. Use Pod Security Admission instead.
RuntimeClass	node.k8s.io/v1beta1	node.k8s.io/v1 (This API has been available since v1.20.)	None
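
The EndpointSlice field changes in Table 4 can be illustrated on a single endpoint entry. The sketch below works on an endpoint already loaded as a Python dict and drops any remaining topology keys, since topology cannot be written in v1; the function name is hypothetical.

```python
def convert_endpoint(endpoint):
    """Return a discovery.k8s.io/v1 style copy of a v1beta1 endpoint entry."""
    # Copy every field except topology, which is not writable in v1.
    converted = {key: value for key, value in endpoint.items() if key != "topology"}
    topology = endpoint.get("topology", {})
    if "kubernetes.io/hostname" in topology:
        converted["nodeName"] = topology["kubernetes.io/hostname"]
    if "kubernetes.io/zone" in topology:
        converted["zone"] = topology["kubernetes.io/zone"]
    return converted
```

An endpoint with topology {"kubernetes.io/hostname": "node-1"} is converted to one with nodeName set to node-1 and no topology field.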

**Table 5** APIs deprecated in Kubernetes v1.22
Resource	Deprecated API Version	Substitute API Version	Change Description
MutatingWebhookConfiguration ValidatingWebhookConfiguration	admissionregistration.k8s.io/v1beta1	admissionregistration.k8s.io/v1 (This API has been available since v1.16.)	The default value of webhooks[].failurePolicy is changed from Ignore to Fail in v1. The default value of webhooks[].matchPolicy is changed from Exact to Equivalent in v1. The default value of webhooks[].timeoutSeconds is changed from 30s to 10s in v1. The default value of webhooks[].sideEffects is deleted, and this field must be specified. In v1, the value can only be None or NoneOnDryRun. The default value of webhooks[].admissionReviewVersions is deleted. In v1, this field must be specified. (AdmissionReview v1 and v1beta1 are supported.) webhooks[].name must be unique in the list of objects created through admissionregistration.k8s.io/v1.
CustomResourceDefinition	apiextensions.k8s.io/v1beta1	apiextensions.k8s.io/v1 (This API has been available since v1.16.)	The default value of spec.scope is no longer Namespaced. This field must be explicitly specified. spec.version is deleted from v1. Use spec.versions instead. spec.validation is deleted from v1. Use spec.versions[].schema instead. spec.subresources is deleted from v1. Use spec.versions[].subresources instead. spec.additionalPrinterColumns is deleted from v1. Use spec.versions[].additionalPrinterColumns instead. spec.conversion.webhookClientConfig is moved to spec.conversion.webhook.clientConfig in v1. spec.conversion.conversionReviewVersions is moved to spec.conversion.webhook.conversionReviewVersions in v1. spec.versions[].schema.openAPIV3Schema becomes a mandatory field when the CustomResourceDefinition object of the v1 version is created, and its value must be a structural schema. spec.preserveUnknownFields: true cannot be specified when the CustomResourceDefinition object of the v1 version is created. This configuration must be specified using x-kubernetes-preserve-unknown-fields: true in the schema definition. In v1, the JSONPath field in the additionalPrinterColumns entry is renamed jsonPath (patch #66531).
APIService	apiregistration.k8s.io/v1beta1	apiregistration.k8s.io/v1 (This API has been available since v1.10.)	None
TokenReview	authentication.k8s.io/v1beta1	authentication.k8s.io/v1 (This API has been available since v1.6.)	None
LocalSubjectAccessReview SelfSubjectAccessReview SubjectAccessReview SelfSubjectRulesReview	authorization.k8s.io/v1beta1	authorization.k8s.io/v1 (This API has been available since v1.6.)	spec.group was renamed spec.groups in v1 (patch #32709).
CertificateSigningRequest	certificates.k8s.io/v1beta1	certificates.k8s.io/v1 (This API has been available since v1.19.)	Pay attention to the following changes in certificates.k8s.io/v1: For an API client that requests a certificate: spec.signerName becomes a mandatory field (see Known Kubernetes Signers). In addition, the certificates.k8s.io/v1 API cannot be used to create requests whose signer is kubernetes.io/legacy-unknown. spec.usages now becomes a mandatory field, which cannot contain duplicate string values and can contain only known usage strings. For an API client that needs to approve or sign a certificate: status.conditions cannot contain duplicate types. The status.conditions[].status field is now mandatory. The status.certificate must be PEM-encoded and can contain only the CERTIFICATE data block.
Lease	coordination.k8s.io/v1beta1	coordination.k8s.io/v1 (This API has been available since v1.14.)	None
Ingress	networking.k8s.io/v1beta1 extensions/v1beta1	networking.k8s.io/v1 (This API has been available since v1.19.)	The spec.backend field is renamed spec.defaultBackend. The serviceName field of the backend is renamed service.name. The backend servicePort field represented by a number is renamed service.port.number. The backend servicePort field represented by a string is renamed service.port.name. The pathType field is now mandatory for each specified path. The options are Prefix, Exact, and ImplementationSpecific. To match the behavior of not defining the path type in v1beta1, use ImplementationSpecific.
IngressClass	networking.k8s.io/v1beta1	networking.k8s.io/v1 (This API has been available since v1.19.)	None
ClusterRole ClusterRoleBinding Role RoleBinding	rbac.authorization.k8s.io/v1beta1	rbac.authorization.k8s.io/v1 (This API has been available since v1.8.)	None
PriorityClass	scheduling.k8s.io/v1beta1	scheduling.k8s.io/v1 (This API has been available since v1.14.)	None
CSIDriver CSINode StorageClass VolumeAttachment	storage.k8s.io/v1beta1	storage.k8s.io/v1	CSIDriver is available in storage.k8s.io/v1 since v1.19. CSINode is available in storage.k8s.io/v1 since v1.17. StorageClass is available in storage.k8s.io/v1 since v1.6. VolumeAttachment is available in storage.k8s.io/v1 since v1.13.
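
The Ingress field renames in Table 5 can be sketched as a conversion over a manifest that has already been loaded into a Python dict. The example is illustrative only: it handles service backends (serviceName and servicePort), the function names are hypothetical, and it is not a replacement for reviewing migrated manifests.

```python
def convert_backend(backend):
    """Map a v1beta1 service backend to the networking.k8s.io/v1 shape."""
    port = backend["servicePort"]
    # Numeric ports become service.port.number, named ports become service.port.name.
    port_field = {"number": port} if isinstance(port, int) else {"name": port}
    return {"service": {"name": backend["serviceName"], "port": port_field}}


def convert_path(path):
    new_path = dict(path)
    # pathType is mandatory in v1; ImplementationSpecific matches the v1beta1 default behavior.
    new_path.setdefault("pathType", "ImplementationSpecific")
    new_path["backend"] = convert_backend(path["backend"])
    return new_path


def convert_rule(rule):
    new_rule = dict(rule)
    if "http" in rule:
        new_rule["http"] = {"paths": [convert_path(p) for p in rule["http"]["paths"]]}
    return new_rule


def convert_ingress(ingress):
    """Return a networking.k8s.io/v1 copy of a v1beta1 Ingress given as a dict."""
    old_spec = ingress.get("spec", {})
    new_spec = {k: v for k, v in old_spec.items() if k not in ("backend", "rules")}
    if "backend" in old_spec:
        # spec.backend is renamed spec.defaultBackend.
        new_spec["defaultBackend"] = convert_backend(old_spec["backend"])
    if "rules" in old_spec:
        new_spec["rules"] = [convert_rule(rule) for rule in old_spec["rules"]]
    return {**ingress, "apiVersion": "networking.k8s.io/v1", "spec": new_spec}
```

The input dict is left unchanged; the function returns a new top-level object with the converted spec.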

**Table 6** APIs deprecated in Kubernetes v1.16
Resource	Deprecated API Version	Substitute API Version	Change Description
NetworkPolicy	extensions/v1beta1	networking.k8s.io/v1 (This API has been available since v1.8.)	None
DaemonSet	extensions/v1beta1 apps/v1beta2	apps/v1 (This API has been available since v1.9.)	The spec.templateGeneration field is deleted. spec.selector is now a mandatory field and cannot be changed after the object is created. The label of an existing template can be used as a selector for seamless migration. The default value of spec.updateStrategy.type is changed to RollingUpdate (the default value in the extensions/v1beta1 API version is OnDelete).
Deployment	extensions/v1beta1 apps/v1beta1 apps/v1beta2	apps/v1 (This API has been available since v1.9.)	The spec.rollbackTo field is deleted. spec.selector is now a mandatory field and cannot be changed after the Deployment is created. The label of an existing template can be used as a selector for seamless migration. The default value of spec.progressDeadlineSeconds is changed to 600 seconds (the default value in extensions/v1beta1 is unlimited). The default value of spec.revisionHistoryLimit is changed to 10. (In the apps/v1beta1 API version, the default value of this field is 2. In the extensions/v1beta1 API version, all historical records are retained by default.) The default values of maxSurge and maxUnavailable are changed to 25%. (In the extensions/v1beta1 API version, these fields default to 1.)
StatefulSet	apps/v1beta1 apps/v1beta2	apps/v1 (This API has been available since v1.9.)	spec.selector is now a mandatory field and cannot be changed after the StatefulSet is created. The label of an existing template can be used as a selector for seamless migration. The default value of spec.updateStrategy.type is changed to RollingUpdate (the default value in the apps/v1beta1 API version is OnDelete).
ReplicaSet	extensions/v1beta1 apps/v1beta1 apps/v1beta2	apps/v1 (This API has been available since v1.9.)	spec.selector is now a mandatory field and cannot be changed after the object is created. The label of an existing template can be used as a selector for seamless migration.
PodSecurityPolicy	extensions/v1beta1	policy/v1beta1 (This API has been available since v1.10.)	PodSecurityPolicy for the policy/v1beta1 API version will be removed in v1.25.
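
Before an upgrade, the tables above can be used to look for manifests that still reference a deprecated API version. The following sketch assumes the manifests are already available as Python dicts (for example, after being loaded with a YAML parser, which is not shown). The lookup table is a deliberately small excerpt of Tables 3 to 6, and the function name is hypothetical.

```python
# (deprecated apiVersion, kind) -> substitute apiVersion, or None if there is no substitute.
# Excerpt only; extend it from the tables above as needed.
DEPRECATED_APIS = {
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): "autoscaling/v2",
    ("autoscaling/v2beta1", "HorizontalPodAutoscaler"): "autoscaling/v2",
    ("batch/v1beta1", "CronJob"): "batch/v1",
    ("policy/v1beta1", "PodDisruptionBudget"): "policy/v1",
    ("policy/v1beta1", "PodSecurityPolicy"): None,
    ("networking.k8s.io/v1beta1", "Ingress"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "Ingress"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "Deployment"): "apps/v1",
}


def find_deprecated(manifests):
    """Return (kind, name, deprecated apiVersion, substitute) for each match."""
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        if key in DEPRECATED_APIS:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append((key[1], name, key[0], DEPRECATED_APIS[key]))
    return findings
```

A result entry whose substitute is None indicates a resource type, such as PodSecurityPolicy, that has no replacement API version and must be migrated to a different mechanism.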

Upgrade Backup

The following table lists how to back up cluster data.

Backup Type	Backup Object	Backup Method	Backup Duration	Rollback Duration	Description
etcd data backup	etcd data	Automatic backup during an upgrade	1-5 minutes	2 hours	Mandatory. The data is automatically backed up during an upgrade.
CBR cloud server backup	Master node disks, including component images, configurations, logs, and etcd data	One-click backup on a web page (manually triggered)	20 minutes to 2 hours (based on the cloud backup tasks in the current region)	20 minutes	This function is being gradually replaced by EVS snapshot backup.
EVS snapshot backup	Master node disks, including component images, configurations, logs, and etcd data	One-click backup on a web page (manually triggered)	1-5 minutes	20 minutes	This function is coming soon. After this function is released, it will replace CBR cloud server backup.
