Pre-upgrade Check

The system automatically checks a cluster before it is upgraded. If the cluster does not meet the pre-upgrade check conditions, the upgrade cannot continue. To avoid risks, you can perform a pre-upgrade check yourself based on the check items and solutions described in this section.
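Several node check items in Table 1 use fixed numeric limits: CPU and memory usage above 90%, less than 500 MB free in /tmp, more than 1000 images, more than 5000 image layers, and more than 1000 certificates. The following Python sketch shows how you might compare metrics collected from a node against those limits before you start the upgrade. It is an illustration under stated assumptions, not CCE's implementation: the `NodeSnapshot` type and the `failed_checks` and `tmp_free_bytes` helpers are hypothetical, you collect the metrics yourself, and 500 MB is treated as 500 MiB.

```python
import shutil
from dataclasses import dataclass
from typing import List

# Thresholds copied from Table 1. 500 MB is treated as 500 MiB (an assumption).
CPU_USAGE_LIMIT = 90.0                   # item 11, percent
MEMORY_USAGE_LIMIT = 90.0                # item 16, percent
TMP_MIN_FREE_BYTES = 500 * 1024 * 1024   # item 12, free space in /tmp
MAX_IMAGES = 1000                        # item 61
MAX_IMAGE_LAYERS = 5000                  # item 66
MAX_CERTIFICATES = 1000                  # item 68


@dataclass
class NodeSnapshot:
    """Hypothetical record of metrics you collect yourself from one node."""
    cpu_usage_percent: float
    memory_usage_percent: float
    tmp_free_bytes: int
    image_count: int
    image_layer_count: int
    certificate_count: int


def tmp_free_bytes(path: str = "/tmp") -> int:
    """Return the free space, in bytes, of the file system that holds path."""
    return shutil.disk_usage(path).free


def failed_checks(node: NodeSnapshot) -> List[str]:
    """Return the names of the Table 1 check items this snapshot would fail."""
    failures = []
    if node.cpu_usage_percent > CPU_USAGE_LIMIT:
        failures.append("Node CPU Usage")
    if node.memory_usage_percent > MEMORY_USAGE_LIMIT:
        failures.append("Node Memory")
    if node.tmp_free_bytes < TMP_MIN_FREE_BYTES:
        failures.append("Node Disks")
    if node.image_count > MAX_IMAGES:
        failures.append("Number of Node Images")
    if node.image_layer_count > MAX_IMAGE_LAYERS:
        failures.append("Image Layers on a Node")
    if node.certificate_count > MAX_CERTIFICATES:
        failures.append("Rotation Certificates")
    return failures


if __name__ == "__main__":
    # Example values only; replace them with real measurements from the node.
    snapshot = NodeSnapshot(
        cpu_usage_percent=95.0,
        memory_usage_percent=40.0,
        tmp_free_bytes=tmp_free_bytes(),
        image_count=120,
        image_layer_count=800,
        certificate_count=10,
    )
    print(failed_checks(snapshot))
```

The sketch covers only the numeric limits; items such as Node Disks also check key data disks, which this example does not inspect.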

**Table 1** Check items
No.	Check Item	Description
1	Node Restrictions	Check whether the node is available. Check whether the node OS supports the upgrade. Check whether the node is marked with unexpected node pool labels. Check whether the Kubernetes node name is the same as the ECS name.
2	Upgrade Management	Check whether the target cluster is under upgrade management.
3	Add-ons	Check whether the add-on status is normal. Check whether the add-on supports the target version.
4	Helm Charts	Check whether the current HelmRelease record contains discarded Kubernetes APIs that are not supported by the target cluster version. If yes, the Helm chart may be unavailable after the upgrade.
5	Node Pools	Check the node pool status.
6	Security Groups	Check whether the Protocol & Port of the worker node security groups is set to ICMP: All and whether the security group rule with the source IP address set to the master node security group has been deleted.
7	Residual Nodes	Check whether nodes need to be migrated.
8	Discarded Kubernetes Resources	Check whether there are discarded resources in the cluster.
9	Compatibility Risks	Read the version compatibility differences and make sure that your services are not affected by them. Patch upgrades do not involve version compatibility differences.
10	CCE Agent Versions	Check whether cce-agent on the current node is the latest version.
11	Node CPU Usage	Check whether the node's CPU usage is above 90%.
12	Node Disks	Check whether the key data disks on the node meet the upgrade requirements. Check whether the /tmp directory has at least 500 MB of available space.
13	Node DNS	Check whether the DNS configuration of the current node can resolve the OBS address. Check whether the current node can access the OBS address where the upgrade component package is stored.
14	Node Key Directory File Permissions	Check whether the root directory permissions are properly assigned.
15	kubelet	Check whether the kubelet on the node is running properly.
16	Node Memory	Check whether the node's memory usage is above 90%.
17	Node Clock Synchronization Server	Check whether the node's clock synchronization server (ntpd or chronyd) is running properly.
18	Node OS	Check whether the OS kernel version of the node is supported by CCE.
19	Node CPU Cores	Verify that the master nodes in your cluster have more than 2 CPU cores.
20	ASM Version	Check whether ASM is used by the cluster. Check whether the current ASM version supports the target cluster version.
21	Node Readiness	Check whether the nodes in the cluster are ready.
22	Node journald	Check whether journald on the node is running properly.
23	containerd.sock	Check whether the containerd.sock file exists on the node. This file affects the startup of the container runtime on EulerOS.
24	Internal Error	This is not a typical check item. It indicates that an internal error occurred during the pre-upgrade check.
25	Node Mount Points	Check whether there are inaccessible mount points on the node.
26	Kubernetes Node Taint	Check whether the taint needed for cluster upgrade exists on the node.
27	Everest Restrictions	Check whether there are any compatibility restrictions on the current Everest add-on.
28	cce-hpa-controller Limitations	Check whether there are compatibility limitations between the current and target versions of the cce-hpa-controller add-on.
29	Enhanced CPU Policies	Check whether both the current cluster version and the target version support enhanced CPU policies.
30	Health of Worker Node Components	Check whether the container runtime and network components on the worker nodes are healthy.
31	Health of Master Node Components	Check whether cluster components such as Kubernetes components, container runtime component, and network component are running properly before the upgrade.
32	Memory Resource Limit of Kubernetes Components	Check whether the resources of Kubernetes components, such as etcd and kube-controller-manager, exceed the limits.
33	Discarded Kubernetes APIs	The system scans the audit logs of the past day to check whether any APIs deprecated in the target Kubernetes version have been called. NOTE: Because audit logs cover only a limited time range, this check item is only an auxiliary method. APIs that will be deprecated may have been used in the cluster without appearing in the audit logs of the past day. Check the API usage carefully.
34	NetworkManager	Check whether NetworkManager on the node is running properly.
35	Node ID File	Check the format of the node ID file.
36	Node Configuration Consistency	When you upgrade a cluster to v1.19 or later, CCE checks whether the following configuration files have been modified on the backend:
37	Node Configuration File	Check whether the configuration files of key components exist on the node.
38	CoreDNS Configuration Consistency	Check whether the current CoreDNS key configuration Corefile is different from that in the Helm release record. The difference may be overwritten during the add-on upgrade, affecting domain name resolution in the cluster.
39	sudo	Check whether the sudo command and sudo-related files on the node are working properly.
40	Key Node Commands	Check whether the key commands that the node upgrade depends on are working properly.
41	Mounting of a Sock File on a Node	Check whether the docker.sock or containerd.sock file is directly mounted to pods on the node. During an upgrade, Docker or containerd restarts and the sock file on the host changes, but the sock file mounted to the pods does not change accordingly. As a result, your services cannot access Docker or containerd because the sock files are inconsistent. After the pods are rebuilt, the sock file is mounted to them again and the issue is resolved.
42	HTTPS Load Balancer Certificate Consistency	Check whether the certificate used by an HTTPS load balancer has been modified on ELB.
43	Node Mounting	Check for mounting failures caused by CCE storage misconfigurations.
44	Login Permissions of User paas on a Node	Check whether user paas is allowed to log in to a node.
45	Private IPv4 Addresses of Load Balancers	Check whether the load balancer associated with a Service has been allocated a private IPv4 address.
46	Historical Upgrade Records	Check the historical upgrade records of the cluster and confirm that the current version of the cluster meets the requirements for upgrading to the target version.
47	CIDR Block of the Cluster Management Plane	Check whether the CIDR block of the cluster management plane is the same as that configured on the backbone network.
48	CCE AI Suite (NVIDIA GPU) Exceptions	Check whether upgrading CCE AI Suite (NVIDIA GPU) as part of the cluster upgrade affects GPU driver installation when a GPU node is created.
49	Nodes' System Parameters	Check whether the default system parameter settings on your nodes have been modified.
50	Residual Package Version Data	Check whether there is residual package version data in the current cluster.
51	Node Commands	Check whether the commands required for the upgrade are available on the node.
52	Node Swap	Check whether swap has been enabled on cluster nodes.
53	NGINX Ingress Controller	Check whether there are compatibility issues that may occur during NGINX Ingress Controller upgrade.
54	containerd Pod Restart Risks	Check whether the service pods running on a containerd node will be restarted when containerd is upgraded.
55	Key CCE AI Suite (NVIDIA GPU) Parameters	Check whether the configuration of CCE AI Suite (NVIDIA GPU) in a cluster has been intrusively modified. If so, upgrading the cluster may fail.
56	GPU or NPU Pod Rebuild Risks	Check whether GPU or NPU service pods in the cluster will be rebuilt when kubelet is restarted during the cluster upgrade.
57	ELB Listener Access Control	Check whether access control has been configured for ELB listeners. If so, check whether the configurations are correct.
58	Subnet Quota of Master Nodes	Check whether the number of available IP addresses in the cluster subnet supports rolling upgrade.
59	Node Runtime	Check whether an alarm is generated when a cluster is upgraded to v1.27 or later. Do not use Docker in clusters of versions later than 1.27 because CCE will stop supporting Docker.
60	Node Pool Runtime	Check whether an alarm is generated when a cluster is upgraded to v1.27 or later. Do not use Docker in clusters of versions later than 1.27 because CCE will stop supporting Docker.
61	Number of Node Images	Check the number of images on your node. If there are more than 1000 images, Docker takes a long time to start, which affects Docker's standard output and functions such as NGINX.
62	OpenKruise Compatibility Check	Check whether the OpenKruise add-on is compatible with the target version before the cluster is upgraded.
63	Compatibility Check of At-Rest Encryption for Secrets	Check whether the target version supports at-rest encryption for secrets. If it does not, clusters that have this feature enabled cannot be upgraded to the target version.
64	Compatibility Between the Ubuntu Kernel and GPU Driver	Make sure that CCE AI Suite (NVIDIA GPU) and Ubuntu nodes are compatible before using them in a cluster. If the Ubuntu kernel version is 5.15.0-113-generic, the GPU driver used by the add-on must be 535.161.08 or later.
65	Drainage Tasks	Check whether there are unfinished drainage tasks in the cluster. An unfinished task may resume after the upgrade, evicting running pods and potentially affecting your services.
66	Image Layers on a Node	Check the number of image layers on your node. If there are more than 5000 layers, Docker or containerd takes a long time to start, which affects its standard output (stdout).
67	Cluster Rolling Upgrade	Check whether your cluster is eligible for a rolling upgrade. A failed check indicates that rolling upgrade is not supported for the cluster.
68	Rotation Certificates	Check whether the number of certificates on your node exceeds 1000. During an upgrade, certificate files are processed in batches, so an excessive number of certificate files slows down the node upgrade and results in pods being evicted from the node.
69	Ingress and ELB Configuration Consistency	Check whether any modifications have been made on the ELB console to the listener, forwarding policy, forwarding rule, backend server group, backend cloud server, or certificate configurations that were automatically generated for the ingress.
70	Network Policies of Cluster Network Components	Check the network policy settings on the master nodes in your cluster. If any manual modifications have been made, they will be reset during the upgrade.
71	Cluster and Node Pool Configurations	Check whether the nic-max-above-warm-target value configured for the network component of the current cluster exceeds the maximum value allowed.
72	Time Zone of Master Nodes	Check whether the time zone of the master nodes matches the cluster's time zone. If they are different, the master nodes will be updated to match the cluster's time zone during a rolling upgrade.
73	SNATIPRanges	Check whether the SNATIPRanges value will change after the upgrade. This check applies only to CCE Turbo clusters.
74	Add-on Configuration Consistency	Check whether add-on configuration parameters (typically in ConfigMaps) have been modified manually instead of through the CCE console or add-on API updates. Such modifications may be overwritten after an upgrade, potentially affecting service operation.