High-Risk Operations
During service deployment or operation, you may trigger high-risk operations at different levels, which can cause service faults or interruptions. To help you estimate and avoid these risks, this section describes the impact of high-risk operations and their solutions for clusters, nodes, networking, containers, load balancing, logs, EVS disks, and add-ons.
Clusters and Nodes
| Category | Operation | Impact | Solution |
| --- | --- | --- | --- |
| Master node | Modifying the security group of a node in a cluster. NOTE: Naming rule of a security group: Cluster name-cce-control-Random digits | The master node may be unavailable. | Restore the security group by referring to "Creating a Cluster" and allow traffic from the security group to pass through. For details, see Configuring Cluster Security Group Rules. |
| Master node | Letting the node expire or destroying the node | The master node will be unavailable. | This operation cannot be undone. |
| Master node | Reinstalling the OS | Components on the master node will be deleted. | This operation cannot be undone. |
| Master node | Upgrading components on the master or etcd node | The cluster may be unavailable. | Roll back to the original version. |
| Master node | Deleting or formatting core directory data such as /etc/kubernetes on the node | The master node will be unavailable. | This operation cannot be undone. |
| Master node | Changing the node IP address | The master node will be unavailable. | Change the IP address back to the original one. |
| Master node | Modifying parameters of core components (such as etcd, kube-apiserver, and docker) | The master node may be unavailable. | Restore the parameter settings to the recommended values. For details, see Modifying Cluster Configurations. |
| Master node | Replacing the master or etcd certificate | The cluster may be unavailable. | This operation cannot be undone. |
| Worker node | Modifying the security group of a node in a cluster. NOTE: Naming rule of a security group: Cluster name-cce-node-Random digits | The node may be unavailable. | Restore the security group and allow traffic from the security group to pass through. For details, see Configuring Cluster Security Group Rules. |
| Worker node | Modifying the DNS configuration (/etc/resolv.conf) of a node | Internal domain names cannot be accessed, which may cause add-on errors or failures during in-place node upgrades. NOTE: If your service needs to use an on-premises DNS, configure the DNS in the workload. Do not change the node's DNS address. For details, see DNS Configuration. | Restore the DNS configuration based on the DNS configuration of a new node. |
| Worker node | Deleting the node | The node will become unavailable. | This operation cannot be undone. |
| Worker node | Reinstalling the OS | Node components are deleted, and the node becomes unavailable. | Reset the node. For details, see Resetting a Node. |
| Worker node | Upgrading the kernel or components on which the container platform depends (such as Open vSwitch, IPVLAN, Docker, and containerd) | The node may be unavailable or the network may be abnormal. NOTE: Node running depends on the system kernel version. Do not use the yum update command to update or reinstall the operating system kernel of a node unless necessary. (Reinstalling the operating system kernel using the original image or other images is a risky operation.) | If the OS is EulerOS 2.2, restore the node or network connectivity by referring to What Can I Do If the Container Network Becomes Unavailable After yum update Is Used to Upgrade the OS? If the OS is not EulerOS 2.2, you can reset the node. For details, see Resetting a Node. |
| Worker node | Changing the node IP address | The node will become unavailable. | Change the IP address back to the original one. |
| Worker node | Modifying parameters of core components (such as kubelet and kube-proxy) | The node may become unavailable, and components may be insecure if security-related configurations are modified. | Restore the parameter settings to the recommended values. For details, see Modifying Node Pool Configurations. |
| Worker node | Modifying OS configuration | The node may be unavailable. | Restore the configuration items or reset the node. For details, see Resetting a Node. |
| Worker node | Deleting or modifying the /opt/cloud/cce or /var/paas directory, or deleting the data disk | The node will become unavailable. | Reset the node. For details, see Resetting a Node. |
| Worker node | Modifying the node directory permission and the container directory permission | The permissions will be abnormal. | Do not modify the permissions. Restore the permissions if they have been modified. |
| Worker node | Formatting or partitioning system disks, Docker disks, and kubelet disks on nodes | The node may be unavailable. | Reset the node. For details, see Resetting a Node. |
| Worker node | Installing other software on nodes | This may cause exceptions on Kubernetes components installed on the node and make the node unavailable. | Uninstall the software that has been installed and restore or reset the node. For details, see Resetting a Node. |
| Worker node | Modifying NetworkManager configurations | The node will become unavailable. | Reset the node. For details, see Resetting a Node. |
| Worker node | Deleting system images such as cce-pause from the node | Containers cannot be created and system images cannot be pulled. | Copy the image from a functional node for restoration. |
| Worker node | Changing the flavor of a node in a node pool on the ECS console | If a node's flavor differs from the flavor specified for the node pool where the node resides, the number of nodes added during a node pool scale-out differs from the expected number. | Change the node flavor to the one specified in the node pool, or delete the node and perform a node pool scale-out again. |
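
The security group naming rules in the table above can help you spot CCE-managed security groups before you edit them. The following is a minimal sketch, not an official tool: the function name `classify_security_group` is hypothetical, and the pattern assumes that the random suffix consists of digits and lowercase letters, which the naming rule does not spell out.

```python
import re

# Pattern derived from the naming rules in the table:
#   <cluster name>-cce-control-<random suffix>  (master nodes)
#   <cluster name>-cce-node-<random suffix>     (worker nodes)
# Assumption: the suffix contains only digits and lowercase letters.
_CCE_SG = re.compile(
    r"^(?P<cluster>.+)-cce-(?P<role>control|node)-(?P<suffix>[0-9a-z]+)$"
)


def classify_security_group(name):
    """Return (cluster name, role) for a CCE-style name, else None."""
    match = _CCE_SG.match(name)
    if match is None:
        return None
    return match.group("cluster"), match.group("role")
```

A script that changes security group rules could call this function first and skip, or ask for confirmation on, any group for which it returns a result.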
Network
| Operation | Impact | Solution |
| --- | --- | --- |
| Changing the value of the kernel parameter net.ipv4.ip_forward to 0 | The network becomes inaccessible. | Change the value to 1. |
| Changing the value of the kernel parameter net.ipv4.tcp_tw_recycle to 1 | The NAT service becomes abnormal. | Change the value to 0. |
| Changing the value of the kernel parameter net.ipv4.tcp_tw_reuse to 1 | The network becomes abnormal. | Change the value to 0. |
| Not configuring the node security group to allow UDP packets to pass through port 53 of the container CIDR block | The DNS in the cluster cannot work properly. | Restore the security group by referring to Buying a CCE Standard/Turbo Cluster and allow traffic from the security group to pass through. |
| Deleting CRD resources of network-attachment-definitions of default-network | The container network is disconnected, or the cluster fails to be deleted. | If the resources are deleted by mistake, use the correct configurations to create the default-network resources. |
| Enabling the iptables firewall | By default, the iptables firewall is disabled on CCE. Enabling the firewall can leave the network inaccessible. NOTE: Do not enable the iptables firewall. If the iptables firewall must be enabled, first check in a test environment whether the rules configured in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables affect the network. | Disable the iptables firewall and check the rules configured in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables. |
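
The three kernel parameters in the table above can be checked on a node before a problem occurs. The following is a minimal sketch under stated assumptions: the helper names `read_sysctl` and `find_risky_settings` are hypothetical, and the values are read from the standard Linux /proc/sys tree. A parameter that does not exist on the node is skipped (for example, net.ipv4.tcp_tw_recycle was removed in Linux 4.12).

```python
from pathlib import Path

# Risky and recommended values taken from the table:
# parameter -> (risky value, value to restore)
RISKY_VALUES = {
    "net.ipv4.ip_forward": ("0", "1"),
    "net.ipv4.tcp_tw_recycle": ("1", "0"),
    "net.ipv4.tcp_tw_reuse": ("1", "0"),
}


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a kernel parameter, or None if absent."""
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


def find_risky_settings(read=read_sysctl):
    """Return {parameter: (current, recommended)} for each risky setting."""
    findings = {}
    for name, (risky, recommended) in RISKY_VALUES.items():
        current = read(name)
        # Parameters missing on this kernel return None and are skipped.
        if current == risky:
            findings[name] = (current, recommended)
    return findings
```

Calling `find_risky_settings()` on a node returns an empty dictionary when none of the three parameters is set to its risky value. The `read` argument exists only so that the check can be exercised without touching a real node.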
Containers
| Operation | Impact | Solution |
| --- | --- | --- |
| Configuring privileged containers for a workload and directly operating the host hardware, which is prone to misoperations on the system files of the node. For example, if you set the startup command to /usr/sbin/init and run systemctl in containers, the system files located in the /lib directory of the node may be damaged. | All mount points of the node will be unmounted. As a result, the node will malfunction, pods will fail, and storage add-on functions will be affected. | Do not remove the mount points in the /lib directory of a node. Reset the node for recovery. For details, see Resetting a Node. |
Load Balancing
| Operation | Impact | Solution |
| --- | --- | --- |
| Deleting a load balancer that has been bound to a CCE cluster on the ELB console | Accessing the target Service or ingress will fail. | Do not delete such a load balancer. |
| Disabling a load balancer that has been bound to a CCE cluster on the ELB console | Accessing the target Service or ingress will fail. | Do not disable such a load balancer. If a load balancer has been disabled, enable it. |
| Changing the private IPv4 address of a load balancer on the ELB console |  | Do not change private IPv4 addresses of load balancers. Change them back if they have been changed. |
| Unbinding the IPv4 EIP from a load balancer on the ELB console | After the EIP is unbound from the load balancer, the load balancer will not be able to forward Internet traffic. | Restore the EIP binding. |
| Creating a custom listener on the ELB console for the load balancer managed by CCE | If a load balancer is automatically created when a Service or an ingress is created, the custom listener of the load balancer cannot be deleted when the Service or ingress is deleted. In this case, the load balancer cannot be automatically deleted. | Use the listener automatically created when a Service or an ingress is created. If a custom listener is used, manually delete the target load balancer. |
| Deleting a listener automatically created by CCE on the ELB console |  | Re-create or update the Service or ingress. |
| Modifying the basic configurations such as the name, access control, timeout, or description of a listener created by CCE on the ELB console | If the listener is deleted, all your modifications will be reset by CCE after the master nodes are restarted (for example, due to a cluster upgrade). | Do not modify the basic configurations of the listener created by CCE. Restore the configurations if they have been modified. |
| Modifying the backend server group of a listener created by CCE on the ELB console, including adding or deleting backend servers to or from the server group |  | Re-create or update the Service or ingress. |
| Replacing the backend server group of a listener created by CCE on the ELB console |  | Re-create or update the Service or ingress. |
| Modifying the forwarding policy of a listener created by CCE on the ELB console, including adding or deleting forwarding rules |  | Do not modify the forwarding policy of such a listener. Restore the configurations if they have been modified. |
| Changing the ELB certificate on the ELB console for a load balancer managed by CCE | After master nodes are restarted, for example, due to a cluster upgrade, all servers in the backend server group will be reset by CCE. | Use the YAML file of the ingress to automatically manage certificates. |
Logs
| Operation | Impact | Solution |
| --- | --- | --- |
| Deleting the /tmp/ccs-log-collector/pos directory on the host machine | Logs are collected repeatedly. | None |
| Deleting the /tmp/ccs-log-collector/buffer directory on the host machine | Logs are lost. | None |
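
Because neither deletion in the table above has a solution, a cleanup script that removes files under /tmp should avoid these two directories. The following is a minimal sketch of such a guard, assuming that you run your own cleanup scripts on the node. The function name `touches_log_collector_dirs` is hypothetical, and paths are compared as given, without resolving symbolic links or ".." segments.

```python
from pathlib import PurePosixPath

# Directories from the table whose deletion causes repeated or lost logs.
PROTECTED_DIRS = (
    PurePosixPath("/tmp/ccs-log-collector/pos"),
    PurePosixPath("/tmp/ccs-log-collector/buffer"),
)


def touches_log_collector_dirs(path):
    """Return True if deleting `path` would remove protected log data.

    This is the case when `path` is a protected directory, lies inside
    one, or is a parent directory that contains one.
    """
    target = PurePosixPath(path)
    for protected in PROTECTED_DIRS:
        if (
            target == protected
            or protected in target.parents
            or target in protected.parents
        ):
            return True
    return False
```

For example, the check flags /tmp itself, because deleting /tmp recursively would also delete both directories, but it does not flag an unrelated directory such as /tmp/other.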
EVS Disks
| Operation | Impact | Solution | Remarks |
| --- | --- | --- | --- |
| Manually unmounting an EVS disk on the console | An I/O error occurs when data is written into a pod. | Delete the mount path from the node and schedule the pod again. | The file in the pod records the location where files are to be collected. |
| Unmounting the disk mount path on the node | Pod data is written into a local disk. | Remount the corresponding path to the pod. | The buffer contains log cache files to be consumed. |
| Operating EVS disks on the node | Pod data is written into a local disk. | None | None |
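
If a mount path is unmounted on the node, pod data is silently written to the local disk, so it can help to verify that the path is still a mount point. The following is a minimal sketch that parses text in the /proc/mounts format. The function names are hypothetical, the mount path in the usage note is only a placeholder, and paths containing spaces are not handled (they appear escaped as \040 in /proc/mounts).

```python
def mounted_paths(mounts_text):
    """Return the set of mount points listed in /proc/mounts-style text."""
    paths = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line reads: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 2:
            paths.add(fields[1])
    return paths


def is_mounted(path, mounts_text):
    """Return True if `path` is listed as a mount point."""
    return path.rstrip("/") in mounted_paths(mounts_text) or path == "/"
```

On a node, you could pass the content of /proc/mounts, for example `is_mounted("/mnt/example-volume", open("/proc/mounts").read())`, where /mnt/example-volume stands for the actual mount path of the EVS disk.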
Add-ons
| Operation | Impact | Solution |
| --- | --- | --- |
| Modifying add-on resources on the backend | The add-on malfunctions or other unexpected issues occur. | Perform operations on the add-on configuration page or using open add-on management APIs. |