High-Risk Operations and Solutions
During service deployment or operation, you may perform high-risk operations at different levels, causing service faults or interruptions. To help you estimate and avoid operation risks, this section describes the consequences of high-risk operations and their solutions across multiple dimensions, including clusters, nodes, networking, load balancing, logs, and EVS disks.
Clusters and Nodes
| Category | Operation | Impact | Solution |
|---|---|---|---|
| Master nodes | Modifying the security group of a node in a cluster | The master node may be unavailable. NOTE: Naming rule of a master node: Cluster name-cce-control-Random number | Restore the security group by referring to Buying a CCE Cluster and allow traffic from the security group to pass through. |
|  | Letting the node expire or destroying the node | The master node will be unavailable. | This operation cannot be undone. |
|  | Reinstalling the OS | Components on the master node will be deleted. | This operation cannot be undone. |
|  | Upgrading components on the master or etcd node | The cluster may be unavailable. | Roll back to the original version. |
|  | Deleting or formatting core directory data such as /etc/kubernetes on the node | The master node will become unavailable. | This operation cannot be undone. |
|  | Changing the node IP address | The master node will become unavailable. | Change the IP address back to the original one. |
|  | Modifying parameters of core components (such as etcd, kube-apiserver, and docker) | The master node may be unavailable. | Restore the parameter settings to the recommended values. For details, see Configuring Kubernetes Parameters. |
|  | Replacing the master or etcd certificate | The cluster may become unavailable. | This operation cannot be undone. |
| Worker nodes | Modifying the security group of a node in a cluster | The node may be unavailable. NOTE: Naming rule of a worker node: Cluster name-cce-node-Random number | Restore the security group by referring to Buying a CCE Cluster and allow traffic from the security group to pass through. |
|  | Deleting the node | The node will become unavailable. | This operation cannot be undone. |
|  | Reinstalling the OS | Node components are deleted, and the node becomes unavailable. | Reset the node. For details, see Resetting a Node. |
|  | Upgrading the node kernel | The node may be unavailable or the network may be abnormal. NOTE: Node operation depends on the system kernel version. Do not use yum update to update or reinstall the operating system kernel of a node unless necessary. (Reinstalling the operating system kernel using the original image or other images is a risky operation.) | If the OS is EulerOS 2.2, restore the node or network connectivity by referring to What Can I Do If the Container Network Becomes Unavailable After yum update Is Used to Upgrade the OS? If the OS is not EulerOS 2.2, reset the node. For details, see Resetting a Node. |
|  | Changing the node IP address | The node will become unavailable. | Change the IP address back to the original one. |
|  | Modifying parameters of core components (such as kubelet and kube-proxy) | The node may become unavailable, and components may be insecure if security-related configurations are modified. | Restore the parameter settings to the recommended values. For details, see Configuring Kubernetes Parameters. |
|  | Modifying OS configuration | The node may be unavailable. | Restore the configuration items or reset the node. For details, see Resetting a Node. |
|  | Deleting the /opt directory, /var/paas directory, or a data disk | The node will become unready. | Reset the node. For details, see Resetting a Node. |
|  | Modifying the node directory permissions or the container directory permissions | The permissions will be abnormal. | You are not advised to modify the permissions. Restore them if they have been modified. |
|  | Formatting or partitioning disks on cluster nodes | The node will become unready. | Reset the node. For details, see Resetting a Node. |
|  | Installing other software on nodes | This may cause exceptions on Kubernetes components installed on the node, making the node unavailable. | Uninstall the software and restore or reset the node. For details, see Resetting a Node. |
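Most of the faults above first surface as a node leaving the Ready state. A quick way to spot affected nodes after a risky operation is a sketch like the following (it assumes kubectl is configured for the target cluster; the `unready_nodes` helper name is illustrative, not a CCE tool):

```shell
#!/bin/sh
# List nodes whose STATUS column is not "Ready" (helper name is illustrative).
# Input format matches "kubectl get nodes --no-headers":
#   NAME   STATUS   ROLES   AGE   VERSION
unready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Feed it live data; the error is suppressed if kubectl is absent on this host.
kubectl get nodes --no-headers 2>/dev/null | unready_nodes
```

Any node printed by this check should be inspected and, if needed, reset as described in the table above.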
Networking and Load Balancing
| Operation | Impact | How to Avoid/Fix |
|---|---|---|
| Changing the value of the kernel parameter net.ipv4.ip_forward to 0 | The network becomes inaccessible. | Change the value to 1. |
| Changing the value of the kernel parameter net.ipv4.tcp_tw_recycle to 1 | The NAT service becomes abnormal. | Change the value to 0. |
| Not configuring the node security group to allow UDP packets to pass through port 53 of the container CIDR block | The DNS in the cluster cannot work properly. | Restore the security group by referring to Buying a CCE Cluster and allow traffic from the security group to pass through. |
| Creating a custom listener on the ELB console for the load balancer managed by CCE | The modified items are reset by CCE or the ingress is faulty. | Use the YAML file of the Service to automatically create a listener. |
| Binding a user-defined backend on the ELB console to the load balancer managed by CCE | The modified items are reset by CCE or the ingress is faulty. | Do not manually bind any backend. |
| Changing the ELB certificate on the ELB console for the load balancer managed by CCE | The modified items are reset by CCE or the ingress is faulty. | Use the YAML file of the ingress to automatically manage certificates. |
| Changing the listener name on the ELB console for the ELB listener managed by CCE | The modified items are reset by CCE or the ingress is faulty. | Do not change the name of an ELB listener managed by CCE. |
| Changing the description of load balancers, listeners, and forwarding policies managed by CCE on the ELB console | The modified items are reset by CCE or the ingress is faulty. | Do not modify the description of load balancers, listeners, or forwarding policies managed by CCE. |
| Deleting the network-attachment-definitions CRD resource of default-network | The container network is disconnected, or the cluster fails to be deleted. | If the resource is deleted by mistake, use the correct configurations to create the default-network resource again. |
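The first two rows of this table can be checked and remediated from the node itself. A minimal sketch (the `check_param` helper is illustrative; run the printed sysctl -w commands as root on the affected node):

```shell
#!/bin/sh
# Print the sysctl command needed to restore a kernel parameter, if its
# current value deviates from the safe value given in the table above.
check_param() {  # check_param <name> <safe_value> <current_value>
  if [ "$3" != "$2" ]; then
    echo "run as root: sysctl -w $1=$2"
  fi
}

# Read the live values; a key may be absent (tcp_tw_recycle was removed in
# Linux 4.12), in which case the lookup is silently empty.
check_param net.ipv4.ip_forward 1 "$(sysctl -n net.ipv4.ip_forward 2>/dev/null)"
check_param net.ipv4.tcp_tw_recycle 0 "$(sysctl -n net.ipv4.tcp_tw_recycle 2>/dev/null)"
```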
Logs
| Operation | Impact | Solution | Remarks |
|---|---|---|---|
| Deleting the /tmp/ccs-log-collector/pos directory on the host machine | Logs are collected repeatedly. | None | The pos file records the location where files are to be collected. |
| Deleting the /tmp/ccs-log-collector/buffer directory on the host machine | Logs are lost. | None | The buffer contains log cache files to be consumed. |
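The pos directory matters because the collector resumes from a saved byte offset; deleting it resets the offset, so everything is read again. A toy illustration of that mechanism (the file layout and the `tail_from_pos` helper are hypothetical, not the collector's real implementation):

```shell
#!/bin/sh
# Emit only the bytes of <logfile> that appear after the offset saved in
# <posfile>, then persist the new offset (mimics an offset-tracking collector).
tail_from_pos() {  # tail_from_pos <logfile> <posfile>
  pos=0
  [ -f "$2" ] && pos="$(cat "$2")"
  tail -c "+$((pos + 1))" "$1"       # tail -c +N starts at byte N (1-based)
  wc -c < "$1" | tr -d ' \n' > "$2"  # save how far we have read
}
```

Deleting the pos file between two runs makes the next run emit the whole log again, which is exactly the "logs are collected repeatedly" impact above.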
EVS Disks
| Operation | Impact | Solution | Remarks |
|---|---|---|---|
| Manually unmounting an EVS disk on the console | An I/O error occurs when pod data is written to the disk. | Delete the mount path from the node and schedule the pod again. | None |
| Unmounting the disk mount path on the node | Pod data is written into a local disk. | Remount the corresponding path to the pod. | None |
| Operating EVS disks on the node | Pod data is written into a local disk. | None | None |
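Before writing data, you can confirm that a pod's volume path is still a real mount point rather than the node's local disk. A sketch (the example path and the `is_mounted` helper are hypothetical):

```shell
#!/bin/sh
# Succeed if <dir> appears as a mount point in /proc/mounts (helper is illustrative).
is_mounted() {  # is_mounted <dir>
  awk -v d="$1" '$2 == d { found = 1 } END { exit !found }' /proc/mounts
}

# Hypothetical EVS volume path on a node:
if is_mounted /mnt/paas/kubernetes/kubelet/pods/example/volumes/evs-vol; then
  echo "EVS volume is mounted"
else
  echo "WARNING: writes here would land on the node's local disk"
fi
```

If the check fails after a manual unmount, remount the path to the pod as described in the table above.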