High-Risk Operations and Solutions
During service deployment or operation, you may perform high-risk operations at different levels that cause service faults or interruptions. To help you estimate and avoid these risks, this section describes the consequences of high-risk operations and their solutions across several dimensions: clusters, nodes, networking, load balancing, logs, and EVS disks.
Clusters and Nodes
| Category | Operation | Impact | Solution |
|---|---|---|---|
| Master nodes | Modifying the security group of a node in a cluster | The master node may become unavailable. (Master nodes follow the naming rule: Cluster name-cce-control-Random number.) | Restore the security group by referring to Creating a Cluster and allow traffic from the security group to pass through. |
| Master nodes | Letting the node expire or destroying the node | The master node will become unavailable. | This operation cannot be undone. |
| Master nodes | Reinstalling the OS | Components on the master node will be deleted. | This operation cannot be undone. |
| Master nodes | Upgrading components on the master or etcd node | The cluster may become unavailable. | Roll back to the original version. |
| Master nodes | Deleting or formatting core directory data such as /etc/kubernetes on the node | The master node will become unavailable. | This operation cannot be undone. |
| Master nodes | Changing the node IP address | The master node will become unavailable. | Change the IP address back to the original one. |
| Master nodes | Modifying parameters of core components (such as etcd, kube-apiserver, and docker) | The master node may become unavailable. | Restore the parameter settings to the recommended values. For details, see Configuring Kubernetes Parameters. |
| Master nodes | Replacing the master or etcd certificate | The cluster may become unavailable. | This operation cannot be undone. |
| Worker nodes | Modifying the security group of a node in a cluster | The node may become unavailable. (Worker nodes follow the naming rule: Cluster name-cce-node-Random number.) | Restore the security group by referring to Creating a Cluster and allow traffic from the security group to pass through. |
| Worker nodes | Deleting the node | The node will become unavailable. | This operation cannot be undone. |
| Worker nodes | Reinstalling the OS | Node components are deleted, and the node becomes unavailable. | Reset the node. For details, see Resetting a Node. |
| Worker nodes | Upgrading the node kernel | The node may become unavailable or the network may be abnormal. Node running depends on the system kernel version; do not use yum update to update or reinstall the kernel of a node unless necessary. Reinstalling the kernel, even from the original image, is a risky operation. | Reset the node. For details, see Resetting a Node. |
| Worker nodes | Changing the node IP address | The node will become unavailable. | Change the IP address back to the original one. |
| Worker nodes | Modifying parameters of core components (such as kubelet and kube-proxy) | The node may become unavailable, and components may become insecure if security-related configurations are modified. | Restore the parameter settings to the recommended values. For details, see Configuring Kubernetes Parameters. |
| Worker nodes | Modifying OS configuration | The node may become unavailable. | Restore the configuration items or reset the node. For details, see Resetting a Node. |
| Worker nodes | Deleting the /opt directory, the /var/paas directory, or a data disk | The node will become unready. | Reset the node. For details, see Resetting a Node. |
| Worker nodes | Modifying node directory or container directory permissions | The permissions will be abnormal. | You are advised not to modify the permissions. Restore them if they have been modified. |
| Worker nodes | Formatting or partitioning disks on cluster nodes | The node will become unready. | Reset the node. For details, see Resetting a Node. |
| Worker nodes | Installing other software on nodes | This may cause exceptions on the Kubernetes components installed on the node and make the node unavailable. | Uninstall the software and restore or reset the node. For details, see Resetting a Node. (A quick node health check is sketched after this table.) |
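Several solutions above call for resetting a node. Before resetting, it is worth confirming that the node and its core components are actually unhealthy. A minimal check, assuming kubectl access to the cluster and SSH access to the node (the node name is a placeholder):

```bash
# From any machine with cluster access: check node readiness and recent events.
kubectl get nodes -o wide
kubectl describe node <node-name>   # inspect the Conditions and Events sections

# On the node itself (over SSH): verify that core components are running.
systemctl status kubelet
systemctl status docker             # or containerd, depending on the container runtime
```

If kubelet is down or the node stays NotReady after the cause has been fixed, proceed with Resetting a Node.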
Networking and Load Balancing
| Operation | Impact | How to Avoid/Fix |
|---|---|---|
| Changing the kernel parameter net.ipv4.ip_forward to 0 | The network becomes inaccessible. | Change the value back to 1 (see the sysctl sketch after this table). |
| Changing the kernel parameter net.ipv4.tcp_tw_recycle to 1 | The NAT service becomes abnormal. | Change the value back to 0 (see the sysctl sketch after this table). |
| Not configuring the node security group to allow UDP packets to pass through port 53 of the container CIDR block | The DNS in the cluster cannot work properly. | Restore the security group by referring to Creating a Cluster and allow traffic from the security group to pass through. |
| Creating a custom listener on the ELB console for a load balancer managed by CCE | The modified items are reset by CCE, or the ingress becomes faulty. | Use the YAML file of the Service to automatically create a listener (see the Service example after this table). |
| Binding a user-defined backend on the ELB console to a load balancer managed by CCE | The modified items are reset by CCE, or the ingress becomes faulty. | Do not manually bind any backend. |
| Changing the ELB certificate on the ELB console for a load balancer managed by CCE | The modified items are reset by CCE, or the ingress becomes faulty. | Use the YAML file of the ingress to automatically manage certificates. |
| Changing the listener name on the ELB console for an ELB listener managed by CCE | The modified items are reset by CCE, or the ingress becomes faulty. | Do not change the name of an ELB listener managed by CCE. |
| Changing the description of load balancers, listeners, or forwarding policies managed by CCE on the ELB console | The modified items are reset by CCE, or the ingress becomes faulty. | Do not modify the description of load balancers, listeners, or forwarding policies managed by CCE. |
| Deleting the network-attachment-definitions CRD resource of default-network | The container network is disconnected, or the cluster fails to be deleted. | If the resource is deleted by mistake, use the correct configuration to re-create the default-network resource. |
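The two kernel parameters in the first rows can be inspected and restored with sysctl on the affected node. A minimal sketch, assuming root access; note that net.ipv4.tcp_tw_recycle exists only on kernels older than Linux 4.12, where it was removed:

```bash
# Inspect the current values.
sysctl net.ipv4.ip_forward
sysctl net.ipv4.tcp_tw_recycle      # absent on Linux 4.12 and later

# Restore the values expected by CCE.
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.tcp_tw_recycle=0

# To persist the values across reboots, set the same keys in /etc/sysctl.conf
# (or a file under /etc/sysctl.d/) and reload with: sysctl -p
```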
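For the ELB-related rows, the safe pattern is to declare the listener in the Service manifest and let CCE reconcile the load balancer, rather than editing it on the ELB console. The sketch below is illustrative: the annotation keys (kubernetes.io/elb.id, kubernetes.io/elb.class) follow common CCE examples, but the exact keys and values depend on your CCE version, and the load balancer ID is a placeholder:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx-elb
  annotations:
    kubernetes.io/elb.class: union        # shared load balancer; an assumption, check your CCE version
    kubernetes.io/elb.id: <your-elb-id>   # placeholder ID of an existing load balancer
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
EOF
```

CCE creates and maintains the corresponding listener from this spec; anything configured manually on the console may be reset the next time CCE reconciles the load balancer.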
Logs
| Operation | Impact | Solution | Remarks |
|---|---|---|---|
| Deleting the /tmp/ccs-log-collector/pos directory on the host machine | Logs are collected repeatedly. | None (see the sketch after this table). | The pos file records the location where files are to be collected. |
| Deleting the /tmp/ccs-log-collector/buffer directory on the host machine | Logs are lost. | None (see the sketch after this table). | The buffer directory contains log cache files to be consumed. |
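If you are cleaning up disk space on a node, inspect these directories before touching anything under /tmp/ccs-log-collector; the pos and buffer paths come from the table above:

```bash
# Show how much space the collector's state directories use, without modifying them.
du -sh /tmp/ccs-log-collector/pos /tmp/ccs-log-collector/buffer
ls -l /tmp/ccs-log-collector/buffer   # cached logs still waiting to be consumed
```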
EVS Disks
| Operation | Impact | Solution |
|---|---|---|
| Manually unmounting an EVS disk on the console | An I/O error is reported when pod data is written to the disk. | Delete the mount path from the node and schedule the pod again (see the sketch after this table). |
| Unmounting the disk mount path on the node | Pod data is written to a local disk instead of the EVS disk. | Remount the corresponding path to the pod. |
| Operating EVS disks on the node | Pod data is written to a local disk instead of the EVS disk. | None |
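If an EVS disk was unmounted by mistake, the fix in the first row amounts to removing the stale mount path on the node and letting the pod's controller re-create the pod so the volume is remounted. A minimal sketch, assuming kubectl access; the pod name, namespace, and mount path are placeholders:

```bash
# On the node: confirm whether the device is still mounted at the expected path.
lsblk
mount | grep <mount-path>

# From the cluster: delete the pod so its controller re-creates and reschedules it,
# which remounts the EVS volume.
kubectl delete pod <pod-name> -n <namespace>
```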