
High-Risk Operations and Solutions

Updated on 2024-01-26 GMT+08:00

During service deployment or operation, you may perform high-risk operations at different levels that cause service faults or interruptions. To help you assess and avoid these risks, this section describes the impact of high-risk operations and their solutions across several dimensions: clusters, nodes, networking, load balancing, logs, and EVS disks.

Clusters and Nodes

Table 1 High-risk operations and solutions

| Category | Operation | Impact | Solution |
| --- | --- | --- | --- |
| Master node | Modifying the security group of a node in a cluster | The master node may be unavailable. NOTE: Naming rule of a master node: Cluster name-cce-control-Random number | Restore the security group by referring to "Creating a Cluster" and allow traffic from the security group to pass through. |
| Master node | Letting the node expire or destroying the node | The master node will be unavailable. | This operation cannot be undone. |
| Master node | Reinstalling the OS | Components on the master node will be deleted. | This operation cannot be undone. |
| Master node | Upgrading components on the master or etcd node | The cluster may be unavailable. | Roll back to the original version. |
| Master node | Deleting or formatting core directory data such as /etc/kubernetes on the node | The master node will be unavailable. | This operation cannot be undone. |
| Master node | Changing the node IP address | The master node will be unavailable. | Change the IP address back to the original one. |
| Master node | Modifying parameters of core components (such as etcd, kube-apiserver, and docker) | The master node may be unavailable. | Restore the parameter settings to the recommended values. For details, see Cluster Configuration Management. |
| Master node | Replacing the master or etcd certificate | The cluster may be unavailable. | This operation cannot be undone. |
| Worker node | Modifying the security group of a node in a cluster | The node may be unavailable. NOTE: Naming rule of a worker node: Cluster name-cce-node-Random number | Restore the security group and allow traffic from the security group to pass through. |
| Worker node | Deleting the node | The node will become unavailable. | This operation cannot be undone. |
| Worker node | Reinstalling the OS | Node components are deleted, and the node becomes unavailable. | Reset the node. For details, see Resetting a Node. |
| Worker node | Upgrading the node kernel | The node may be unavailable or the network may be abnormal. NOTE: Node running depends on the system kernel version. Do not use the yum update command to update or reinstall the operating system kernel of a node unless necessary. (Reinstalling the operating system kernel using the original image or other images is a risky operation.) | Reset the node. For details, see Resetting a Node. |
| Worker node | Changing the node IP address | The node will become unavailable. | Change the IP address back to the original one. |
| Worker node | Modifying parameters of core components (such as kubelet and kube-proxy) | The node may become unavailable, and components may be insecure if security-related configurations are modified. | Restore the parameter settings to the recommended values. For details, see Configuring a Node Pool. |
| Worker node | Modifying OS configuration | The node may be unavailable. | Restore the configuration items or reset the node. For details, see Resetting a Node. |
| Worker node | Deleting or modifying the /opt/cloud/cce and /var/paas directories, and deleting the data disk | The node will become unready. | Reset the node. For details, see Resetting a Node. |
| Worker node | Modifying the node directory permissions or the container directory permissions | The permissions will be abnormal. | You are not advised to modify the permissions. Restore the permissions if they are modified. |
| Worker node | Formatting or partitioning system disks, Docker disks, and kubelet disks on nodes | The node may be unavailable. | Reset the node. For details, see Resetting a Node. |
| Worker node | Installing other software on nodes | This may cause exceptions on Kubernetes components installed on the node and make the node unavailable. | Uninstall the software that has been installed and restore or reset the node. For details, see Resetting a Node. |
| Worker node | Modifying NetworkManager configurations | The node will become unavailable. | Reset the node. For details, see Resetting a Node. |
| Worker node | Deleting system images such as cce-pause from the node | Containers cannot be created and system images cannot be pulled. | Copy the image from another normal node for restoration. |
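Before changing a security group, it can help to confirm whether a server is a master node or a worker node. The following Python sketch applies the two naming rules quoted in Table 1. It is illustrative only: the function name classify_node is hypothetical, and the exact shape of the random suffix is an assumption (any non-empty token without hyphens).

```python
import re

# Naming rules quoted in Table 1:
#   master: <cluster name>-cce-control-<random number>
#   worker: <cluster name>-cce-node-<random number>
# Assumption: the random suffix is a single token containing no hyphen.
_MASTER = re.compile(r"^.+-cce-control-[^-]+$")
_WORKER = re.compile(r"^.+-cce-node-[^-]+$")


def classify_node(name: str) -> str:
    """Return 'master', 'worker', or 'unknown' for a server name."""
    if _MASTER.match(name):
        return "master"
    if _WORKER.match(name):
        return "worker"
    return "unknown"
```

A name that matches neither rule is reported as unknown rather than guessed, so a script can stop and ask for confirmation before it touches the server.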

Networking

Table 2 High-risk operations and solutions

| Operation | Impact | Solution |
| --- | --- | --- |
| Changing the value of the kernel parameter net.ipv4.ip_forward to 0 | The network becomes inaccessible. | Change the value to 1. |
| Changing the value of the kernel parameter net.ipv4.tcp_tw_recycle to 1 | The NAT service becomes abnormal. | Change the value to 0. |
| Changing the value of the kernel parameter net.ipv4.tcp_tw_reuse to 1 | The network becomes abnormal. | Change the value to 0. |
| Not configuring the node security group to allow UDP packets to pass through port 53 of the container CIDR block | The DNS in the cluster cannot work properly. | Restore the security group by referring to Creating a Cluster and allow traffic from the security group to pass through. |
| Deleting the network-attachment-definitions CRD resources of default-network | The container network is disconnected, or the cluster fails to be deleted. | If the resources are deleted by mistake, use the correct configurations to create the default-network resources. |
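The three kernel parameters in Table 2 can be checked on a node before a problem shows up. The following Python sketch reads them from /proc/sys and reports only the values that Table 2 lists as risky, together with the value Table 2 recommends. The function names read_sysctl and find_risky are hypothetical. A parameter that cannot be read is skipped; for example, net.ipv4.tcp_tw_recycle was removed in Linux 4.12.

```python
from pathlib import Path

# name -> (risky value, recommended value), taken from Table 2.
RISKY = {
    "net.ipv4.ip_forward": ("0", "1"),
    "net.ipv4.tcp_tw_recycle": ("1", "0"),
    "net.ipv4.tcp_tw_reuse": ("1", "0"),
}


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


def find_risky(values):
    """Map each parameter holding its risky value to the recommended value."""
    fixes = {}
    for name, current in values.items():
        risky, recommended = RISKY[name]
        if current == risky:
            fixes[name] = recommended
    return fixes


def check_node():
    """Read all three parameters on this host and return the suggested fixes."""
    return find_risky({name: read_sysctl(name) for name in RISKY})
```

Calling check_node() on a node returns an empty dictionary when none of the three parameters holds a risky value. The sketch only reports; it does not change any setting.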

Load Balancing

Table 3 Service ELB

| Operation | Impact | Solution |
| --- | --- | --- |
| Changing the private IPv4 address of a load balancer on the ELB console | Network traffic forwarded using the private IPv4 address will be interrupted. The IP address in the status field of the Service/ingress YAML file is changed. | You are not advised to change the private IPv4 address. Restore the original address if it has been changed. |
| Unbinding the IPv4 EIP from a load balancer on the ELB console | After the EIP is unbound from the load balancer, the load balancer will not be able to forward Internet traffic. | Restore the EIP binding. |
| Creating a custom listener on the ELB console for the load balancer managed by CCE | If a load balancer is automatically created when a Service or an ingress is created, the custom listener of the load balancer cannot be deleted when the Service or ingress is deleted. In this case, the load balancer cannot be automatically deleted. | Use the listener automatically created through a Service or an ingress. If a custom listener is used, manually delete the target load balancer. |
| Deleting a listener automatically created by CCE on the ELB console | Service/Ingress access fails. After the master nodes are restarted (for example, due to a cluster upgrade), all your modifications will be reset by CCE. | Re-create or update the Service or ingress. |
| Modifying the basic configurations such as the name, access control, timeout, or description of a listener created by CCE on the ELB console | After the master nodes are restarted (for example, due to a cluster upgrade), all your modifications will be reset by CCE if the listener is deleted. | You are not advised to modify these configurations. Restore the original settings if they have been modified. |
| Modifying the backend server group of a listener created by CCE on the ELB console, including adding or deleting backend servers to or from the server group | Service/Ingress access fails. After the master nodes are restarted (for example, due to a cluster upgrade), all your modifications will be reset by CCE: the deleted backend server will be restored, and the added backend server will be removed. | Re-create or update the Service or ingress. |
| Replacing the backend server group of a listener created by CCE on the ELB console | Service/Ingress access fails. After the master nodes are restarted (for example, due to a cluster upgrade), all servers in the backend server group will be reset by CCE. | Re-create or update the Service or ingress. |
| Modifying the forwarding policy of a listener created by CCE on the ELB console, including adding or deleting a forwarding rule | Service/Ingress access fails. After the master nodes are restarted (for example, due to a cluster upgrade), all your modifications will be reset by CCE if the forwarding rule is added by the ingress. | You are not advised to modify the forwarding policy. Restore the original policy if it has been modified. |
| Changing the ELB certificate on the ELB console for the load balancer managed by CCE | After the master nodes are restarted (for example, due to a cluster upgrade), all servers in the backend server group will be reset by CCE. | Use the YAML file of the ingress to automatically manage certificates. |

Logs

Table 4 High-risk operations and solutions

| Operation | Impact | Solution |
| --- | --- | --- |
| Deleting the /tmp/ccs-log-collector/pos directory on the host machine | Logs are collected repeatedly. | None |
| Deleting the /tmp/ccs-log-collector/buffer directory on the host machine | Logs are lost. | None |
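Because these deletions have no solution, a cleanup script that runs on a node should refuse to remove the directories named in this section. The following Python sketch shows one way to guard a deletion. The list combines the directories called out in Table 1 and Table 4, and the function name touches_protected is hypothetical. The check is purely lexical: it does not resolve symbolic links or ".." components, so paths should be normalized first.

```python
from pathlib import PurePosixPath

# Directories named in Table 1 and Table 4 of this section.
PROTECTED = tuple(
    PurePosixPath(p)
    for p in (
        "/etc/kubernetes",
        "/opt/cloud/cce",
        "/var/paas",
        "/tmp/ccs-log-collector/pos",
        "/tmp/ccs-log-collector/buffer",
    )
)


def touches_protected(target: str) -> bool:
    """Return True if deleting `target` would remove a protected directory,
    one of its parents, or anything inside it."""
    t = PurePosixPath(target)
    for p in PROTECTED:
        if t == p or t in p.parents or p in t.parents:
            return True
    return False
```

For example, a job that clears /tmp would be rejected by this check, because /tmp is a parent of both log collector directories.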

EVS Disks

Table 5 High-risk operations and solutions

| Operation | Impact | Solution | Remarks |
| --- | --- | --- | --- |
| Manually unmounting an EVS disk on the console | An I/O error occurs when data is written into a pod. | Delete the mount path from the node and schedule the pod again. | The file in the pod records the location where files are to be collected. |
| Unmounting the disk mount path on the node | Pod data is written into a local disk. | Remount the corresponding path to the pod. | The buffer contains log cache files to be consumed. |
| Operating EVS disks on the node | Pod data is written into a local disk. | None | None |
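When a mount path is unmounted on the node, writes continue silently and land on the local disk, so the problem is easy to miss. The following Python sketch reports whether a given path is still a mount point, using os.path.ismount from the standard library. The function name volume_status is hypothetical, and the path to check is an input: this section does not state where CCE mounts EVS disks on a node.

```python
import os


def volume_status(mount_path: str) -> str:
    """Classify a path as 'missing', 'mounted', or 'not-mounted'.

    'not-mounted' means the directory exists but nothing is mounted on it,
    so data written there would go to the underlying local disk.
    """
    if not os.path.isdir(mount_path):
        return "missing"
    if os.path.ismount(mount_path):
        return "mounted"
    return "not-mounted"
```

A monitoring job could call volume_status with the expected mount path and raise an alert on any result other than mounted.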
