Checklist for Deploying Containerized Applications in the Cloud
Overview
Security, efficiency, stability, and availability are common requirements for all cloud services. To meet them, system availability, data reliability, and O&M stability must all be ensured. This checklist describes the items to check when deploying containerized applications on the cloud. It helps you migrate services to CCE efficiently and reduces the risk of cluster or application exceptions caused by improper use.
Check Items
| Category | Check Item | Type | Impact | FAQ & Example |
|---|---|---|---|---|
| Cluster | Before creating a cluster, plan the node network and container network based on service requirements so that services can be expanded later. | Network planning | If the subnet or container CIDR block of the cluster is too small, the cluster may support fewer nodes than required. | |
| | Before creating a cluster, plan the CIDR blocks for Direct Connect connections, peering connections, the container network, the Service network, and subnets to avoid IP address conflicts. | Network planning | If CIDR blocks overlap and IP address conflicts occur, service access will be affected. | |
| | When a cluster is created, a default security group is automatically created and bound to it. You can configure custom security group rules based on service requirements. | Deployment | Security groups are key to security isolation. Improper security policies may cause security risks and service connectivity problems. | |
| | Enable the multi-master node mode and set the number of master nodes to 3 when creating a cluster. | Reliability | With multi-master mode enabled, three master nodes are created. If one master node becomes faulty, the cluster remains available and service functions are not affected. Multi-master mode is recommended for commercial scenarios. | How Do I Check Whether a Cluster Is in Multi-Master Mode? Note that once a cluster is created, the number of master nodes cannot be changed, so set it carefully. |
| | When creating a cluster, select a network model that suits your services. | Deployment | After a cluster is created, the network model cannot be changed, so select it carefully. | |
| Category | Check Item | Type | Impact | FAQ & Example |
|---|---|---|---|---|
| Workload | When creating a workload, set CPU and memory requests and limits to improve service robustness. | Deployment | When multiple applications share a node, an application without resource limits can leak or monopolize resources, leaving none for other applications, and its monitoring data will be inaccurate. | None |
| | When creating a workload, configure container health check probes: a liveness probe and a readiness probe. | Reliability | Without health checks, a pod cannot detect a service exception or restart the service automatically. The pod status may appear normal while the service inside it is abnormal. | |
| | When creating a workload, select a proper access mode (Service). The following Service types are supported: ClusterIP, NodePort, DNAT, and LoadBalancer. | Deployment | Improper Service configuration may confuse internal and external access paths and waste resources. | |
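The first three workload check items can be sketched in a minimal Deployment and Service manifest. All names, the image, the probe paths (`/healthz`, `/ready`), and the resource values below are illustrative assumptions, not CCE defaults:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app              # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25    # illustrative image
        resources:
          requests:          # minimum guaranteed resources for scheduling
            cpu: 250m
            memory: 256Mi
          limits:            # hard caps; keep one app from starving the node
            cpu: 500m
            memory: 512Mi
        livenessProbe:       # restarts the container if the service hangs
          httpGet:
            path: /healthz   # assumed health endpoint
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:      # keeps the pod out of Service endpoints until ready
          httpGet:
            path: /ready     # assumed readiness endpoint
            port: 80
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: web-app-svc
spec:
  type: ClusterIP            # cluster-internal access; use LoadBalancer for external exposure
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 80
```

Setting both requests and limits, and both probe types, addresses the resource-leak and "pod normal but service abnormal" impacts described above.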
| Category | Check Item | Type | Impact | FAQ & Example |
|---|---|---|---|---|
| Workload | When creating a workload, do not run only a single pod replica, and configure a node scheduling policy that suits your services. | Reliability | If a workload runs only one pod replica, the service becomes unavailable whenever that node or pod is faulty. After you configure a scheduling rule, ensure the target nodes have idle resources for container scheduling; otherwise, pods cannot be scheduled. | None |
| | Configure affinity and anti-affinity properly. | Reliability | If both affinity and anti-affinity are configured for an application that provides Services externally, the Services may become inaccessible after the application is upgraded or restarted. | Scheduling Policy (Affinity/Anti-affinity). Negative example: For application A, nodes 1 and 2 are set as affinity nodes, and nodes 3 and 4 as anti-affinity nodes. Application A exposes a Service through the ELB, which listens on nodes 1 and 2. When application A is upgraded, it may be scheduled to a node other than nodes 1, 2, 3, and 4, making it inaccessible through the Service. Cause: Scheduling of application A does not need to satisfy both policies; a node is selected according to either policy. In this example, the anti-affinity policy was applied. |
| | When creating a workload, configure a pre-stop hook (Lifecycle > Pre-Stop) so that in-flight requests can complete before the pod is stopped during an application upgrade or pod deletion. | Reliability | If no pre-stop hook is configured, the pod is killed immediately during an application upgrade and services are interrupted. | |
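The replica, scheduling, and pre-stop items can be sketched together in one Deployment. The workload name, image, and sleep duration are assumptions; note that this sketch uses a *preferred* pod anti-affinity rule, which spreads replicas across nodes without hard-blocking scheduling and so avoids the conflicting-policy pitfall in the negative example above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app              # hypothetical workload name
spec:
  replicas: 2                # at least two replicas; a single pod is a single point of failure
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:     # prefer spreading replicas across different nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 30
      containers:
      - name: web
        image: nginx:1.25    # illustrative image
        lifecycle:
          preStop:           # give in-flight requests time to drain before the pod stops
            exec:
              command: ["sh", "-c", "sleep 15"]
```

Because the anti-affinity rule is preferred rather than required, an upgrade can still proceed when no node satisfies it, while replicas are normally kept on separate nodes.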
| Category | Check Item | Type | Impact | FAQ & Example |
|---|---|---|---|---|
| Container data persistency | Select a proper data volume type based on service requirements. | Reliability | When a node is faulty and cannot be recovered, data on the local disk cannot be recovered either. You are advised to use cloud storage volumes to ensure data reliability. | |
| Backup | Back up application data. | Reliability | Data cannot be restored after it is lost. | |
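A minimal sketch of requesting a cloud storage volume instead of node-local storage: a PersistentVolumeClaim backed by a cloud-disk StorageClass. The claim name, size, and the `csi-disk` class name are assumptions; check the StorageClasses actually available in your cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data             # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: csi-disk # assumed cloud-disk (EVS) StorageClass; verify in your cluster
  resources:
    requests:
      storage: 10Gi
```

Unlike `emptyDir` or `hostPath` volumes, data on a cloud storage volume survives the loss of the node, which is the reliability concern raised above.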
| Category | Check Item | Type | Impact | FAQ & Example |
|---|---|---|---|---|
| Project | Ensure the quotas of ECS, VPC, subnet, EIP, and EVS resources meet service requirements. | Deployment | If a quota is insufficient, resources will fail to be created. In particular, users who have configured auto scaling must have sufficient resource quotas. | |
| | You are advised not to modify kernel parameters, system configurations, cluster core component versions, security groups, or ELB-related parameters on cluster nodes, or to install software that has not been verified. | Deployment | Exceptions may occur on CCE clusters or the Kubernetes components on a node, making the node unavailable for application deployment. | For details, see High-Risk Operations and Solutions. |
| | Do not modify resources created by CCE, such as security groups and EVS disks. Resources created by CCE are labeled cce. | Deployment | Such modifications will cause CCE cluster functions to become abnormal. | |
| Proactive O&M | CCE provides multi-dimensional monitoring and alarm reporting, allowing you to locate and rectify faults as soon as possible. | Monitoring | Without alarms, you cannot establish a performance baseline for the container cluster. When an exception occurs, no alarm is reported and you must locate the fault manually. | |