Updated on 2022-12-01 GMT+08:00

Checklist for Deploying Containerized Applications in the Cloud

Overview

Security, efficiency, stability, and availability are common requirements for all cloud services. Meeting them requires coordinating system availability, data reliability, and O&M stability. This checklist describes the check items for deploying containerized applications on the cloud, helping you migrate services to CCE efficiently and reducing cluster or application exceptions caused by improper use.

Check Items

Table 1 System availability

Category: Cluster

Check Item: Before creating a cluster, properly plan the node network and container network based on service requirements to allow subsequent service expansion.
Type: Network planning
Impact: If the subnet or container CIDR block where the cluster resides is small, the number of available nodes supported by the cluster may be less than required.

Check Item: Before creating a cluster, properly plan CIDR blocks for the related Direct Connect, peering connection, container network, service network, and subnet to avoid IP address conflicts.
Type: Network planning
Impact: If CIDR blocks are not properly set and IP address conflicts occur, service access will be affected.

Check Item: When a cluster is created, a default security group is automatically created and bound to the cluster. You can set custom security group rules based on service requirements.
Type: Deployment
Impact: Security groups are key to security isolation. Improper security policy configuration may cause security risks and service connectivity problems.

Check Item: Enable the multi-master node mode and set the number of master nodes to 3 when creating a cluster. Once a cluster is created, the number of master nodes cannot be changed, so exercise caution when setting it.
Type: Reliability
Impact: After the multi-master node mode is enabled, three master nodes are created. If one master node becomes faulty, the cluster remains available and service functions are not affected. In commercial scenarios, you are advised to enable the multi-master node mode.
FAQ & Example: How Do I Check Whether a Cluster Is an HA Cluster?

Check Item: When creating a cluster, select a proper network model, such as the container tunnel network or VPC network.
Type: Deployment
Impact: After a cluster is created, the network model cannot be changed. Exercise caution when selecting a network model.
FAQ & Example: Container Network Model Comparison

Category: Workload

Check Item: When creating a workload, set the CPU and memory limits to improve service robustness.
Type: Deployment
Impact: When multiple applications are deployed on the same node and no resource requests and limits are set for an application, the application may leak resources. As a result, other applications cannot be allocated resources, and application monitoring information becomes inaccurate.
FAQ & Example: -
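As a sketch, CPU and memory requests and limits can be set in the workload's container spec. The workload name, image, and values below are illustrative, not prescriptive:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: nginx:1.25      # illustrative image
        resources:
          requests:            # lower bound, used by the scheduler
            cpu: 250m
            memory: 256Mi
          limits:              # upper bound, enforced at runtime
            cpu: 500m
            memory: 512Mi
```

Setting requests equal to or close to limits gives pods a more predictable quality of service.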

Check Item: When creating a workload, you can set probes for container health checks, including a liveness probe and a readiness probe.
Type: Reliability
Impact: If health checks are not configured, a pod cannot detect service exceptions or automatically restart the service to restore it. This results in the pod status being normal while the service in the pod is abnormal.
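A minimal sketch of both probe types in a container spec; the endpoints, port, and timing values are illustrative assumptions:

```yaml
containers:
- name: app
  image: nginx:1.25            # illustrative image
  livenessProbe:               # failure restarts the container
    httpGet:
      path: /healthz           # hypothetical health endpoint
      port: 80
    initialDelaySeconds: 10
    periodSeconds: 10
  readinessProbe:              # failure removes the pod from Service endpoints
    httpGet:
      path: /ready             # hypothetical readiness endpoint
      port: 80
    periodSeconds: 5
```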

Check Item: When creating a workload, select a proper access mode (Service). Currently, the following Service types are supported: ClusterIP, NodePort, DNAT, and LoadBalancer.
Type: Deployment
Impact: Improper Service configuration may cause confusion between internal and external access logic and waste resources.
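For example, a workload that only needs to be reached inside the cluster can use a ClusterIP Service rather than exposing an external address. The names and ports below are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-svc            # hypothetical Service name
spec:
  type: ClusterIP              # cluster-internal access only
  selector:
    app: example-app           # must match the workload's pod labels
  ports:
  - port: 80                   # port exposed by the Service
    targetPort: 8080           # port the container listens on
```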

Check Item: When creating a workload, do not set the number of replicas to 1. Set a proper node scheduling policy based on your service requirements.
Type: Reliability
Impact: If a workload runs only one pod replica, the service becomes abnormal whenever that node or pod is abnormal. To ensure that your pods can be scheduled successfully after you set a scheduling rule, confirm that the target nodes have idle resources for the containers.
FAQ & Example: -

Check Item: Properly set affinity and anti-affinity policies.
Type: Reliability
Impact: If both affinity and anti-affinity are configured for an application that provides Services externally, the Services may fail to be accessed after the application is upgraded or restarted.
FAQ & Example: Scheduling Policy Overview

Negative example: For application A, nodes 1 and 2 are set as affinity nodes, and nodes 3 and 4 are set as anti-affinity nodes. Application A exposes a Service through the ELB, which listens on nodes 1 and 2. When application A is upgraded, it may be scheduled to a node other than nodes 1, 2, 3, and 4, and it then cannot be accessed through the Service.

Cause: Scheduling of application A does not need to satisfy both the affinity and anti-affinity policies; a node is selected according to either policy. In this example, the node was selected based on the anti-affinity policy.
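A safer pattern than pinning an application to specific nodes is pod anti-affinity, which spreads replicas across nodes without excluding any particular node. A minimal sketch, with an illustrative pod label:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: example-app                # hypothetical pod label
        topologyKey: kubernetes.io/hostname # prefer one replica per node
```

Because this rule is preferred rather than required, pods can still be scheduled when spreading is impossible, avoiding the unreachable-after-upgrade situation above.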

Check Item: When creating a workload, set a pre-stop command (Lifecycle > Pre-Stop) so that the services running in the pods can finish processing before an application upgrade or pod deletion.
Type: Reliability
Impact: If no pre-stop command is configured, the pod is killed immediately during an application upgrade and services are interrupted.
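As a sketch, a pre-stop hook in the container spec can give in-flight requests time to drain before the container is stopped; the command below is illustrative:

```yaml
containers:
- name: app
  image: nginx:1.25            # illustrative image
  lifecycle:
    preStop:
      exec:
        command: ["sh", "-c", "sleep 10"]  # assumed drain window before shutdown
```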

Table 2 Data reliability

Category: Container data persistency

Check Item: Select a proper data volume type based on service requirements.
Type: Reliability
Impact: If a node becomes faulty and cannot be recovered, data on its local disks cannot be recovered either. You are therefore advised to use cloud storage volumes to ensure data reliability.
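A sketch of claiming a cloud storage volume through a PersistentVolumeClaim; the claim name, size, and storage class name are assumptions, so check which storage classes your cluster actually provides:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data           # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce              # mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi
  storageClassName: csi-disk   # assumed cloud disk storage class
```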

Category: Backup

Check Item: Back up application data.
Type: Reliability
Impact: Data cannot be restored after being lost.
FAQ & Example: What Storage Classes Does CCE Support? What Are the Differences Between These Storage Classes?

Table 3 O&M reliability

Category: Project

Check Item: The quotas of ECS, VPC, subnet, EIP, and EVS resources must meet customer requirements.
Type: Deployment
Impact: If a quota is insufficient, resources will fail to be created. In particular, users who have configured auto scaling must have sufficient resource quotas.

Check Item: You are not advised to modify kernel parameters, system configurations, cluster core component versions, security groups, or ELB-related parameters on cluster nodes, or to install software that has not been verified.
Type: Deployment
Impact: Exceptions may occur on CCE clusters or the Kubernetes components on a node, making the node unavailable for application deployment. For details, see High-Risk Operations and Solutions.

Negative example:
  1. The container network is interrupted after the node kernel is upgraded.
  2. The container network is interrupted after an open-source Kubernetes network add-on is installed on a node.
  3. The /var/paas or /mnt/paas/kubernetes directory is deleted from a node, which causes exceptions on the node.

Check Item: Do not modify resources created by CCE, such as security groups and EVS disks. Resources created by CCE are labeled cce.
Type: Deployment
Impact: CCE cluster functions may become abnormal.

Negative example (each of the following operations causes CCE cluster functions to become abnormal):
  1. On the ELB console, a user changes the name of a listener created by CCE.
  2. On the VPC console, a user modifies a security group created by CCE.
  3. On the EVS console, a user deletes or detaches data disks mounted to CCE cluster nodes.
  4. On the IAM console, a user deletes cce_admin_trust.

Category: Proactive O&M

Check Item: CCE provides multi-dimensional monitoring and alarm reporting, and supports basic resource monitoring based on fine-grained metrics by interconnecting with Application Operations Management (AOM). Alarms allow users to locate and rectify faults as soon as possible.
Type: Monitoring
Impact: If alarms are not configured, no performance baseline can be established for the container cluster. When an exception occurs, you will not receive alarms and will have to locate the fault manually.
FAQ & Example: Monitoring Overview