Recommended Configurations for Cluster HA
This section describes the recommended configurations for a Kubernetes cluster in which applications can run stably and reliably.
Item | Description | Recommended Operations |
---|---|---|
Master node | CCE is a hosted Kubernetes cluster service, so you do not need to perform O&M on the master nodes. You can configure the cluster specifications to improve stability and reliability. | |
Worker node | In a Kubernetes cluster, the data plane consists of worker nodes that run containerized applications and transmit network traffic. When using CCE, you are responsible for O&M on the worker nodes. To achieve HA, ensure that the worker nodes are scalable and repairable, and monitor the running statuses of their key components. | |
Application | If you want your applications to be always available, especially during peak hours, run them in a scalable and elastic manner and monitor their running statuses. | |
Deploying the Master Nodes in Different AZs
Multiple regions are provided for you to deploy your services, and each region contains several availability zones (AZs). An AZ is a collection of one or more physical data centers with its own independent cooling, fire extinguishing, moisture-proofing, and electricity facilities. AZs within a region are connected using high-speed optical fibers. This allows you to build cross-AZ HA systems.
When creating a cluster, enable the HA mode of the cluster and configure the distribution mode of the master nodes. Deploying the master nodes in different AZs gives the cluster a higher disaster recovery (DR) capability. The following distribution modes are available:
- Random: Master nodes are deployed in different AZs for DR.
- Custom: Master nodes are deployed in specific AZs.
  - Host: Master nodes are deployed on different hosts in the same AZ.
  - Custom: Master nodes are deployed in the AZ you specify.
Selecting a Network Model
- Network model: CCE supports the VPC network and container tunnel network models for your clusters. The models differ in performance and functions. For details, see Network Models.
- VPC network: Services in different VPCs are isolated from each other. To enable your applications to access other cloud services like RDS, create those services in the same VPC as the cluster that runs the applications. If the instances have already been created in another VPC, use VPC peering to enable communication between the VPCs.
- Container CIDR block: Do not configure a small container CIDR block. Otherwise, the number of supported nodes will be limited.
  - For a cluster using a VPC network, if the subnet mask of the container CIDR block is /16, there are 256 x 256 = 65536 IP addresses available. If the maximum number of pods reserved on each node is 128, the maximum number of nodes supported is 512 (65536/128 = 512).
  - For a cluster using a container tunnel network, if the subnet mask of the container CIDR block is /16, there are 256 x 256 = 65536 IP addresses assigned to your cluster. By default, the container CIDR block allocates 16 IP addresses to a node at a time. The maximum number of nodes supported by your cluster is 4096 (65536/16 = 4096).
- Service CIDR block: The service CIDR block determines the upper limit of Service resources in your cluster. A CIDR block cannot be modified after it is created, so evaluate your actual needs before configuring it and do not make it excessively small.
For details, see Planning CIDR Blocks for a Cluster.
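The two node-count calculations above follow the same pattern and can be generalized to other prefix lengths. The following is a rough sketch only: the symbols p (prefix length of the container CIDR block) and m (number of IP addresses allocated to each node) are introduced here for illustration, and addresses that may be reserved by the platform are ignored.

```latex
N_{\max} = \frac{2^{\,32 - p}}{m}
\qquad
\frac{2^{32-16}}{128} = 512
\qquad
\frac{2^{32-16}}{16} = 4096
\qquad
\frac{2^{32-19}}{128} = 64
```

The last term shows how quickly the limit shrinks: under the same assumptions, a /19 container CIDR block with 128 IP addresses per node supports only 64 nodes.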
Selecting a Service Forwarding Mode
kube-proxy is a key component of a Kubernetes cluster. It is responsible for load balancing and forwarding traffic between a Service and its backend pods. When using clusters, consider the potential performance problems of the forwarding mode.
CCE supports the iptables and IPVS forwarding modes.
- IPVS allows higher throughput and faster forwarding. It applies to scenarios where the cluster scale is large or the number of Services is large.
- iptables is the traditional kube-proxy mode. It applies to scenarios where the number of Services is small or where clients open a large number of short concurrent connections. When there are more than 1000 Services in the cluster, forwarding latency may increase.
Configuring Quotas and Limits for the Cloud Service Resources and Resources in a Cluster
CCE allows you to configure resource quotas and limits for your cloud service resources and for resources in your clusters. This prevents excessive use of resources. When creating applications for CCE clusters, take these limits into account and review them periodically to avoid scaling failures caused by insufficient quotas while the applications are running.
- Configuring resource quotas for cloud services: CCE clusters rely on cloud services like ECS, EVS, VPC, ELB, and SWR. If the existing resource quotas of these services cannot meet your requirements, submit a service ticket to increase the quotas.
- Configuring resource quotas for a cluster: You can configure namespace-level resource quotas to limit the number of objects of a certain type created in a namespace and the total computing resources, like CPU and memory, consumed by those objects.
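A namespace-level quota of the kind described above can be expressed with the standard Kubernetes ResourceQuota object. The following is a minimal sketch, not a recommended sizing: the quota name, the namespace, and all values are hypothetical and should be replaced with figures based on your own capacity planning.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota          # hypothetical name
  namespace: team-a            # hypothetical namespace
spec:
  hard:
    pods: "50"                 # maximum number of pods in the namespace
    services: "20"             # maximum number of Services in the namespace
    requests.cpu: "20"         # total CPU requested by all pods
    requests.memory: 40Gi      # total memory requested by all pods
    limits.cpu: "40"           # total CPU limit of all pods
    limits.memory: 80Gi        # total memory limit of all pods
```

Once a quota on CPU or memory requests and limits is set in a namespace, pods that do not declare the corresponding values are rejected on creation unless a LimitRange supplies defaults.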
Monitoring Metrics of the Master Nodes
Monitoring the metrics of the master nodes allows you to check their performance and efficiently identify problems that occur on them. Master nodes that are not running properly may lower application reliability.
With the Cloud Native Cluster Monitoring add-on installed, CCE allows you to monitor kube-apiserver, kube-controller, kube-scheduler, and etcd-server on the master nodes. With Grafana, you can use the Kubernetes monitoring overview dashboard to monitor Kubernetes API server requests and latency as well as etcd latency.
If an on-premises Prometheus instance is used, you can manually add the monitoring metrics.
Partitioning Data Disks Attached to a Node
By default, the first data disk of a worker node is for storing the container runtime and kubelet components. The remaining capacity of this data disk affects image download and container startup and running.
The default space of this data disk is 100 GiB. You can adjust the space as required. Images, system logs, and application logs are stored on data disks. Therefore, you need to evaluate the number of pods to be deployed on each node, the size of the logs, images, and temporary data of each pod, and the space to be reserved for the system. For details, see Selecting a Data Disk for the Node.
Running npd
A failure in a worker node may affect the availability of the applications. CCE Node Problem Detector is used to monitor node exceptions. It helps you detect and handle latent exceptions in a timely manner. You can also customize the check items, including target node, check period, and triggering threshold.
Configuring the DNS Cache
When the number of DNS requests in a cluster increases, the load of CoreDNS increases and the following issues may occur:
- Increased delay: CoreDNS needs to process more requests, which may slow down the DNS query and affect service performance.
- Increased resource usage: To ensure DNS performance, CoreDNS requires higher specifications.
To minimize the impact of DNS delay, deploy NodeLocal DNSCache in the cluster to improve the networking stability and performance. NodeLocal DNSCache runs a DNS cache proxy on cluster nodes. All pods with DNS configurations use the DNS cache proxy running on nodes instead of the CoreDNS service for domain name resolution. This reduces CoreDNS' load and improves the cluster DNS performance.
You can install the NodeLocal DNSCache add-on.
Properly Deploying CoreDNS
Deploy the CoreDNS instances in different AZs and nodes to mitigate the single-node or single-AZ faults.
Ensure that the CPU and memory of the nodes where CoreDNS is running are not fully used. Otherwise, the queries per second (QPS) and response time of domain name resolution will be affected.
Running Multiple Pods
If your application runs in a single pod, the application becomes unavailable when that pod is abnormal. Use Deployments or other workload types that maintain multiple replicas to deploy your applications. Each time a pod fails or is terminated, the controller automatically creates a new pod with the same specifications as the original one, so that the specified number of pods is always running in the cluster.
When creating a workload, set the number of instances to a value greater than 2. If an instance is faulty, the remaining instances keep running until Kubernetes automatically creates another pod to compensate for the loss. You can also use HPA and CA to automatically scale workloads in or out as required. For details, see Using HPA and CA for Auto Scaling of Workloads and Nodes.
Using Containers to Isolate Processes
Containers provide process-level isolation. Each container has its own file system, network, and resource allocation. This prevents interference between different processes and avoids attacks and data leakage from malicious processes. Using containers to isolate processes can improve the reliability, security, and portability of applications.
If several processes work together, create multiple containers in a pod so that they can share the same network, PV, and other resources. Init containers are one example: they run before the main containers start and complete initialization tasks such as configuring environment variables, loading databases or data stores, and pulling Git repositories.
Note that multiple containers in a pod share the lifecycle of this pod. Therefore, if one container is abnormal, the entire pod will be restarted.
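The init container pattern described above can be sketched as follows. This is an illustrative example only: the pod name, container names, images, and the shared emptyDir volume are assumptions chosen to keep the manifest self-contained.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo                    # hypothetical name
spec:
  initContainers:
  - name: prepare-content            # runs to completion before web-app starts
    image: busybox
    command: ["sh", "-c", "echo 'initialized' > /work-dir/index.html"]
    volumeMounts:
    - name: workdir
      mountPath: /work-dir
  containers:
  - name: web-app                    # main container, serves the prepared file
    image: nginx
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/nginx/html
  volumes:
  - name: workdir                    # volume shared by both containers
    emptyDir: {}
```

The main container starts only after the init container exits successfully, so it never serves requests before the content is prepared.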
Configuring Resource Quotas for a Workload
Configure and adjust resource requests and limits for all workloads.
If too many pods are scheduled to one node, the node will be overloaded and unable to provide services.
To avoid this problem, specify the resource requests and limits of a pod when deploying it. Kubernetes then selects a node with sufficient unallocated resources for the pod. For example, an Nginx pod can request 1 CPU core and 1024 MiB of memory, while its actual usage is capped at 2 CPU cores and 4096 MiB of memory.
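The request and limit values mentioned above could be declared as in the following sketch. The pod and container names are illustrative assumptions. Only the CPU and memory figures come from the text.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx                  # hypothetical name
spec:
  containers:
  - name: container-1          # hypothetical name
    image: nginx
    resources:
      requests:                # used by the scheduler to select a node
        cpu: "1"
        memory: 1024Mi
      limits:                  # upper bound on actual usage
        cpu: "2"
        memory: 4096Mi
  imagePullSecrets:
  - name: default-secret
```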
Kubernetes statically schedules resources. The remaining resources on each node are calculated as follows: Remaining resources on a node = Total resources on the node – Allocated resources (not resources in use). If you manually run a resource-consuming process on a node, Kubernetes cannot detect it and does not account for it during scheduling.
Additionally, declare the resource requests for all pods. If a pod does not declare them, Kubernetes does not deduct the resources used by this pod from the node after scheduling it there. Other pods may therefore still be scheduled to this node and overload it.
Deploying an Application in Multiple AZs
You can run pods on nodes in multiple AZs to prevent an application from being affected by faults of a single AZ.
When creating a node, manually specify an AZ for the node.
During application deployment, configure anti-affinity policies for pods so that the scheduler can schedule pods across multiple AZs. For details, see Implementing High Availability for Applications in CCE. The following is an example:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-server
      labels:
        app: web-server
    spec:
      replicas: 4
      selector:
        matchLabels:
          app: web-server
      template:
        metadata:
          labels:
            app: web-server
        spec:
          containers:
          - name: web-app
            image: nginx
          imagePullSecrets:
          - name: default-secret
          affinity:
            podAntiAffinity:  # Workload anti-affinity
              # Indicates that the rule is met as much as possible. Otherwise, scheduling
              # cannot be performed when the number of pods exceeds the number of AZs.
              preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  # Pod label matching rule. Configure anti-affinity policies between pods and their own labels.
                  labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - web-server
                  topologyKey: topology.kubernetes.io/zone  # Topology domain of the AZ where the node is located
                weight: 100
You can also use Pod Topology Spread Constraints to deploy pods in multiple AZs.
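As a sketch of the topology spread alternative, the same web-server Deployment could use the standard Kubernetes topologySpreadConstraints field instead of anti-affinity. The maxSkew and whenUnsatisfiable values below are assumptions chosen for illustration, not settings prescribed by CCE.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      containers:
      - name: web-app
        image: nginx
      imagePullSecrets:
      - name: default-secret
      topologySpreadConstraints:
      - maxSkew: 1                                 # pod count may differ by at most 1 between AZs
        topologyKey: topology.kubernetes.io/zone   # spread across AZs
        whenUnsatisfiable: ScheduleAnyway          # soft constraint; DoNotSchedule makes it hard
        labelSelector:
          matchLabels:
            app: web-server
```

With ScheduleAnyway, the scheduler still places pods when an even spread is impossible, which mirrors the "preferred" behavior of the anti-affinity example.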
Deploying an Add-on in Multiple AZs
The Deployment pods of CCE system add-ons like CoreDNS and Everest can be deployed in multiple AZs, the same way as deploying an application. This function can satisfy different user requirements.
Mode | Configuration Description | Usage Description | Recommended Configuration Scenario |
---|---|---|---|
Preferred | Add-on pods will have labels with the key topology.kubernetes.io/zone for soft anti-affinity deployment, and the anti-affinity type is preferredDuringSchedulingIgnoredDuringExecution. | Add-on pods will be preferentially scheduled to nodes in different AZs. If resources in some AZs are insufficient, some add-on pods may be scheduled to the same AZ that has sufficient resources. | No mandatory requirements for multi-AZ DR |
Required | Add-on pods will have labels with the key topology.kubernetes.io/zone for hard anti-affinity deployment, and the anti-affinity type is requiredDuringSchedulingIgnoredDuringExecution. | A maximum of one pod of the same add-on can be deployed in each AZ. The number of running pods cannot exceed the number of AZs in the cluster. If the node where the add-on pod runs is faulty, pods running on the faulty node cannot be automatically migrated to other nodes in the same AZ. | Changing number of AZs (This mode is used to prevent all pods from being scheduled to the node in the current AZ in advance.) |
Equivalent mode | Add-on pods will have labels with the key topology.kubernetes.io/zone for configuring topology spread constraints. The difference in the number of pods between topology domains cannot exceed 1, so that add-on pods are evenly distributed across AZs. | The effect of this mode is between that of the preferred mode and that of the required mode. In the equivalent mode, add-on pods can be deployed in different AZs, and multiple pods can be deployed in a single AZ when there are more pods than AZs. To use this mode, plan node resources in each AZ in advance to ensure that each AZ has sufficient node resources for deploying pods. (If there is more than one add-on pod in a single AZ, the number of nodes to which the add-on pods can be scheduled in that AZ should be one more than the number of add-on pods in it.) This ensures that add-on pods can be deployed even if node resources in some AZs are insufficient, and that they can be scheduled smoothly during updates. | Scenarios with high requirements for DR |
Configuring Health Check for a Container
Kubernetes automatically restarts pods that are not running properly. This prevents service interruption caused by exceptions of pods. In some cases, however, even if a pod is running, it does not mean that it can provide services properly. For example, a deadlock may occur in a process in a running pod, but Kubernetes does not automatically restart the pod because it is still running. To solve this problem, configure a liveness probe to check whether the pod is healthy. If the liveness probe detects a problem, Kubernetes will restart the pod.
You can also configure a readiness probe to check whether the pod can provide normal services. After an application container is started, it may take some time for initialization. During this process, the pod on which this container is running cannot provide services to external systems. The Services forward requests to this pod only when the readiness probe detects that the pod is ready. When a pod is faulty, the readiness probe can prevent new traffic from being forwarded to the pod.
The startup probe is used to check whether the application container has started. The liveness probe and readiness probe do not run until the startup probe succeeds, so they cannot interfere with the startup of containers. Configuring a startup probe for slow-starting containers prevents Kubernetes from terminating them before they have finished starting.
You can configure the preceding probes when creating an application. The following is an example:
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-http
    spec:
      containers:
      - name: liveness
        image: nginx:alpine
        args:
        - /server                # Assumes that the image provides this entry point
        livenessProbe:           # Restart the container if GET /healthz fails
          httpGet:
            path: /healthz
            port: 80
            httpHeaders:
            - name: Custom-Header
              value: Awesome
          initialDelaySeconds: 3
          periodSeconds: 3
        readinessProbe:          # Forward traffic only while /tmp/healthy exists
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
        startupProbe:            # Allow up to 30 x 10 = 300 seconds for startup
          httpGet:
            path: /healthz
            port: 80
          failureThreshold: 30
          periodSeconds: 10
Configuring Auto Scaling
Auto scaling can automatically adjust the number of application containers and nodes as required. Containers and nodes can be quickly scaled out or scaled in to save resources and costs.
Typically, two types of auto scaling may occur during peak hours:
- Workload scaling: When pods or containers are used for deploying applications, the request and limit values of the containers are generally configured to prevent unlimited usage of resources during peak hours. However, after the upper limit is reached, an application error may occur. To resolve this issue, scale out the number of pods to share the load.
- Node scaling: As the number of pods grows, the nodes may run short of resources, so the added pods cannot be scheduled. To solve this problem, scale nodes in or out based on the resource usage.
For details, see Using HPA and CA for Auto Scaling of Workloads and Nodes.
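Workload scaling of the kind described above can be sketched with the standard Kubernetes HorizontalPodAutoscaler object. The policy name, the target Deployment name, the replica bounds, and the 70% CPU threshold are all assumptions for illustration. The policies you create through CCE may expose different options.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-server-hpa           # hypothetical name
spec:
  scaleTargetRef:                # workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web-server             # hypothetical Deployment name
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percentage of the pods' CPU requests
```

CPU utilization is calculated relative to the containers' CPU requests, so this policy takes effect only for workloads that declare requests, and it needs a metrics source in the cluster.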
Viewing Logs, Monitoring Metrics, and Adding Alarm Rules
- Logging
  - Application logs are generated by pods, including the pods in which the service containers are running and the pods of Kubernetes system components like CoreDNS. CCE allows you to configure policies for collecting, managing, and analyzing logs periodically to prevent logs from growing too large.
- Monitoring
  - Metrics of the master nodes: Monitoring these metrics enables you to efficiently identify problems that occur on the master nodes. For details, see Monitoring Metrics of the Master Nodes.
  - Metrics of the applications: CCE can comprehensively monitor applications in clusters by checking these metrics. In addition to standard metrics, you can configure custom application metrics that comply with the metric specifications to improve observability.