Using Cluster Federation to Implement Multi-Active DR for Applications

Application Scenarios

To tackle single points of failure (SPOFs), UCS allows instances of an application to run on multiple clouds. When one of the clouds is down, cluster federation will switch over traffic within seconds, significantly improving service reliability.

Figure 1 shows the multi-active DR solution in UCS. Under DNS policies, instances of an application are distributed to three Kubernetes clusters: two Huawei Cloud CCE clusters (deployed in different regions) and one third-party cloud cluster.

Figure 1 Multi-active DR for multiple clusters

Prerequisites

You have created clusters. In this example, two CCE clusters are created in different regions (CN South-Guangzhou and CN East-Shanghai1). For details, see Buying a CCE Cluster. The Kubernetes version must be 1.19 or later, and each cluster must have at least one available node.

In your production environment, you can deploy clusters in different regions, AZs, or even clouds to implement multi-active DR.
You have created a public zone in Huawei Cloud DNS. For details, see Routing Internet Traffic to a Website.

Setting Up the Basic Environment

Register clusters to UCS and configure cluster access. For details, see Registering a Cluster.

For example, register clusters ccecluster01 and ccecluster02 to the fleet ucs-group of UCS and check whether the clusters are running normally.
Enable cluster federation for the fleet and ensure that the clusters have been connected to the federation. For details, see Cluster Federation.

Figure 2 Clusters
Creating Workloads

To make the traffic switchover visible, the two clusters in this section run different container images. (In a production environment, the clusters would run the same image.)
- Cluster ccecluster01: The example application uses the image nginx:gz and returns the message "ccecluster01 is in Guangzhou."
- Cluster ccecluster02: The example application uses the image nginx:sh and returns the message "ccecluster02 is in Shanghai."
Before you start, upload each example image to the SWR image repository in the region where the corresponding cluster is located. That is, upload the image nginx:gz to CN South-Guangzhou and the image nginx:sh to CN East-Shanghai1. Otherwise, the workloads will malfunction because they cannot pull the images.
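The upload can be sketched with standard Docker commands. This is a minimal sketch, not the documented procedure: the helper names swr_image_ref and push_to_swr are hypothetical, the address layout is inferred from the example image addresses used in this section (organization kubernetes-test2, region IDs cn-south-1 and cn-east-3), and it assumes you have already logged in to each SWR registry with docker login.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an SWR image address.
# Assumes the layout swr.<region-id>.myhuaweicloud.com/<organization>/<image>:<tag>,
# as seen in the example addresses in this section.
swr_image_ref() {
  local region="$1" org="$2" image="$3" tag="$4"
  printf 'swr.%s.myhuaweicloud.com/%s/%s:%s\n' "$region" "$org" "$image" "$tag"
}

# Hypothetical helper: tag a local image with its SWR address and push it.
# Set DRY_RUN=1 to print the commands instead of running them.
push_to_swr() {
  local local_image="$1" target="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker tag $local_image $target"
    echo "docker push $target"
  else
    docker tag "$local_image" "$target" && docker push "$target"
  fi
}

# Example usage (requires a prior docker login to each regional registry):
# push_to_swr nginx:gz "$(swr_image_ref cn-south-1 kubernetes-test2 nginx gz)"
# push_to_swr nginx:sh "$(swr_image_ref cn-east-3 kubernetes-test2 nginx sh)"
```

Running the helpers with DRY_RUN=1 first is a cheap way to confirm that each image is headed to the registry in the same region as its cluster.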

The clusters and workloads used in this example are for illustration only. You are not restricted to a particular cloud service provider, region, or number of clusters and workloads.
1. Log in to the UCS console. In the navigation pane, choose Fleets.
2. Click the name of the fleet for which cluster federation has been enabled. The fleet console is displayed.
3. In the navigation pane, choose Federation > Workloads. In the upper right corner, click Create from Image.
4. Enter the basic information and configure container parameters. The image name can be user-defined. Click Next: Scheduling and Differentiation.
5. Configure the cluster scheduling policy, complete differentiated cluster configuration, and click Create Workload.
  - Scheduling: Select Cluster weight and set the weight of each cluster to 1.
  - Differentiated Settings: Click the icon on the left of each cluster to enable differentiated settings. Set the image name of ccecluster01 to swr.cn-south-1.myhuaweicloud.com/kubernetes-test2/nginx:gz (the address of the image nginx:gz in the SWR image repository) and that of ccecluster02 to swr.cn-east-3.myhuaweicloud.com/kubernetes-test2/nginx:sh.
Figure 3 Scheduling and differentiation
Create a LoadBalancer Service.
1. Log in to the Huawei Cloud UCS console. In the navigation pane, choose Fleets.
2. Click the name of the fleet for which cluster federation has been enabled. The fleet console is displayed.
3. In the navigation pane, choose Federation > Services and Ingresses. In the upper right corner, click Create Service.
4. Configure the parameters and click OK.
  - Service Type: Select LoadBalancer.
  - Port: Select TCP for Protocol, and enter the service port and container port, for example, 8800 and 80.
  - Cluster: Add clusters ccecluster01 and ccecluster02 in sequence. For LoadBalancer, select a shared load balancer. The load balancer must be in the same VPC as the corresponding cluster. If no load balancer is available in the list, click Create Load Balancer to create one on the ELB console. Retain the default values for other parameters.
  - Selector: Services are associated with workloads through selectors. In this example, reference the label of the workload created earlier to add it as the selector.
  Figure 4 Creating a Service
Create a DNS policy.
1. Log in to the Huawei Cloud UCS console. In the navigation pane, choose Fleets.
2. Click the name of the fleet for which cluster federation has been enabled. The fleet console is displayed.
3. In the navigation pane, choose Federation > DNS Policies. Then, add a root domain name.
4. In the upper right corner, click Create DNS Policy. Then, configure the parameters.
  - Target Service: Select the LoadBalancer Service created in the previous procedure.
  - Distribution Mode: Select Adaptive. Traffic will be automatically distributed based on the number of pods in each cluster. In this example, both ccecluster01 and ccecluster02 contain one pod, so each cluster receives 50% of the traffic.
Figure 5 Traffic ratio topology
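
The arithmetic behind the Adaptive mode can be illustrated as follows. This is a sketch under a stated assumption: it assumes the traffic share of each cluster is simply its pod count divided by the total pod count, which matches the 50% figures in this example but is not confirmed as the exact rule UCS applies. The function name traffic_split is hypothetical.

```shell
# Sketch: proportional traffic split from pod counts (assumed rule, see above).
# Input on stdin: one "<cluster> <pods>" pair per line.
# Output: one "<cluster> <percent>%" line per cluster, in input order.
traffic_split() {
  awk '{ name[NR] = $1; pods[NR] = $2; total += $2 }
       END {
         for (i = 1; i <= NR; i++) {
           # Guard against division by zero when no pods exist anywhere.
           pct = (total > 0) ? 100 * pods[i] / total : 0
           printf "%s %.0f%%\n", name[i], pct
         }
       }'
}

# One pod in each cluster, as in this example:
printf 'ccecluster01 1\nccecluster02 1\n' | traffic_split
# ccecluster01 50%
# ccecluster02 50%

# No available pods in ccecluster01:
printf 'ccecluster01 0\nccecluster02 1\n' | traffic_split
# ccecluster01 0%
# ccecluster02 100%
```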

Verifying Multi-Active DR

You have deployed applications in clusters ccecluster01 and ccecluster02 and exposed them externally through LoadBalancer Services. After the DNS policy is created, the system automatically adds a resolution record for the selected root domain name and generates a unified external access path (domain name address) on UCS. You can access this domain name address to verify traffic distribution.

Obtain the domain name address.
1. Log in to the UCS console. In the navigation pane, choose Fleets.
2. Click the name of the fleet for which cluster federation has been enabled. The fleet console is displayed.
3. In the navigation pane, choose Federation > DNS Policies. The value of Domain Name Address in the list is the domain name address.
Run the following command on a host with public network access to continuously access the domain name address and check which cluster processes each request.
- Under normal conditions, applications in both clusters receive traffic, and each cluster processes 50% of it.
```
while true;do wget -q -O- helloworld.default.mcp-xxx.svc.xxx.co:8800; done
ccecluster01 is in Guangzhou.
ccecluster02 is in Shanghai.
ccecluster01 is in Guangzhou.
ccecluster02 is in Shanghai.
ccecluster01 is in Guangzhou.
ccecluster02 is in Shanghai.
...
```
- When an application exception occurs on ccecluster01 (simulated here by shutting down a cluster node), the system routes all traffic to ccecluster02, so that users are unaware of the exception.
```
while true;do wget -q -O- helloworld.default.mcp-xxx.svc.xxx.co:8800; done
ccecluster02 is in Shanghai.
ccecluster02 is in Shanghai.
ccecluster02 is in Shanghai.
ccecluster02 is in Shanghai.
ccecluster02 is in Shanghai.
ccecluster02 is in Shanghai.
...
```
  Return to the UCS console. The cluster traffic ratio in the domain name list has changed: ccecluster02 now takes 100% of the traffic, which is consistent with both the adaptive distribution policy and the command output above.
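
To compare the observed distribution with the traffic ratio shown on the console, the responses can be tallied rather than read by eye. The following is a minimal sketch: the function name tally_responses and the DOMAIN variable are hypothetical, and it assumes each response is a single line whose first word is the cluster name, as in the output above.

```shell
# Sketch: count how many responses each cluster returned.
# Reads response lines on stdin; prints "<count> <cluster>" sorted by cluster name.
tally_responses() {
  awk '{ count[$1]++ } END { for (c in count) print count[c], c }' | sort -k2
}

# Example usage with a bounded loop instead of "while true".
# DOMAIN is a placeholder for the domain name address from the DNS Policies page:
# for i in $(seq 1 20); do wget -q -O- "$DOMAIN:8800"; done | tally_responses
```

With both clusters healthy, the two counts should be roughly equal. After the simulated failure, only ccecluster02 should appear in the tally.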