Planning Cluster AZs and HA
This topic describes how to improve cluster availability by deploying a cluster across multiple AZs, including the rules for distributing cluster nodes among AZs, the recommended number of replicas, and policies for handling AZ failures.
An availability zone (AZ) is a physical area within a region where resources use independent power supplies and networks. AZs in the same region can communicate with each other through internal networks but are physically isolated.
Multi-AZ deployment is a high availability feature provided by CSS. Deploying a cluster across two or three AZs located in the same region can help prevent data loss and lower the possibility of service outages.
Suggestions on Multi-AZ Deployment
If you select multi-AZ deployment when creating a cluster, CSS automatically enables cross-AZ HA so that cluster nodes are evenly distributed across the selected AZs (the node counts of any two AZs differ by at most 1).
You are advised to select three AZs, instead of two AZs, for multi-AZ deployment. If only two AZs are selected and one AZ becomes faulty, the cluster may not be able to elect a master node. As a result, the cluster may become unavailable.
Node Distribution Rules
When a multi-AZ cluster is created, nodes of all types are evenly distributed across different AZs. A maximum of three AZs can be used. Table 1 shows the node distribution when the number of AZs varies.

- When creating a multi-AZ cluster, ensure that the number of nodes of each type is greater than or equal to the number of AZs. Otherwise, multi-AZ cluster deployment will fail.
- If the number of data nodes or cold data nodes in a cluster is not divisible by the number of AZs, data in the cluster may be unevenly distributed, affecting data query or write performance.
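The distribution rule and the two constraints above can be illustrated with a minimal shell sketch. The function name az_distribution is hypothetical, and the choice of which AZs receive the leftover nodes (the first ones here) is an assumption; CSS may place them differently.

```shell
# Sketch: spread the nodes of one type across AZs so that per-AZ counts
# differ by at most 1. Usage: az_distribution <nodes> <azs>
# Assumption: leftover nodes go to the first AZs; CSS may choose otherwise.
az_distribution() {
  nodes=$1
  azs=$2
  # Rule from the text: each node type needs at least one node per AZ.
  if [ "$nodes" -lt "$azs" ]; then
    echo "error: $nodes node(s) cannot cover $azs AZs" >&2
    return 1
  fi
  base=$((nodes / azs))    # nodes that every AZ receives
  extra=$((nodes % azs))   # leftover nodes, one each for the first AZs
  out=""
  i=1
  while [ "$i" -le "$azs" ]; do
    count=$base
    if [ "$i" -le "$extra" ]; then
      count=$((base + 1))
    fi
    out="$out${out:+ }$count"   # append with a single space separator
    i=$((i + 1))
  done
  echo "$out"
}
```

For example, az_distribution 7 3 prints 3 2 2, an uneven layout of the kind the second note warns about, while az_distribution 6 3 prints 2 2 2.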
Replica Configuration Suggestions
For a multi-AZ cluster, configure the number of index replicas so that it takes full advantage of the high availability that such a deployment provides.
- With a dual-AZ deployment, if one AZ becomes unavailable, the other AZ continues to provide services. In this case, configure at least one replica. If higher query performance is desired, you can increase the number of replicas.
- In the case of a three-AZ deployment, if one AZ becomes unavailable, the other AZs can continue to provide services. In this case, also configure at least one replica. To enhance the cluster's query performance, increase the number of replicas.
For Elasticsearch and OpenSearch clusters, the default number of index replicas is 1. To use more replicas, change the index settings. The following are two examples.
- Adjust the number of replicas for an existing index:
curl -XPUT http://ip:9200/{index_name}/_settings -H 'Content-Type: application/json' -d '{"number_of_replicas":2}'
- Set the number of replicas for new indexes using a template:
curl -XPUT http://ip:9200/_template/templatename -H 'Content-Type: application/json' -d '{ "template": "*", "settings": {"number_of_replicas": 2}}'
In these commands:
- ip is the private IP address of the cluster.
- index_name is the name of the index.
- templatename is the name of the template.
- template is the index name matching rule. The template automatically applies to new indexes whose names match this rule. An asterisk (*) means that the template applies to all new indexes.
- number_of_replicas is the new number of index replicas. In these examples, it is set to 2.
Service Outage Pattern Analysis
Master nodes manage cluster-wide operations, including metadata, indexes, and shard allocation. If a cluster has master nodes, they perform these tasks. If it does not, the data nodes and cold data nodes share the master node responsibilities.
If a single AZ fails and half or fewer of the nodes that assume master node responsibilities remain available, service availability will be affected. In this case, restore services by referring to Table 2.
For example, a cluster has three master nodes distributed across two AZs. If the AZ that contains one master node becomes unavailable, two of the three master nodes remain available, so services are not interrupted. However, if the AZ that contains two of the three master nodes fails, services will be interrupted, because only one of the three master nodes (or nodes that assume master node responsibilities) remains available, which is not more than half. In this case, restore services by referring to Table 2.
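The availability condition described in this section can be written as a small check. This is only a sketch of the majority rule stated above; the function name has_quorum is hypothetical and is not part of CSS, Elasticsearch, or OpenSearch.

```shell
# Sketch: do enough master-responsible nodes survive an AZ failure?
# Usage: has_quorum <total> <surviving>
#   total     = nodes that assume master node responsibilities, all AZs
#   surviving = such nodes in the AZs that are still available
# Returns success only if MORE than half of the nodes survive.
has_quorum() {
  total=$1
  surviving=$2
  # Integer-safe form of: surviving > total / 2
  [ $((surviving * 2)) -gt "$total" ]
}

# The three-master, two-AZ example from the text:
if has_quorum 3 2; then echo "AZ with 1 master fails: available"; fi
if ! has_quorum 3 1; then echo "AZ with 2 masters fails: interrupted"; fi
```

Exactly half is not enough: has_quorum 4 2 fails, which matches the rule that services are affected when half or fewer of these nodes remain.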