Planning Cluster AZs and HA
This topic describes how to improve cluster availability by deploying a cluster across multiple AZs, including the rules for distributing cluster nodes among AZs, the recommended number of replicas, and policies for handling AZ failures.
An availability zone (AZ) is a physical region where resources use independent power supplies and networks. AZs in the same region can communicate with each other through internal networks but are physically isolated.
Multi-AZ deployment is a high availability feature provided by CSS. Deploying a cluster across two or three AZs located in the same region can help prevent data loss and lower the possibility of service outages.
Suggestions on Multi-AZ Deployment
If you select multi-AZ deployment when creating a cluster, CSS automatically enables cross-AZ HA and distributes cluster nodes evenly across the selected AZs (the number of nodes in any two AZs differs by at most 1).
You are advised to select three AZs, instead of two AZs, for multi-AZ deployment. If only two AZs are selected and one AZ becomes faulty, the cluster may not be able to elect a master node. As a result, the cluster may become unavailable.
Node Distribution Rules
Table 1, Table 2, and Table 3 show the distribution of data nodes, cold data nodes, master nodes, and client nodes across AZs, depending on the node and AZ counts. The node distribution rules for multi-AZ clusters are as follows:
- For a multi-AZ cluster, the system evenly distributes nodes across AZs. The difference in the number of same-type nodes across different AZs will not exceed 1.
- To ensure high service availability and prevent individual node overload in a multi-AZ deployment, the following node count requirements must be met:
- For a two-AZ cluster, there must be at least four data nodes (or cold data nodes), two for each AZ; and there must be at least two client nodes (if configured), one for each AZ.
- For a three-AZ cluster, each available node type must have at least one node in each AZ. This means each node type must have at least three nodes in total.
- For node types that are not deployed (for example, no cold data nodes), the minimum node count requirement does not apply.
- If the number of data nodes or cold data nodes is not an integer multiple of the number of AZs, for example, five data nodes in a two-AZ cluster, uneven data distribution may occur, which can impact query and ingestion performance. For this reason, you should always ensure that the number of nodes is an integer multiple of the number of AZs.
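The even-distribution rule above can be sketched with a short calculation. This is an illustrative model only, not the CSS scheduler itself:

```python
def distribute_nodes(total_nodes, az_count):
    """Evenly spread nodes across AZs so that any two AZs differ by at most 1.

    Illustrative sketch only; CSS performs this placement internally.
    """
    base, remainder = divmod(total_nodes, az_count)
    # The first `remainder` AZs each receive one extra node.
    return [base + 1 if i < remainder else base for i in range(az_count)]

# Example: 5 data nodes across 2 AZs is uneven (3 vs 2), so prefer
# node counts that are an integer multiple of the AZ count.
print(distribute_nodes(5, 2))  # [3, 2]
print(distribute_nodes(6, 3))  # [2, 2, 2]
```

The outputs match the node distribution table below: whenever the total is not a multiple of the AZ count, some AZs carry one extra node.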
| Total Nodes | Single AZ (AZ1) | Two AZs (AZ1 / AZ2) | Three AZs (AZ1 / AZ2 / AZ3) |
|---|---|---|---|
| 1 | 1 | Not supported | Not supported |
| 2 | 2 | Not supported | Not supported |
| 3 | 3 | Not supported | 1 / 1 / 1 |
| 4 | 4 | 2 / 2 | 2 / 1 / 1 |
| 5 | 5 | 3 / 2 | 2 / 2 / 1 |
| 6 | 6 | 3 / 3 | 2 / 2 / 2 |
| 7 | 7 | 4 / 3 | 3 / 2 / 2 |
| ... | ... | ... | ... |
Replica Configuration Suggestions
For a multi-AZ cluster, configure the number of index replicas in a way that capitalizes on the high availability such a deployment provides.
- With a dual-AZ deployment, if one AZ becomes unavailable, the other AZ continues to provide services. In this case, configure at least one replica. If higher query performance is desired, you can increase the number of replicas.
- In the case of a three-AZ deployment, if one AZ becomes unavailable, the other AZs can continue to provide services. In this case, also configure at least one replica. To enhance the cluster's query performance, increase the number of replicas.
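The reason at least one replica is needed can be stated simply: with R replicas, each shard has R + 1 copies, and a copy can survive the loss of one AZ only if a second copy exists in another AZ. A minimal illustrative check, assuming shard copies are spread across AZs:

```python
def survives_one_az_failure(replicas: int) -> bool:
    """True if at least one copy of every shard can remain after one AZ fails.

    Illustrative sketch; assumes the primary and its replicas are
    allocated to different AZs.
    """
    copies = replicas + 1   # primary shard + replicas
    return copies >= 2      # a second copy lives in another AZ

print(survives_one_az_failure(0))  # False: only the primary exists
print(survives_one_az_failure(1))  # True
```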
For Elasticsearch and OpenSearch clusters, the default number of index replicas is 1. To use more replicas, modify the relevant index settings. The following are two examples.
- Adjust the number of replicas for an existing index:
```shell
curl -XPUT http://ip:9200/{index_name}/_settings -H 'Content-Type: application/json' -d '{"number_of_replicas": 2}'
```
- Set the number of replicas for new indexes using a template:
```shell
curl -XPUT http://ip:9200/_template/templatename -H 'Content-Type: application/json' -d '{"template": "*", "settings": {"number_of_replicas": 2}}'
```
In these commands:
- ip: the private IP address of the cluster.
- index_name: the name of the index to modify.
- templatename: the template name.
- template: the index name matching rule. The template automatically applies to new indexes whose names match this rule; the asterisk (*) matches all new indexes.
- number_of_replicas: the target number of index replicas. In these examples, it is set to 2.
Service Outage Pattern Analysis
Master nodes manage cluster-wide operations, including metadata, indexes, and shard allocation. In a cluster with master nodes, the master nodes perform these tasks. In a cluster without them, data nodes and cold data nodes share the responsibilities of the master nodes.
In the case of a single-AZ failure, if half or fewer of the nodes assuming master node responsibilities remain available, service availability will be affected. In this case, restore services by referring to Table 4.
For example, consider a cluster with three master nodes distributed across two AZs. If the AZ that contains one master node becomes unavailable, services remain available, because two of the three master nodes survive. However, if the other AZ, which contains two of the three master nodes, fails instead, services will be interrupted, because only one master node (half or fewer) remains available. In this case, restore services by referring to Table 4.
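The majority rule in this example can be expressed as a short check. This is an illustrative quorum calculation only; Elasticsearch's actual master election protocol is more involved:

```python
def cluster_stays_available(master_nodes_per_az, failed_az):
    """Check whether a strict majority of master-capable nodes survives
    the failure of one AZ (a simple quorum check).

    master_nodes_per_az: list of master-capable node counts, one per AZ.
    failed_az: index of the AZ that becomes unavailable.
    """
    total = sum(master_nodes_per_az)
    surviving = total - master_nodes_per_az[failed_az]
    return surviving > total / 2

# Three master nodes in two AZs: AZ0 holds 1, AZ1 holds 2.
print(cluster_stays_available([1, 2], failed_az=0))  # True: 2 of 3 remain
print(cluster_stays_available([1, 2], failed_az=1))  # False: only 1 of 3 remains
```

With three AZs holding one master node each, the failure of any single AZ still leaves a majority, which is why a three-AZ deployment is recommended over two AZs.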