Elasticsearch Cluster Planning Suggestions
Before creating an Elasticsearch cluster, plan the following to ensure the desired performance and reliability: whether to deploy the cluster across multiple AZs to improve availability, the node quantity and specifications, the cluster version and security mode, and index sharding.
Planning Cluster AZs
By deploying a CSS cluster across multiple AZs, you can increase the cluster's availability, lower the likelihood of data loss, and minimize service downtime. You can select two or three different AZs in the same region to deploy a cluster.
- When creating a multi-AZ cluster, ensure that the number of selected nodes of any type is no less than the number of AZs. Otherwise, multi-AZ cluster deployment will fail.
- When a multi-AZ cluster is deployed, nodes of all types are evenly distributed across different AZs. The difference between node quantities in different AZs does not exceed 1.
- If the number of data nodes plus cold data nodes in a cluster is not an integer multiple of the number of AZs, data in the cluster may be unevenly distributed, affecting data query or write performance.
- In a two-AZ deployment, if one AZ becomes unavailable, the other AZ continues to provide services. In this case, at least one replica is required. Elasticsearch uses one replica by default. You can retain the default value if you do not require higher read performance.
- In the case of a three-AZ deployment, if one AZ becomes unavailable, the other AZs can continue to provide services. In this case, at least one replica is required. Elasticsearch uses one replica by default. If you need more replicas to improve the cluster's ability to handle queries, modify the replica setting to set more replicas.
For example, you can run the following command to set the number of index replicas:
curl -XPUT http://ip:9200/{index_name}/_settings -H 'Content-Type: application/json' -d '{"number_of_replicas":2}'
Alternatively, run the following command to specify the number of replicas in the index template:
curl -XPUT http://ip:9200/_template/templatename -H 'Content-Type: application/json' -d '{ "index_patterns": ["*"], "settings": {"number_of_replicas": 2}}'
Here, ip is the private IP address of the cluster, index_name is the index name, and number_of_replicas is the new number of index replicas. In this example, the number of index replicas is changed to 2.
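The same settings request can also be prepared from a script. The following is a minimal Python sketch using only the standard library. The address `127.0.0.1:9200`, the index name `my_index`, and the helper name `build_replica_request` are assumptions made for illustration. The request is only built here, not sent.

```python
import json
import urllib.request


def build_replica_request(base_url: str, index_name: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets number_of_replicas."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    body = json.dumps({"number_of_replicas": replicas}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical cluster address and index name, for illustration only.
request = build_replica_request("http://127.0.0.1:9200", "my_index", 2)
print(request.get_method(), request.full_url)
```

To apply the change, the request would be passed to `urllib.request.urlopen`. A security-mode cluster would additionally require credentials, which this sketch does not cover.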
You can switch AZs for an existing cluster. For details, see Switching AZs for an Elasticsearch Cluster.
- Add AZ: Add one or two AZs to a single-AZ cluster, or add an AZ to a dual-AZ cluster to improve cluster availability.
- Migrate AZ: Completely migrate data from the current AZ to another AZ that has sufficient resources.
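The multi-AZ placement rules described earlier in this section (at least as many nodes of each type as AZs, node counts per AZ differing by at most 1, and an even split when the node count is a multiple of the AZ count) can be illustrated with a short Python sketch. The function names are hypothetical and the code only models the arithmetic. It is not how CSS schedules nodes internally.

```python
from typing import List


def distribute_nodes(node_count: int, az_count: int) -> List[int]:
    """Spread nodes of one type across AZs so that counts differ by at most 1."""
    if az_count not in (2, 3):
        raise ValueError("a multi-AZ cluster uses two or three AZs")
    if node_count < az_count:
        raise ValueError("each node type needs at least as many nodes as AZs")
    base, extra = divmod(node_count, az_count)
    # The first `extra` AZs receive one additional node.
    return [base + 1 if i < extra else base for i in range(az_count)]


def is_evenly_divisible(data_nodes: int, cold_nodes: int, az_count: int) -> bool:
    """True if data nodes plus cold data nodes are an integer multiple of the AZ count."""
    return (data_nodes + cold_nodes) % az_count == 0


print(distribute_nodes(7, 3))
print(is_evenly_divisible(4, 2, 3))
```

When `is_evenly_divisible` returns False, data may end up unevenly distributed across AZs, as noted in the list above.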
Planning the Cluster Version
When selecting an Elasticsearch cluster version, consider factors such as service requirements, available features, performance, security updates, and long-term support, ensuring that the selected version can meet both current and future needs and provide a stable, secure environment for your data.
- If you are deploying the Elasticsearch clusters of CSS for the first time, you are advised to use the latest version.
- If you are migrating an in-house built or third-party Elasticsearch cluster to CSS without changing the cluster, keep the version of the source cluster.
- If you are migrating an in-house built or third-party Elasticsearch cluster to CSS while recoding it, choose Elasticsearch 7.10.2 or 7.6.2.
Feature | Elasticsearch 7.6.2 | Elasticsearch 7.10.2 | Details |
---|---|---|---|
Vector search | √ | √ | |
Storage-compute decoupling | √ | √ | Configuring Storage-Compute Decoupling for an Elasticsearch Cluster |
Flow Control 2.0 | √ | √ | |
Flow Control 1.0 | √ | √ | |
Large query isolation | √ | √ | Configuring Large Query Isolation for an Elasticsearch Cluster |
Enhanced aggregation | x | √ | Configuring Enhanced Aggregation for an Elasticsearch Cluster |
Read/write splitting | √ | √ | Configuring Read/Write Splitting Between Two Elasticsearch Clusters |
Switchover between hot and cold storage | √ | √ | Switching Between Hot and Cold Storage for an Elasticsearch Cluster |
Index recycle bin | x | √ | Configuring an Index Recycle Bin for an Elasticsearch Cluster |
Enhanced import performance | x | √ | Enhancing the Data Import Performance of Elasticsearch Clusters |
Enhanced cluster kernel monitoring | √ | √ | |
Index monitoring | √ | √ | |
Planning Node Types
For an Elasticsearch cluster, the proper planning of different types of nodes is critical to optimizing performance and resource utilization. Before creating a cluster, determine the types of nodes to use based on service requirements, query load, data growth patterns, and performance goals. Table 4 describes the characteristics of different node types and the purposes they are suited for.
- If no master or client nodes were enabled when a cluster was created, you can add them later if data nodes become overloaded. For details, see Adding Master or Client Nodes.
- If no cold data nodes were enabled during cluster creation, they cannot be added later, so you have to determine whether to use cold data nodes while creating a cluster.
Node Type | Node Description | Characteristics |
---|---|---|
Data node (ESS) | Data nodes are used to store data. In a cluster that has neither master nor client nodes, data nodes provide the functions of both types of nodes. | Data nodes are mandatory for any cluster. |
Master node (ess-master) | The master node is responsible for cluster management, such as metadata management, index creation and deletion, and shard allocation. It plays a critical role in metadata management, node management, stability guarantee, and cluster operation control for large-scale clusters. | |
Client node (ess-client) | Client nodes receive and coordinate external requests, such as search and write requests. They play an important role in handling high-load queries, complex aggregations, managing a large number of shards, and improving cluster scalability. | |
Cold data node (ess-cold) | Cold data nodes are used to store query latency-insensitive data in large quantities. They offer an effective way to manage large datasets and cut storage costs. | |
Planning Node Storage
- Planning node models
CSS supports various ECS models suited for different application needs. Select the appropriate models based on service requirements and performance expectations to achieve a perfect balance between storage performance and costs.
Table 5 Different node models and the intended application scenarios
Node Model | Disk Type | Specifications Description | Recommended Scenario |
---|---|---|---|
Computing-intensive | Cloud drive | vCPUs:Memory = 1:2 | Small-volume searches (less than 100 GB on a single node). |
General computing | Cloud drive | vCPUs:Memory = 1:4 | Medium-scale e-commerce site search, social search, and log search; search and analysis where the data volume on a single node is in the range 100 GB to 1,000 GB. |
Memory-optimized | Cloud drive | vCPUs:Memory = 1:8 | Search and analysis where the data volume on a single node is in the range 100 GB to 2,000 GB. This type of node is a good option for vector search, as its large memory helps improve cluster performance and stability. |
Disk-intensive | Local disk | Attached HDDs | Cold data storage, such as logs. Such data may need to be updated from time to time, and does not require a high query performance. |
Ultra-high I/O (CPU architecture: Kunpeng) | Local disk | Attached SSDs | Large-scale log storage (hot data). |
Ultra-high I/O (CPU architecture: x86) | Local disk | Attached SSDs | Large-scale search and analysis: High computing or disk I/O performance is required. Other use cases include public opinion analysis, patent search, and database acceleration. |
- Planning node specifications
For a given expected data handling capacity, it is usually preferable to use a smaller number of nodes with larger specifications rather than a larger number of nodes with smaller specifications. For example, a cluster consisting of three nodes each with 32 CPU cores and 64 GB memory is usually better than a cluster consisting of 12 nodes each with 8 CPU cores and 16 GB memory in terms of stability and scalability.
The specific advantages are as follows:
- Cluster stability: High-specs nodes provide more powerful data processing capabilities and larger memory space, leading to higher overall cluster stability.
- Improved scalability: When a cluster consisting of high-specs nodes encounters a performance bottleneck, you simply add more of these high-specs nodes. This is easier than increasing the specifications of existing nodes.
- Easier maintenance: A smaller number of nodes means easier maintenance and less complex management.
In contrast, when a cluster consisting of low-specs nodes needs extra capacity, a vertical scale-up is usually performed, that is, the specifications of existing nodes are increased. This may entail not only more complex, challenging migration and upgrade processes, but also additional maintenance costs.
To sum up, when planning a cluster, you must fully consider performance, costs, maintenance, and scalability, and choose the node specifications that best suit your needs.
- Planning storage capacity
When planning the storage capacity of a CSS cluster, consider the following factors: the original data size, number of data replicas, data bloat rate, and disk usage. The following is a recommended formula for determining the needed cluster storage capacity.
Storage capacity = Original data size x (1 + Number of replicas) x (1 + Data bloat rate) x (1 + Ratio of reserved space)
- Original data size: Determine the size of the original data that needs to be stored.
- Number of replicas: The default value is 1.
- Data bloat rate: Extra data may be generated due to data indexing. Generally, you are advised to use a 25% data bloat rate.
- Disk usage (ratio of reserved space): Considering the space occupied by the operating system and file system and the space reserved for optimized disk performance and redundancy, you are advised to keep the disk usage under 70%. That is, you need to reserve 30% of the total disk capacity.
With these recommended values, the formula becomes: Cluster storage capacity = Original data size x 2 x 1.25 x 1.3
To put it simply, if the original data size is known, the total storage capacity of the cluster needs to be 3.25 times that. This formula is for quick reference only. You still need to adjust it based on the actual applications and projected data growth rate.
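The capacity formula above can be expressed as a small Python function. This is a minimal sketch in which the function and parameter names are hypothetical and the default values are the ones recommended above (1 replica, 25% data bloat rate, 30% reserved space).

```python
def required_storage_gb(
    original_gb: float,
    replicas: int = 1,
    bloat_rate: float = 0.25,
    reserved_ratio: float = 0.30,
) -> float:
    """Storage capacity = original size x (1 + replicas) x (1 + bloat) x (1 + reserved)."""
    if original_gb < 0 or replicas < 0:
        raise ValueError("original_gb and replicas must be non-negative")
    return original_gb * (1 + replicas) * (1 + bloat_rate) * (1 + reserved_ratio)


# 1,000 GB of original data with the default values: about 3.25 times the original size.
capacity = required_storage_gb(1000)
print(f"{capacity:.0f} GB")

# The same data with two replicas instead of one.
print(f"{required_storage_gb(1000, replicas=2):.0f} GB")
```

Adjust the parameters to match your actual replica count and projected data growth before relying on the result.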
Planning the Node Quantity
Plan the node quantity based on performance requirements and predicted load. Table 6 provides a method for calculating the appropriate number of nodes. Following this method helps you ensure cluster performance and stability.
Node | Performance Baseline | Formula | Example |
---|---|---|---|
Write node | | Number of write nodes = Peak traffic/Number of vCPUs per node/Write throughput per vCPU x Number of replicas | If the peak inbound traffic is 100 MB/s and a node has 16 vCPUs and 64 GB memory, about 12 nodes (100/16/1 x 2 = 12.5) are needed. |
Query node | It is difficult to evaluate the performance baseline of a single node out of the context of specific application scenarios. The average query response time (in seconds) is used here to measure the query performance baseline. | Number of query nodes = QPS/(Number of vCPUs per node x 3/2/Average query response time in seconds) x Number of shards | If the query QPS is 1000, the average query response time is 100 ms (0.1s), three index shards are planned, and a node has 16 vCPUs and 64 GB memory, about 12 nodes (1000/(16 x 3/2/0.1) x 3 = 12.5) are needed. |
Total number of nodes | N/A | Total number of nodes = Number of write nodes + Number of query nodes | Total number of nodes = Number of write nodes + Number of query nodes = 24. NOTE: Here, the total number of nodes refers to the number of data nodes plus the number of cold data nodes. |
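The two formulas in the table above can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the write throughput of 1 MB/s per vCPU and the replica factor of 2 are simply the values that appear in the table's write-node example.

```python
def write_nodes(peak_mb_s: float, vcpus_per_node: int,
                throughput_per_vcpu_mb_s: float, replicas: int) -> float:
    """Peak traffic / vCPUs per node / write throughput per vCPU x number of replicas."""
    return peak_mb_s / vcpus_per_node / throughput_per_vcpu_mb_s * replicas


def query_nodes(qps: float, vcpus_per_node: int,
                avg_response_s: float, shards: int) -> float:
    """QPS / (vCPUs per node x 3 / 2 / average response time in seconds) x number of shards."""
    qps_per_node = vcpus_per_node * 3 / 2 / avg_response_s
    return qps / qps_per_node * shards


# Values from the examples in the table above.
w = write_nodes(peak_mb_s=100, vcpus_per_node=16, throughput_per_vcpu_mb_s=1, replicas=2)
q = query_nodes(qps=1000, vcpus_per_node=16, avg_response_s=0.1, shards=3)
print(f"write nodes: {w:.1f}, query nodes: {q:.1f}")
```

Both raw results come out at roughly 12.5, which the table rounds to about 12 nodes each. Whether to round up or down is a sizing decision that depends on how much headroom you want.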
In each cluster, the number of nodes supported by each node type varies, depending on the types of nodes used in that cluster. For details, see Table 7.
Node Type | Node Quantity |
---|---|
ess | ess: 1-32 |
ess, ess-master | ess: 1-200; ess-master: an odd number ranging from 3 to 9 |
ess, ess-client | ess: 1-32; ess-client: 1-32 |
ess, ess-cold | ess: 1-32; ess-cold: 1-32 |
ess, ess-master, ess-client | ess: 1-200; ess-master: an odd number ranging from 3 to 9; ess-client: 1-32 |
ess, ess-master, ess-cold | ess: 1-200; ess-master: an odd number ranging from 3 to 9; ess-cold: 1-32 |
ess, ess-client, ess-cold | ess: 1-32; ess-client: 1-32; ess-cold: 1-32 |
ess, ess-master, ess-client, ess-cold | ess: 1-200; ess-master: an odd number ranging from 3 to 9; ess-client: 1-32; ess-cold: 1-32 |
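The quantity limits in the table above follow a simple pattern: data nodes (ess) may number 1 to 200 when master nodes are present and 1 to 32 otherwise, master nodes must be an odd number from 3 to 9, and client and cold data nodes may each number 1 to 32. A planned node mix could be checked against these limits as follows. This is an illustrative Python sketch with a hypothetical function name, not a CSS API.

```python
from typing import List


def validate_node_counts(ess: int, master: int = 0, client: int = 0, cold: int = 0) -> List[str]:
    """Return a list of problems; an empty list means the mix fits the table's limits.

    A count of 0 means that node type is not used in the cluster.
    """
    problems: List[str] = []
    max_ess = 200 if master != 0 else 32
    if not 1 <= ess <= max_ess:
        problems.append(f"ess must be 1-{max_ess}, got {ess}")
    if master != 0 and (master % 2 == 0 or not 3 <= master <= 9):
        problems.append(f"ess-master must be an odd number from 3 to 9, got {master}")
    for name, count in (("ess-client", client), ("ess-cold", cold)):
        if count != 0 and not 1 <= count <= 32:
            problems.append(f"{name} must be 1-32, got {count}")
    return problems


print(validate_node_counts(ess=100))            # too many data nodes without master nodes
print(validate_node_counts(ess=100, master=3))  # fits the limits
```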
Planning VPCs and Subnets
There are two types of VPCs: shared and non-shared.
Compared with a non-shared VPC, a shared VPC has the following advantages:
- You can create resources in a VPC under one account and share the resources with other accounts. This way, the other accounts do not need to create the same resources. Fewer resources and a simplified network architecture improve management efficiency and reduce costs.
If there are VPCs under different accounts, VPC peering connections are needed to connect these VPCs. With VPC sharing, different accounts can create resources within one VPC. This eliminates the need to create VPC peering connections and simplifies the network structure.
- Resources can be centrally managed in one account, which helps enterprises configure security policies in a centralized manner and better monitor and audit resource usage for higher security.
Method | Description | Operation Guide |
---|---|---|
Method A: | | |
Method B: | | |
Planning a Cluster's Security Mode
Cluster Type | Description | Characteristics | Recommended Scenario |
---|---|---|---|
Non-security mode cluster | Cluster for which the security mode is disabled | Access to the cluster requires no user authentication, and data is transmitted in plaintext using HTTP. Make sure the client is in a secure environment, and do not expose the cluster access interface to the public network. | This type of cluster is mostly used for internal services and testing. |
Security-mode cluster | Cluster in security mode + HTTP | A security-mode cluster requires user authentication. It supports access control and data encryption, but it uses HTTP to transmit data in plaintext. Make sure the client is in a secure environment, and do not expose the cluster access interface to the public network. | Access control by user permissions is supported. This type of cluster is suitable for workloads that are particularly performance-demanding. |
Security-mode cluster | Cluster in security mode + HTTPS | A security-mode cluster requires user authentication. It supports access control and data encryption, and it uses HTTPS to encrypt communication and enhance data security. | This type of cluster is suitable where there is a high security standard and public network access is required. |
To access a security-mode cluster, you need to provide a username and password. CSS supports authentication for the following two types of users:
- Administrator: The default administrator username is admin, and the password is the one specified during cluster creation.
- Cluster user: created by the cluster administrator on Kibana. For details, see Creating Users for an Elasticsearch Cluster and Granting Cluster Access.
You can change the security mode of an existing cluster. For details, see Changing the Security Mode of an Elasticsearch Cluster.
The following changes are supported: from non-security mode to security mode, from security mode to non-security mode, and between security modes that use different web protocols (HTTP or HTTPS).
Planning the Number of Index Shards
Before importing data to a cluster, carefully consider your service needs and plan the cluster's data structure and distribution in advance. This includes properly designing indexes and deciding on the appropriate number of index shards. To ensure optimal performance and scalability for a cluster, consider following these best practices:
- The size of a single shard: Keep the size of each shard between 10 GB and 50 GB. This helps strike a balance between storage efficiency and query performance.
- Total number of shards in a cluster: To facilitate management and avoid an excessively large scale, make sure the total number of shards in a cluster is less than 30,000. This helps maintain the stability and responsiveness of the cluster.
- Memory-to-shards ratio: Limit the number of shards per 1 GB of memory to 20 to 30. This ensures that each shard has sufficient memory resources to respond to indexing and query operations.
- Number of shards per node: To prevent node overload, keep the number of shards on each node under 1,000. This helps to improve node stability.
- Relationship between the number of index shards and the number of nodes: For each index, make sure the number of shards is the same as or is an integral multiple of the number of nodes in the cluster. This helps improve load balancing and optimize query and indexing performance.
Following these suggestions, you can plan and manage index shards for a CSS cluster more effectively, improving the cluster's overall performance and maintainability.
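The shard guidelines above can be turned into a rough planning helper. The following Python sketch makes several assumptions that are not part of the guidelines themselves: a target shard size of 30 GB (a value chosen from within the 10 GB to 50 GB range), rounding the shard count up to a multiple of the node count, and the function names. It ignores replicas when suggesting the primary shard count.

```python
import math
from typing import List


def plan_primary_shards(index_size_gb: float, node_count: int, target_shard_gb: float = 30) -> int:
    """Suggest a primary shard count that is a multiple of the node count."""
    by_size = max(1, math.ceil(index_size_gb / target_shard_gb))
    return math.ceil(by_size / node_count) * node_count


def check_shard_limits(total_shards: int, node_count: int, memory_gb_per_node: float) -> List[str]:
    """Check cluster-wide shard totals (primaries plus replicas) against the guidelines."""
    problems: List[str] = []
    per_node = total_shards / node_count
    if total_shards >= 30000:
        problems.append("total shards should be less than 30,000")
    if per_node >= 1000:
        problems.append("shards per node should be under 1,000")
    if per_node > 30 * memory_gb_per_node:
        problems.append("more than 30 shards per 1 GB of memory")
    return problems


shards = plan_primary_shards(index_size_gb=600, node_count=6)
print(shards, "primary shards of about", 600 / shards, "GB each")
print(check_shard_limits(total_shards=shards * 2, node_count=6, memory_gb_per_node=64))
```

For a 600 GB index on six nodes, this suggests 24 primary shards of 25 GB each, which stays within the recommended shard size range. If rounding up to a node multiple pushes the shard size below 10 GB, a smaller multiple may be the better choice.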