Cluster and Index Planning
In Cloud Search Service (CSS), you can select the cluster version, architecture, storage type, number of cluster nodes, storage capacity, and number of index shards. Configure them based on your service requirements for read and write requests, data storage and computing, and search and analytics.
You can configure the following specifications of a CSS cluster:
- Cluster Version
- Cluster Architecture
- Storage Types
- Cluster Nodes
- Node Storage Capacity
- Number of Index Shards
Cluster Version
You are advised to select the Elasticsearch version used with CSS as follows:
- If you are using the CSS Elasticsearch cluster for the first time, select 7.10.2 or 7.6.2.
- If you want to migrate an existing Elasticsearch cluster to CSS, you are advised to create an Elasticsearch cluster by choosing version 7.10.2 or 7.6.2 if you need to modify the code of the migrated cluster. If you want to keep the current version, create an Elasticsearch cluster that has the same or close version as the existing cluster.
Cluster Architecture
CSS supports multiple architectures, such as read/write splitting, cold and hot isolation, decoupled storage and computing, role separation, and cross-AZ deployment.
Architecture |
Scenario |
Benefits |
---|---|---|
Read/Write splitting |
Production services involving many read operations and only a few write operations. After data is written, it does not need to be accessed within 10s. |
High concurrency, low latency |
Cold and hot data separation |
Log services that have low requirements on cold data query performance. |
Low costs |
Decoupled storage and compute |
Log services that have low requirements on cold data query performance (10s+) and do not require cold data update. This architecture can be used together with the cold and hot data separation architecture to build three levels of storage: hot, warm, and cold. |
Low costs |
Separation of roles |
A cluster that is large, has a large number of indexes, or is highly scalable. |
High availability |
Cross-AZ deployment |
Production services that have high requirements on availability or use local disks. |
High availability |
Storage Types
CSS supports cloud and local disks.
- Cloud disk types include computing-intensive (CPU:memory = 1:2), general computing (CPU:memory = 1:4), and memory-optimized (CPU:memory = 1:8).
- Local disk types include disk-intensive (with HDDs attached) and ultra-high I/O (with SSDs attached).
Storage Type |
Disk Type |
Scenario |
---|---|---|
Computing-intensive |
Cloud drive |
Recommended scenario: search from a small amount of data (less than 100 GB on a single node) |
General computing |
Cloud drive |
Common scenario: search and analysis when the data volume on a single node is in the range 100 GB to 1,000 GB, for example, medium-scale e-commerce search, social search, and log search |
Memory-optimized |
Cloud drive |
Common scenario: search and analysis when the data volume of a single node is in the range 100 GB to 2,000 GB Vector search: Large memory helps improve cluster performance and stability. |
Disk-intensive |
Local disk |
Logs: Cold data needs to be stored and updated, and the requirements on cold data query performance is low. |
Ultra-high I/O - Kunpeng |
Local disk |
Large-scale logs: hot data storage |
Ultra-high I/O - x86 |
Local disk |
Large-scale search and analysis: High computing or disk I/O performance is required, such as public opinion analysis, patent search, and database acceleration. |
Cluster Nodes
After the architecture and storage type of a CSS cluster are selected, determine the number of nodes in the cluster based on your performance requirements.
Type |
Performance Baseline |
Node Quantity Calculation Method |
Example |
---|---|---|---|
Write node |
|
Number of write nodes = Peak traffic/Number of vCPUs on a single node/Write throughput of a single vCPU x Number of copies |
If the peak write rate is 100 MB/s and a node has 16 vCPUs and 64 GB memory, 12 nodes (100/16/1 x 2) are required. |
Query node |
The performance of the same node varies greatly in different scenarios. It is difficult to evaluate the performance baseline of a single node. The average query response time is used as the query performance baseline for calculation. |
Number of query nodes = QPS/(Number of vCPUs on a single node x 3/2/Average query response time per second) x Number of shards |
If the query QPS is 1000, the average query response time is 100 ms, three index shards are planned, and a node has 16 vCPUs and 64 GB memory, about 12 nodes (1000/(16 x 3/2/0.1) x 3) are required. |
Number of nodes |
/ |
Number of nodes = Number of write nodes + Number of query nodes |
Number of nodes = Number of write nodes + Number of query nodes = 24 |
If two clusters can achieve the same performance, you are advised to select the one using higher specifications and fewer nodes. For example, a cluster using 3 nodes with 32 vCPUs and 64 GB memory achieves the same performance as the one using 12 nodes with 8 vCPUs and 16 GB memory, but the former runs more stable and can be more easily scaled. For a high-specification cluster that reaches the performance bottleneck, you simply need to scale it out (by adding nodes); whereas for a low-specification cluster, you need to scale it up (by changing to higher specifications).
Node Storage Capacity
The disk space of each node in a CSS cluster is determined by multiple factors, such as the data volume, number of copies (often set to 1), data bloat rate, and disk space usage (often set to 70%). You can use the following formula to calculate the storage capacity of a cluster:
Storage capacity = Source data x (1 + Number of copies) x 1.25 x (1 + Reserved space) ≈ Source data x 2 x 1.25 x 1.3 = Source data x 3.25
Number of Index Shards
You are advised to plan the number of index shards in a CSS cluster based on the following principles:
- The size of a single shard is in the range 10 GB to 50 GB.
- A cluster has fewer than 30,000 shards.
- It is recommended that 1 GB memory be used for 20 to 30 shards, and that a single node have no more than 1,000 shards.
- For a single index, it is recommended that the number of index shards be the same as or a multiple of the number of nodes.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot