Shard Planning Suggestions
An index can be divided into multiple shards that are distributed across different physical machines. How an index is sharded affects both indexing and query speed.
Each shard can process write and query requests. When setting the number of shards for an index, consider the following rules:
- Query performance deteriorates as the number of records in a shard increases. It is recommended that a shard hold about 100 million records, and 400 million at most.
- Keep the data volume stored in a single shard at about 20 GB, and no more than 30 GB.
- Determine the number of primary shards from the maximum data capacity of the index and the capacity of a single shard. Generally, an index that stores a large volume of data is given more shards so that it can take full advantage of distributed query. If the data volume of an index is small (for example, less than 1 GB), a single shard may outperform multiple shards.
- To improve data reliability, set the number of replica shards to at least 1. If the cluster has sufficient storage space, set it to 2.
- The number of shards that each node can support is limited, because physical resources are allocated per node. As the data in shards grows, more data is loaded into memory during queries. When the heap is exhausted, garbage collection runs frequently and the system can no longer work properly. It is recommended that each 1 GB of memory manage no more than 15 shards. For example, if the maximum memory size of an Elasticsearch instance is 28 GB, keep the number of shards managed by that instance below 500.
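The rules above can be turned into a rough sizing calculation. The following Python sketch is illustrative only: the helper names (`primary_shard_count`, `node_shard_budget`, `index_settings`) are hypothetical, and it assumes the 20 GB per-shard target and the 15-shards-per-GB guideline stated in this section. The `number_of_shards` and `number_of_replicas` keys are the standard Elasticsearch index settings. Treat the result as a starting point to validate against your own workload, not as a fixed formula.

```python
import math

# Guideline values taken from the rules in this section.
TARGET_SHARD_GB = 20      # recommended data volume per shard
SHARDS_PER_GB_MEMORY = 15  # recommended shards managed per 1 GB of memory


def primary_shard_count(max_index_gb, target_shard_gb=TARGET_SHARD_GB):
    """Estimate primary shards from the maximum expected index size."""
    if max_index_gb <= 0:
        raise ValueError("max_index_gb must be positive")
    # A small index (for example, under 1 GB) still gets one shard.
    return max(1, math.ceil(max_index_gb / target_shard_gb))


def node_shard_budget(memory_gb, shards_per_gb=SHARDS_PER_GB_MEMORY):
    """Estimate how many shards one instance should manage."""
    return int(memory_gb * shards_per_gb)


def index_settings(max_index_gb, replicas=1):
    """Build an index-creation request body using the estimate above."""
    return {
        "settings": {
            "number_of_shards": primary_shard_count(max_index_gb),
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    print(primary_shard_count(200))  # 200 GB index at 20 GB per shard
    print(node_shard_budget(28))     # instance with 28 GB of memory
    print(index_settings(200))
```

With these assumptions, a 200 GB index maps to 10 primary shards, and a 28 GB instance has a budget of 420 shards (28 × 15), which is within the "below 500" figure given in the example above. Remember that replica shards also count toward the per-node total.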