Enhancing Data Ingestion Performance
In large-scale data ingestion scenarios, such as log analytics and real-time monitoring, the traditional Elasticsearch routing mechanism can generate excessive internal forwarding requests as the number of index shards and the write throughput increase. This can lead to write rejections and increased long-tail latency. To address this issue, CSS provides four ways to enhance ingestion performance: bulk routing, bulk aggregation, text indexing acceleration, and merge optimization. These features help reduce performance overhead and significantly improve write stability and throughput in heavy-load scenarios.
How the Feature Works
To balance performance and functionality, CSS provides four ingestion performance enhancement solutions. The core idea is to eliminate or mitigate performance bottlenecks by reducing cross-node communication and accelerating index building.
| Solution | Description | When to Use | Details |
|---|---|---|---|
| Bulk routing | Bulk routing reduces cross-node communication and request forwarding by the coordinator node, helping to prevent write rejections. How it works: Rather than routing each document individually based on its ID (the default behavior), this optimization forces multiple documents within the same bulk request to be routed together to the same shard or local node. | Enable this feature for large-scale clusters with a high number of shards. It is particularly effective when bulk request throughput is so intense that it exhausts the coordinator node's network I/O or CPU resources. | |
| Bulk aggregation | Bulk aggregation reduces system overhead and improves single-node capacity by minimizing resource lock contention during write operations. How it works: Instead of processing documents individually within the same bulk request, the system processes them as a unified batch, reducing memory allocations and lock contention. | Enable this feature for ultra-high concurrency write scenarios where throughput is critical. We recommend enabling it when monitoring indicates high system CPU usage or significant memory allocation pressure. | |
| Text indexing acceleration | Text indexing acceleration significantly speeds up the indexing of text and keyword fields. How it works: Through an optimized indexing process, improved memory utilization, and accelerated tokenization, this feature improves the overall efficiency of text processing. | Enable this feature when you need high-speed index building with relatively simple tokenization logic. Examples of such scenarios include log auditing and full-text search, where large-volume text and numerous keyword fields are involved. | |
| Merge optimization | Merge optimization ensures write stability by preventing write throttling caused by slow segment merging. How it works: By dynamically increasing merging threads for index shards, this feature accelerates the merging of small segments and reclaims write cache space in a timely manner. | Enable this feature in high-frequency write scenarios, particularly when the write throughput increases drastically after enabling the three optimizations above and writes become unstable due to merge throttling. | |
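The difference between per-document routing and bulk routing described in the table can be illustrated with a small simulation. This is only a sketch: `zlib.crc32` stands in for the hash that Elasticsearch applies to the routing value, and the "packed" policy below (send the whole bulk request to one shard chosen from its first document) is an assumption made for illustration, not the documented CSS algorithm.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard from a document ID (stand-in hash, not the real one)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def shards_touched_default(doc_ids, num_shards):
    """Default behavior: every document is routed individually by its ID."""
    return {shard_for(doc_id, num_shards) for doc_id in doc_ids}


def shards_touched_packed(doc_ids, num_shards):
    """Assumed packed behavior: the whole bulk request lands on one shard."""
    if not doc_ids:
        return set()
    return {shard_for(doc_ids[0], num_shards)}


# One bulk request with 1000 documents written to a 30-shard index.
doc_ids = [f"log-{i}" for i in range(1000)]
default_shards = shards_touched_default(doc_ids, 30)
packed_shards = shards_touched_packed(doc_ids, 30)

print("shards touched with per-document routing:", len(default_shards))
print("shards touched with packed routing:", len(packed_shards))
```

Each shard touched by a bulk request means one more sub-request that the receiving node has to forward, so the gap between the two numbers approximates the forwarding work that bulk routing avoids.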
Constraints
Only Elasticsearch 7.10.2 clusters support data ingestion performance enhancement.
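A script can check this constraint before applying any of the settings in this topic. The sketch below assumes that the cluster's root endpoint (`GET /`) returns the usual Elasticsearch JSON body with a `version.number` field; the function name and the sample responses are hypothetical.

```python
REQUIRED_VERSION = "7.10.2"


def supports_ingestion_enhancement(root_info: dict) -> bool:
    """Return True if the `GET /` response reports the supported version."""
    return root_info.get("version", {}).get("number") == REQUIRED_VERSION


# Sample bodies shaped like the response of `GET /` (trimmed to the relevant field).
print(supports_ingestion_enhancement({"version": {"number": "7.10.2"}}))  # True
print(supports_ingestion_enhancement({"version": {"number": "7.6.2"}}))   # False
```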
Logging In to Kibana
Elasticsearch clusters support multiple access methods. This topic uses Kibana as an example. To log in to Kibana and open the command execution page, perform the following steps:
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the cluster list, find the target cluster, and click Kibana in the Operation column to log in to the Kibana console.
- In the left navigation pane, choose Dev Tools.
The left part of the console is the command input box, and the triangle icon in its upper-right corner is the execution button. The right part shows the execution result.
Configuring Bulk Routing
After bulk routing is enabled (that is, index.bulk_routing is set to pack or local_pack), documents are no longer routed by ID. This may affect routing-related functionality. For example, ID-based GET requests may fail because the shards cannot be located.
Enable bulk routing to alleviate request forwarding pressure for the node that receives requests.
PUT /{index_name}/_settings
{
"index.bulk_routing": "local_pack"
}
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| index_name | String | N/A | Specifies one or more indexes. |
| index.bulk_routing | String | default | Specifies the internal routing policy for processing documents within bulk write requests. |
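Outside Kibana, the same request can be prepared from a script. The following is a minimal sketch that uses only the Python standard library. The base URL `http://localhost:9200` and the index name `my_index` are placeholders, the set of accepted values is limited to the ones named in this topic, and authentication and HTTPS settings are deliberately left out.

```python
import json
import urllib.request

# Values of index.bulk_routing that are named in this topic.
BULK_ROUTING_VALUES = {"default", "pack", "local_pack"}


def bulk_routing_body(policy: str) -> dict:
    """Build the settings body, rejecting values this topic does not mention."""
    if policy not in BULK_ROUTING_VALUES:
        raise ValueError(f"unsupported bulk routing policy: {policy!r}")
    return {"index.bulk_routing": policy}


def build_settings_request(base_url: str, index_name: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) PUT /{index_name}/_settings."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}/_settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_settings_request(
    "http://localhost:9200",  # placeholder cluster address
    "my_index",               # placeholder index name
    bulk_routing_body("local_pack"),
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send the request. The same `build_settings_request` helper works for the other settings in this topic because they all use the `_settings` endpoint.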
Configuring Bulk Aggregation
Enable bulk aggregation so that the system processes the documents within each bulk request as a unified batch instead of one by one. This significantly reduces CPU overhead and lock contention during ultra-high concurrency write operations.
PUT /{index_name}/_settings
{
"index.aggr_perf_batch_size": "128"
}
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| index.aggr_perf_batch_size | Integer | 1 | Specifies the maximum number of documents to process as a single batch. Minimum value: 1 (disables bulk aggregation so that documents will be processed individually). Recommended value: 128. Any value greater than 1 enables bulk aggregation. The system will internally reaggregate documents from bulk requests into batches up to this size. Larger batch sizes increase memory consumption per batch. Adjust this parameter based on your workload requirements and available system resources. The actual number of documents per batch is calculated as: MIN(Number of documents within the bulk request, aggr_perf_batch_size). |
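The batch size formula in the table can be worked through with a few numbers. In this sketch, `effective_batch_size` implements the MIN formula exactly as stated above, while `estimated_batches` additionally assumes that a bulk request is split into consecutive batches with a smaller final batch, which this topic does not state explicitly.

```python
import math


def effective_batch_size(docs_in_bulk: int, aggr_perf_batch_size: int) -> int:
    """MIN(number of documents within the bulk request, aggr_perf_batch_size)."""
    if docs_in_bulk < 0 or aggr_perf_batch_size < 1:
        raise ValueError("docs_in_bulk must be >= 0 and aggr_perf_batch_size >= 1")
    return min(docs_in_bulk, aggr_perf_batch_size)


def estimated_batches(docs_in_bulk: int, aggr_perf_batch_size: int) -> int:
    """Assumption: consecutive batches, with the last one possibly smaller."""
    size = effective_batch_size(docs_in_bulk, aggr_perf_batch_size)
    return math.ceil(docs_in_bulk / size) if size else 0


for docs in (50, 128, 1000):
    print(docs, effective_batch_size(docs, 128), estimated_batches(docs, 128))
# 50 50 1
# 128 128 1
# 1000 128 8
```

With the recommended value of 128, a bulk request smaller than 128 documents is handled as one batch, so raising the setting further only matters for larger bulk requests.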
Configuring Text Indexing Acceleration
Text indexing acceleration cannot be enabled for indexes whose mapping contains nested fields. The system will not be able to start acceleration due to compatibility issues.
Enable text indexing acceleration to speed up the indexing of text and keyword fields.
PUT /{index_name}/_settings
{
"index.native_speed_up": true,
"index.native_analyzer": true
}
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| index.native_speed_up | Boolean | false | Whether to enable text indexing acceleration. The value can be true (enabled) or false (disabled). |
| index.native_analyzer | Boolean | false | Whether to enable tokenization acceleration. This parameter is available only when index.native_speed_up is set to true, and it applies to text fields only. The value can be true (enabled) or false (disabled). |
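Because indexes with nested fields cannot use text indexing acceleration, it can help to scan a mapping before changing the settings. The sketch below walks the `properties` object of an index mapping (as returned under `mappings` by `GET /{index_name}/_mapping`) and lists every field of type `nested`. The helper name and the sample mapping are hypothetical.

```python
def find_nested_fields(properties: dict, prefix: str = "") -> list:
    """Return the dotted paths of all fields whose type is "nested"."""
    found = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            found.append(path)
        # Object and nested fields both keep their sub-fields under "properties".
        found.extend(find_nested_fields(spec.get("properties", {}), prefix=f"{path}."))
    return found


sample_properties = {
    "message": {"type": "text"},
    "host": {"properties": {"name": {"type": "keyword"}}},
    "events": {"type": "nested", "properties": {"code": {"type": "keyword"}}},
}

nested = find_nested_fields(sample_properties)
print(nested)  # ['events']
print("acceleration can be enabled:", not nested)
```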
Configuring Merge Optimization
Increase the number of threads used for segment merging for index shards. This helps prevent write throttling caused by slow segment merging.
PUT /{index_name}/_settings
{
"index.merge.scheduler.max_thread_count": 8
}
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| index.merge.scheduler.max_thread_count | Integer | 4 | Specifies the maximum number of threads used for segment merging in a single shard. Value range: 1 to (Number of CPU cores on the node/2). Recommended value: 8, which falls within the value range only on nodes with at least 16 CPU cores. A higher value accelerates segment merging by leveraging the multi-core architecture of CPUs, which helps avoid write throttling caused by slow segment merging. Ensure that your cluster nodes have sufficient CPU cores to run these threads. |
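The value range in the table depends on the node's CPU core count, so a quick check before applying the setting can catch out-of-range values. This sketch assumes that "Number of CPU cores on the node/2" is rounded down for odd core counts. Both function names are hypothetical.

```python
def max_merge_threads(cpu_cores: int) -> int:
    """Upper bound of the value range: half the node's CPU cores (rounded down, at least 1)."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return max(1, cpu_cores // 2)


def merge_settings_body(requested_threads: int, cpu_cores: int) -> dict:
    """Build the settings body only if the requested value is within range."""
    upper = max_merge_threads(cpu_cores)
    if not 1 <= requested_threads <= upper:
        raise ValueError(f"value must be between 1 and {upper} on a {cpu_cores}-core node")
    return {"index.merge.scheduler.max_thread_count": requested_threads}


print(max_merge_threads(16))       # 8
print(merge_settings_body(8, 16))  # {'index.merge.scheduler.max_thread_count': 8}
```

On an 8-core node, `merge_settings_body(8, 8)` raises `ValueError` because the upper bound there is 4.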