Updated on 2024-11-29 GMT+08:00

Modifying the Merge Parameter and the Number of Threads

When data is written to Elasticsearch, each refresh generates a new segment, and index segments are then merged according to a merge policy. The merge frequency affects both write and query speed. Frequent merging consumes a large amount of I/O, which slows down writes, but it keeps the number of segments small, which speeds up queries. Set the merge frequency based on service requirements so that write and query performance are both acceptable. Elasticsearch uses TieredMergePolicy by default. The following parameters control how often index segments are merged:

  1. The index.merge.policy.floor_segment parameter prevents very small segments from accumulating. Segments smaller than this value are merged until they reach the floor size. The default value is 2 MB.
  2. The index.merge.policy.max_merge_at_once parameter specifies the maximum number of segments that can be merged at a time. The default value is 10.
  3. The index.merge.policy.max_merged_segment parameter specifies the maximum size of a merged segment. Segments that have reached this size are no longer merged. The default value is 5 GB.
  4. The index.merge.policy.segments_per_tier parameter specifies the number of segments allowed in each tier. The default value is 10. The value must be greater than or equal to the value of index.merge.policy.max_merge_at_once. Otherwise, the merge threshold is reached before max_merge_at_once segments have accumulated, and merges occur too frequently.
  5. The index.merge.scheduler.max_thread_count parameter specifies the maximum number of threads on a single shard that can perform merges at the same time. By default, Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)) threads are started to perform the merge operation. This default suits SSDs. On mechanical hard disks, concurrent merges may cause I/O blocking. In this case, set the number of threads to 1.
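To see what the default thread count works out to on a given node, the formula for index.merge.scheduler.max_thread_count can be reproduced in a few lines of bash. This is a minimal sketch, not part of Elasticsearch: the function name default_merge_threads is hypothetical, and it assumes the nproc count matches what the JVM reports as available processors, which may not hold in containers with CPU limits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors
#   Math.max(1, Math.min(4, availableProcessors / 2))
# using integer division, as Java does for int operands.
default_merge_threads() {
  # Use the first argument as the processor count, or fall back to nproc.
  local cpus="${1:-$(nproc)}"
  local half=$(( cpus / 2 ))
  local capped=$(( half < 4 ? half : 4 ))   # Math.min(4, half)
  echo $(( capped > 1 ? capped : 1 ))       # Math.max(1, capped)
}

# Usage: default_merge_threads        (current node)
#        default_merge_threads 16     (a node with 16 processors)
```

Under this formula, the default stays at 1 thread up to 3 processors and is capped at 4 threads from 8 processors upward.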

Generally, you can set the index.merge.policy.max_merge_at_once and index.merge.policy.segments_per_tier parameters to control the merge frequency.

| Parameter Modification | Advantage | Disadvantage |
|---|---|---|
| Increase the values of index.merge.policy.max_merge_at_once and index.merge.policy.segments_per_tier (for example, to 50). | Improved indexing speed | Lowered search speed, because fewer merge operations leave more segments |
| Decrease the values of index.merge.policy.max_merge_at_once and index.merge.policy.segments_per_tier (for example, to 5). | Fewer segments and improved search speed | Lowered indexing speed, because more merge operations consume more system resources (CPU, I/O, and RAM) |
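Because segments_per_tier must not be smaller than max_merge_at_once, it can help to check the two values before sending them to the cluster. The following bash sketch builds a settings body only when that constraint holds. The function name merge_policy_body is hypothetical, and the flat setting keys are one accepted way of writing index settings, not the only one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON settings body for the two merge
# policy parameters, or fails if the documented constraint is violated.
merge_policy_body() {
  local segments_per_tier="$1" max_merge_at_once="$2"
  # segments_per_tier must be >= max_merge_at_once.
  if [ "$segments_per_tier" -lt "$max_merge_at_once" ]; then
    echo "segments_per_tier must be >= max_merge_at_once" >&2
    return 1
  fi
  printf '{"index.merge.policy.segments_per_tier": %d, "index.merge.policy.max_merge_at_once": %d}\n' \
    "$segments_per_tier" "$max_merge_at_once"
}

# Usage: merge_policy_body 50 50   (write-heavy example from the table)
#        merge_policy_body 5 5     (query-heavy example from the table)
```

The printed body can be passed with -d to a _settings request for the index.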

The following is an example of the command for modifying the parameters:

curl -XPUT --tlsv1.2 --negotiate -k -u : "https://ip:httpport/myindex-001/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "merge": {
    "scheduler": {
      "max_thread_count": "1"
    },
    "policy": {
      "segments_per_tier": "20",
      "max_merge_at_once": "20",
      "floor_segment": "2m",
      "max_merged_segment": "5g"
    }
  }
}'
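After changing the merge parameters, one way to observe the effect is to count the segments in each shard over time. The sketch below is an illustration under assumptions: the function name count_segments_per_shard is hypothetical, and it expects input in the form produced by the _cat/segments API when the columns are limited to shard, prirep, and segment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "shard prirep segment" rows on stdin and
# prints one line per shard copy: "<shard> <prirep> <segment count>".
count_segments_per_shard() {
  awk '{ count[$1 " " $2]++ } END { for (k in count) print k, count[k] }' | sort
}

# Usage, with the same placeholders as the command above:
# curl -s --tlsv1.2 --negotiate -k -u : \
#   "https://ip:httpport/_cat/segments/myindex-001?h=shard,prirep,segment" \
#   | count_segments_per_shard
```

A segment count that keeps growing after the values are increased, or shrinks after they are decreased, matches the trade-off described in the table.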