Updated on 2024-11-29 GMT+08:00

Configuration Optimization

Setting Indexes to Dual Copies

During migration, to maximize import performance, you can set the number of replicas to 0 when creating an index, so that only the primary copy of each shard exists. After the data import is complete, set the number of replicas to 1 to ensure data reliability. Replica shards are then automatically copied from the primary shards. If a large amount of data must be replicated, copying may take a long time. In this case, you can adjust the resource control parameters so that more node resources are used for replication.

Run the curl command on any Elasticsearch node. For details about how to use the curl command, see Running curl Commands in Linux.

  • Normal mode:
    curl -XPUT "http://EsNode IP address:EsNode port number/_cluster/settings" -H 'Content-Type: application/json' -d'
    {
      "transient": {
        "cluster.routing": {
          "allocation.cluster_concurrent_rebalance": 36,
          "allocation.node_concurrent_recoveries": 3
        },
        "indices.recovery.max_bytes_per_sec": "1GB"
      }
    }'
  • Security mode:
    curl -XPUT --tlsv1.2 --negotiate -k -u : "https://EsNode IP address:EsNode port number/_cluster/settings" -H 'Content-Type: application/json' -d'
    {
      "transient": {
        "cluster.routing": {
          "allocation.cluster_concurrent_rebalance": 36,
          "allocation.node_concurrent_recoveries": 3
        },
        "indices.recovery.max_bytes_per_sec": "1GB"
      }
    }'
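
As described at the start of this section, the replica count is set to 0 for the import and back to 1 afterwards. The following is a minimal sketch of that step in normal mode, not a definitive procedure. The ES_URL and INDEX variables, their default values (including port 9200), and the function names are hypothetical placeholders introduced for illustration.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with a real EsNode address and index name.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
INDEX="${INDEX:-myindex}"

# Build the index settings body for a given replica count.
replica_settings_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Apply the replica count to the index (normal mode).
# In security mode, add "--tlsv1.2 --negotiate -k -u :" and use an https URL.
set_replicas() {
  curl -XPUT "$ES_URL/$INDEX/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replica_settings_body "$1")"
}

# Before the import: set_replicas 0
# After the import:  set_replicas 1
```

Nothing is sent to the cluster until set_replicas is called, so the request body can be inspected first with replica_settings_body.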

Increasing the Index Update Time

By default, an index is refreshed every second, and each refresh generates a new segment file. Increasing the refresh interval produces larger segment files and reduces I/O and segment merge pressure. This configuration item can also be specified during index creation.

If you are only importing data and do not need real-time queries, you can disable refresh (that is, set refresh_interval to -1) and set the number of replicas to 0. After the data is imported, restore the parameters to proper values.
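
The two states described above (refresh disabled during the import, interval restored afterwards) can be sketched as follows. This is an illustrative sketch for normal mode. The ES_URL and INDEX variables, their defaults, and the function names are hypothetical. Passing null for a setting resets it to the Elasticsearch default.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with a real EsNode address and index name.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
INDEX="${INDEX:-myindex}"

# Build the settings body. "default" resets refresh_interval via null;
# any other argument (for example -1 or 30s) is used as the interval value.
refresh_settings_body() {
  case "$1" in
    default) printf '{"index":{"refresh_interval":null}}' ;;
    *)       printf '{"index":{"refresh_interval":"%s"}}' "$1" ;;
  esac
}

# Apply the refresh interval to the index (normal mode).
set_refresh_interval() {
  curl -XPUT "$ES_URL/$INDEX/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(refresh_settings_body "$1")"
}

# During the import: set_refresh_interval -1
# After the import:  set_refresh_interval default
```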

Run the curl command on any Elasticsearch node.

  • Normal mode:
    curl -XPUT "http://EsNode IP address:EsNode port number/myindex/_settings" -H 'Content-Type: application/json' -d'
    {"number_of_replicas": 0,
     "refresh_interval": "180s"}'
  • Security mode:
    curl -XPUT --tlsv1.2 --negotiate -k -u : "https://EsNode IP address:EsNode port number/myindex/_settings" -H 'Content-Type: application/json' -d' 
    {"number_of_replicas": 0,
     "refresh_interval": "180s"}'

Modifying Transaction Log Parameter translog

By default, the translog persistence policy flushes the translog to disk on every request to ensure the reliability of write operations. This has a significant impact on performance. For example, disk I/O may become saturated.

In real-world migration, data is imported to the cluster in batches. If data is lost due to an accident, only the missing data needs to be imported again. You can therefore change the translog persistence policy to flush data periodically or when the translog reaches a specified size, which improves import performance. These parameters can also be specified when an index is created.

Run the curl command on any Elasticsearch node.

  • Normal mode:
    curl -XPUT "http://EsNode IP address:EsNode port number/_all/_settings" -H 'Content-Type: application/json' -d'
    {
      "index": {
        "translog": {
          "flush_threshold_size": "1GB",
          "sync_interval": "120s",
          "durability": "async"
        }
      }
    }'
  • Security mode:
    curl -XPUT --tlsv1.2 --negotiate -k -u : "https://EsNode IP address:EsNode port number/_all/_settings" -H 'Content-Type: application/json' -d'
    {
      "index": {
        "translog": {
          "flush_threshold_size": "1GB",
          "sync_interval": "120s",
          "durability": "async"
        }
      }
    }'
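
Once the import is finished, it may be worth switching the translog back to per-request durability so that normal writes are protected again. The source does not describe this step, so treat the following as a hedged sketch for normal mode. ES_URL, INDEX, their defaults, and the function names are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with a real EsNode address and index name.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
INDEX="${INDEX:-myindex}"

# Build the settings body for a translog durability mode.
# Only the two modes accepted by index.translog.durability are allowed.
translog_durability_body() {
  case "$1" in
    request|async)
      printf '{"index":{"translog":{"durability":"%s"}}}' "$1" ;;
    *)
      echo "durability must be request or async" >&2
      return 1 ;;
  esac
}

# Apply the durability mode to the index (normal mode).
set_translog_durability() {
  body="$(translog_durability_body "$1")" || return 1
  curl -XPUT "$ES_URL/$INDEX/_settings" \
    -H 'Content-Type: application/json' \
    -d "$body"
}

# During the import: set_translog_durability async
# After the import:  set_translog_durability request
```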

Optimizing the Elasticsearch Restart Configuration

If a large amount of data has been imported to the Elasticsearch cluster, a restart may trigger a large amount of data replication between primary and replica shards, which consumes a large amount of resources and makes cluster recovery slow. Therefore, if the cluster needs to be restarted, you are advised to adjust the related parameters before the restart and restore them after the restart.

  1. Run the curl command on any Elasticsearch node to disable shard allocation.
    • Normal mode:
      curl -XPUT "http://EsNode IP address:EsNode port number/_cluster/settings" -H 'Content-Type: application/json' -d'
      {
        "transient": {
          "cluster": {
            "routing": {
              "allocation.enable": "none"
            }
          }
        }
      }'
    • Security mode:
      curl -XPUT --tlsv1.2 --negotiate -k -u : "https://EsNode IP address:EsNode port number/_cluster/settings" -H 'Content-Type: application/json' -d'
      {
        "transient": {
          "cluster": {
            "routing": {
              "allocation.enable": "none"
            }
          }
        }
      }'
  2. Manually trigger the flush operation.
    • Normal mode:

      curl -XPOST "http://EsNode IP address:EsNode port number/_flush/synced"

    • Security mode:

      curl -XPOST --tlsv1.2 --negotiate -k -u : "https://EsNode IP address:EsNode port number/_flush/synced"

  3. Restart the Elasticsearch service or role instance.
  4. Enable shard allocation.
    • Normal mode:
      curl -XPUT "http://EsNode IP address:EsNode port number/_cluster/settings" -H 'Content-Type: application/json' -d'
      {
        "transient": {
          "cluster": {
            "routing": {
              "allocation.enable": "all"
            }
          }
        }
      }'
    • Security mode:
      curl -XPUT --tlsv1.2 --negotiate -k -u : "https://EsNode IP address:EsNode port number/_cluster/settings" -H 'Content-Type: application/json' -d'
      {
        "transient": {
          "cluster": {
            "routing": {
              "allocation.enable": "all"
            }
          }
        }
      }'
  5. Check the health status of the cluster. If the status changes to green, the automatic restoration is complete.
    • Normal mode:

      curl -XGET "http://EsNode IP address:EsNode port number/_cluster/health?pretty"

    • Security mode:

      curl -XGET --tlsv1.2 --negotiate -k -u : "https://EsNode IP address:EsNode port number/_cluster/health?pretty"
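
Rather than rerunning the health check by hand, the last step can be polled until the status turns green. This is a minimal sketch for normal mode under stated assumptions. ES_URL and its default, the function names, the retry count, and the 10-second interval are all hypothetical choices, not values from the product.

```shell
#!/bin/sh
# Hypothetical placeholder: replace with a real EsNode address.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Read a _cluster/health JSON response on stdin and print the "status" value.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Poll the cluster health until it is green.
# $1 is the maximum number of attempts (default 60), 10 seconds apart.
# Returns 0 when green is reached and 1 if the attempts run out.
wait_for_green() {
  attempts="${1:-60}"
  while [ "$attempts" -gt 0 ]; do
    status="$(curl -s -XGET "$ES_URL/_cluster/health" | health_status)"
    if [ "$status" = "green" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 10
  done
  return 1
}

# Usage after shard allocation is re-enabled: wait_for_green 90
```

The status is parsed with sed so that the sketch has no dependency beyond curl. If a JSON tool is available on the node, it could be used instead.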