Updated on 2023-03-29 GMT+08:00

Migrating Cluster Data Using Logstash

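Logstash is an open-source data processing tool from Elastic, the company behind Elasticsearch. This section describes how to use it to migrate data from one Elasticsearch cluster to another.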

  1. Apply for an ECS, preferably with at least 8 vCPUs and 16 GB memory.
  2. Install Logstash on the ECS.

    1. Install the JDK, because Logstash depends on Java. Run the following commands to install the JDK and Python using yum:
      yum install java
      yum install python
    2. Download Logstash. Choose a Logstash version close to the Elasticsearch version. They do not have to use exactly the same version.

      Logstash 7.10.2 OSS is recommended. You can download it from https://www.elastic.co/downloads/past-releases/logstash-oss-7-10-2

    3. Run the following command to install Logstash using yum:
      yum install logstash-oss-7.10.2-x86_64.rpm

      Replace logstash-oss-7.10.2-x86_64.rpm with the actual name of the Logstash installation package that you downloaded.
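
The advice to pick a Logstash version close to the Elasticsearch version can be checked before the download. The following is a minimal sketch, assuming that "close" means the same major version, which is an interpretation and not something this page states. The function names es_version and same_major and the ES_AUTH variable (user:password for a security cluster) are hypothetical helpers, not part of Logstash or Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helper: print the version number reported by an Elasticsearch node.
# $1 is the node URL, for example http://172.16.xxx.xxx:9200.
# ES_AUTH (optional, assumed name) holds user:password for a security cluster.
es_version() {
    curl -s ${ES_AUTH:+-u "$ES_AUTH"} "$1" |
        sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeed when two version strings share the same major version.
same_major() {
    [ "${1%%.*}" = "${2%%.*}" ]
}

# Example usage (not executed here):
#   same_major "$(es_version http://172.16.xxx.xxx:9200)" 7.10.2 && echo "versions are close"
```

If the check fails, it may still be possible to migrate, but it is worth reviewing the compatibility of the chosen Logstash release first.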

  3. Modify the JVM configuration of Logstash to improve the cluster data migration efficiency.

    Open the JVM configuration file with the following command and set the -Xms and -Xmx lines as shown (4 GB in this example). The default heap memory of Logstash is 1 GB. You are advised to change the heap memory to half of the cluster node memory.

    vim /etc/logstash/jvm.options
    -Xms4g
    -Xmx4g
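
The heap change can also be scripted instead of edited by hand. The following is a rough sketch, assuming that the memory in question is the RAM of the ECS where Logstash runs and that GNU sed is available. The function names half_mem_gb and set_heap are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given a memory size in kB (as reported by MemTotal in
# /proc/meminfo), print half of it in whole GB, with a minimum of 1.
half_mem_gb() {
    half=$(( $1 / 1024 / 1024 / 2 ))
    [ "$half" -lt 1 ] && half=1
    echo "$half"
}

# Hypothetical helper: rewrite the -Xms and -Xmx lines of a jvm.options file.
# $1 is the file path, $2 is the heap size in GB. Uses GNU sed in-place editing.
set_heap() {
    sed -i -e "s/^-Xms.*/-Xms${2}g/" -e "s/^-Xmx.*/-Xmx${2}g/" "$1"
}

# Example usage (not executed here):
#   mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
#   set_heap /etc/logstash/jvm.options "$(half_mem_gb "$mem_kb")"
```

Setting -Xms and -Xmx to the same value, as in the example above, avoids heap resizing while the migration is running.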

  4. Modify the conf configuration file of Logstash and configure cluster migration settings.

    1. Go to the /etc/logstash/conf.d/ directory where the Logstash configuration file is stored.
      cd /etc/logstash/conf.d/
    2. Create the logstash-es-es-all.conf file.
      vim logstash-es-es-all.conf
    3. Add the following content to the logstash-es-es-all.conf file and save the file.

      Modify the hosts, user, password, and index fields as needed.

      input{
          elasticsearch{
              #IP address of the source cluster
              hosts =>  ["http://172.16.xxx.xxx:9200", "http://172.16.xxx.xxx:9200"]
              # For a security cluster, configure the username and password for cluster login. For a non-security cluster, keep the user and password fields commented out with the number sign (#).
              # user => "xxxx"
              # password => "xxxx"
              # List of indexes to be migrated. Separate multiple indexes with commas (,). Set this parameter based on the actual indexes in the source cluster. -.* excludes indexes whose names start with a period (.).
              index => "abmau_edi*,business_test,goods_deploy*, -.*"
              # Keep docinfo set to true so that the _index, _type, and _id of each document are stored in @metadata and can be referenced in the output section.
              docinfo => true
              # You are advised to retain the following two values. To increase the migration speed, you can increase them, but only to a moderate extent.
              slices => 3
              size => 3000
          }
      }
      
      filter {
        # Delete some fields added by Logstash.
        mutate {
          remove_field => ["@timestamp", "@version"]
        }
      }
      
      output{
          elasticsearch{
              # Destination cluster address.
              hosts => ["http://10.100.xx.xx:9200", "http://10.100.xx.xx:9200"]
              # Username and password for logging in to the target cluster. If you do not need to configure them, use the number sign (#) to comment them out.
              user => "admin"
              password => "*****"
              # Index name in the destination cluster. The following setting keeps the index name the same as in the source cluster.
              index => "%{[@metadata][_index]}"
              # Index type in the destination cluster. The following setting keeps the type the same as in the source cluster.
              document_type => "%{[@metadata][_type]}"
              # _id of the documents in the destination cluster. If the original _id does not need to be retained, you can delete this line, which can improve write performance.
              document_id => "%{[@metadata][_id]}"
              ilm_enabled => false
              manage_template => false
          }
      
          # Debugging output. Use it only for testing, and keep it commented out or delete it before the actual migration.
          # stdout { codec => rubydebug { metadata => true }}
      }
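
Before starting the migration, it can help to let Logstash parse the configuration file without running the pipeline. The following is a minimal sketch that uses the --config.test_and_exit option of the Logstash command line. The LOGSTASH_BIN variable and the check_pipeline function are hypothetical names, and the default path assumes the RPM installation location used elsewhere on this page.

```shell
#!/bin/sh
# Assumed location of the Logstash launcher after an RPM installation.
LOGSTASH_BIN="${LOGSTASH_BIN:-/usr/share/logstash/bin/logstash}"

# Hypothetical helper: parse the given pipeline file and exit without running it.
# Returns the exit status of Logstash (0 when the configuration is valid).
check_pipeline() {
    "$LOGSTASH_BIN" --path.settings /etc/logstash --config.test_and_exit -f "$1"
}

# Example usage (not executed here):
#   check_pipeline /etc/logstash/conf.d/logstash-es-es-all.conf && echo "configuration OK"
```

This check only validates the syntax of the file. It does not verify that the source and destination clusters are reachable or that the credentials are correct.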

  5. Start Logstash to migrate cluster data.

    1. Run the following command to start Logstash:
      /usr/share/logstash/bin/logstash --path.settings /etc/logstash
    2. View the Logstash log file to check the task progress. The Logstash log directory is /var/log/logstash/.
    3. Wait until the data migration is complete.
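
One possible way to judge whether the migration is complete is to compare document counts per index between the two clusters. The following is a rough sketch that uses the Elasticsearch _count API. The function names doc_count and compare_index and the ES_AUTH variable (user:password, applied to both clusters) are hypothetical, and the sketch assumes that no new data is being written to the source index during the comparison.

```shell
#!/bin/sh
# Hypothetical helper: print the number of documents in an index.
# $1 is the cluster URL, $2 is the index name.
# ES_AUTH (optional, assumed name) holds user:password for a security cluster.
doc_count() {
    curl -s ${ES_AUTH:+-u "$ES_AUTH"} "$1/$2/_count" |
        sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: compare one index between two clusters.
# $1 is the source URL, $2 is the destination URL, $3 is the index name.
# Succeeds only when both counts were read and are equal.
compare_index() {
    src=$(doc_count "$1" "$3")
    dst=$(doc_count "$2" "$3")
    echo "$3: source=$src destination=$dst"
    [ -n "$src" ] && [ "$src" = "$dst" ]
}

# Example usage (not executed here):
#   compare_index http://172.16.xxx.xxx:9200 http://10.100.xx.xx:9200 business_test
```

Recently written documents may not be counted until the destination index has been refreshed, so a small temporary difference does not necessarily indicate lost data.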