Using Open-Source Logstash to Export Data in Batches from a CSS Elasticsearch Cluster

Use open-source Logstash to efficiently export data from a CSS Elasticsearch cluster for purposes like backup, migration, and analysis.

Scenarios

You may use this solution for the following purposes:

Data backup: Regularly perform full backups on all indexes to ensure data security.
Data migration: Migrate data to other storage systems (such as an object storage service or database).
Data cleaning: Export data and preprocess it, including filtering by fields and converting formats.
Data analysis: Export data to JSON files for offline analysis.

To export only a small amount of data (less than 10 MB), use Kibana instead. For details, see "Can I Export Data from Kibana in CSS?"

Solution Architecture

Figure 1 Using open-source Logstash to export data from Elasticsearch

This solution consists of three key components:

Input (reading data): The logstash-input-elasticsearch plugin connects to a CSS cluster to read data from specified indexes.
Filter (filtering data): Optional. It cleans data, for example, by filtering on fields or converting formats.
Output (writing data to a file): The logstash-output-file plugin writes data to a local JSON file.

Advantages

Efficient batch processing: This solution supports both full data exporting and incremental data exporting based on filters.
Flexible configuration: You can customize filters using Elasticsearch query DSL.
Ecosystem compatibility: This solution relies only on standard Logstash plugins from the Elasticsearch ecosystem, so no additional development is required.
Low cost: This solution uses an open-source tool.

Constraints

The export speed depends on the load of the source CSS cluster and on the performance (CPU and memory) of the ECS that receives the exported data.
Make sure the CSS Elasticsearch cluster and Logstash versions are compatible. This example uses Logstash 7.10.2, for which Elasticsearch 7.10.2 is recommended.

Prerequisites

You have obtained the address, username, and password for accessing the CSS Elasticsearch cluster.
The ECS that will receive the exported data has been deployed and can communicate with the CSS Elasticsearch cluster (for example, they are in the same VPC and security group).

Step 1: Deploying Logstash on the Client

Deploy open-source Logstash on the ECS.

Log in to the ECS.
```
ssh username@<ECS_IP>
```
Download and install open-source Logstash.
```
cd ~
wget https://artifacts.elastic.co/downloads/logstash/logstash-7.10.2-linux-x86_64.tar.gz
tar -zxvf logstash-7.10.2-linux-x86_64.tar.gz
```
If the ECS cannot access the Internet, download the Logstash installation package and upload it to the ECS beforehand. Download Logstash at https://artifacts.elastic.co/downloads/logstash/logstash-7.10.2-linux-x86_64.tar.gz.
Modify the Logstash memory configuration.
Change the JVM heap size for Logstash. The default value is 1 GB. We recommend that you set it to half of the ECS memory.
```
cd logstash-7.10.2
vi config/jvm.options
```
Figure 2 shows an example.

Figure 2 Changing the JVM heap size for Logstash
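As a minimal sketch of the heap change, the helper below rewrites the two heap flags in place. It assumes the stock jvm.options file contains uncommented lines starting with -Xms and -Xmx, and that GNU sed is available on the ECS. The function name set_logstash_heap is hypothetical.

```shell
# Hypothetical helper: set both heap flags in a Logstash jvm.options file.
# Usage: set_logstash_heap <path-to-jvm.options> <size, for example 4g>
set_logstash_heap() {
    file="$1"
    size="$2"
    # Rewrite the uncommented -Xms/-Xmx lines in place (GNU sed -i).
    sed -i -e "s/^-Xms.*/-Xms${size}/" -e "s/^-Xmx.*/-Xmx${size}/" "$file"
}
```

For example, on an ECS with 8 GB of memory, `set_logstash_heap config/jvm.options 4g` would set the initial and maximum heap to 4 GB each.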
Optimize Logstash's batch processing performance.
Modify the pipelines.yml file and change the value of pipeline.batch.size to 5000.
```
vi config/pipelines.yml
```
Figure 3 shows an example.

Figure 3 Modifying pipelines.yml
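The following sketch shows what a pipeline entry with this batch size might look like. The pipeline ID main and the helper name write_pipeline_entry are assumptions, not values taken from the figure. Note that Logstash reads pipelines.yml only when it is started without the -f or -e option, so an entry like this applies to a plain bin/logstash start.

```shell
# Hypothetical helper: append a pipeline entry to a pipelines.yml file.
# Usage: write_pipeline_entry <path-to-pipelines.yml> <path-to-pipeline-conf>
write_pipeline_entry() {
    pipelines_file="$1"
    conf_path="$2"
    # The YAML lines must start at column 0, so the heredoc body is not indented.
    cat >> "$pipelines_file" <<EOF
- pipeline.id: main
  path.config: "${conf_path}"
  pipeline.batch.size: 5000
EOF
}
```

A call such as `write_pipeline_entry config/pipelines.yml /path/to/es2file_all.conf` would append the entry; the configuration path here is a placeholder.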
Test connectivity.
Run a curl command to test the connectivity between the ECS and the CSS Elasticsearch cluster.
```
curl -ik http://<CSS_IP>:9200    # HTTP, no user authentication
curl -ik https://<CSS_IP>:9200 -u <Username>:<password>   # HTTPS with user authentication
```
If the cluster information is returned, the two are connected.
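If you prefer a scripted check, the sketch below treats an HTTP 200 response from the cluster root endpoint as a successful connection. The helper name check_cluster is hypothetical, and -k skips certificate verification in the same way as the commands above.

```shell
# Hypothetical helper: succeed only if the cluster root endpoint returns HTTP 200.
# Usage: check_cluster <url> [<username>:<password>]
check_cluster() {
    url="$1"
    auth="$2"
    if [ -n "$auth" ]; then
        # Security-mode cluster: send basic authentication credentials.
        code=$(curl -sk -o /dev/null -w '%{http_code}' -u "$auth" "$url")
    else
        code=$(curl -sk -o /dev/null -w '%{http_code}' "$url")
    fi
    [ "$code" = "200" ]
}
```

For example, `check_cluster "https://<CSS_IP>:9200" "<Username>:<password>" && echo connected` prints connected only when the cluster answers with status 200.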

Step 2: Exporting Elasticsearch Data

Create a Logstash configuration file.
Create a configuration file, for example, es2file_all.conf, in the config directory under the Logstash installation path.
```
vi config/es2file_all.conf
```

Modify the Logstash configuration file and save the change.

Example:

```
input {
    elasticsearch {
        # Source cluster node addresses. No need to include the protocol.
        hosts => ["xxx.xxx.xxx.xxx:9200", "xxx.xxx.xxx.xxx:9200"]
        # Name of the source index to be migrated
        index => "xxx,xxx"
        query => '{"query":{"range":{"@timestamp":{"gte":"now-5m","lte":"now/m"}}}}'
        docinfo => true
        slices => 5
        size => 5000
        # Mandatory fields for a security-mode cluster. (Delete them for a cluster with the security mode disabled.)
        # user => "xxx"           # Username for accessing the cluster.
        # password => "xxx"       # Password corresponding to the username.
        # If SSL is enabled for the source cluster, additionally configure the following information:
        # ssl => true
        # ca_file => "/opt/logstash/xxx/certs"     # Path of the CA certificate used to verify the source cluster.
    }
}

filter {
    # Removes some metadata fields automatically added by Logstash
    mutate {
        remove_field => ["@version"]
    }
}

output {
    file {
        path => "./test_inc-%{+YYYY-MM-dd}.json"
    }
}
```

**Table 1** Description of key configuration items

| Configuration Item | Mandatory (Yes/No) | Description |
| --- | --- | --- |
| input | Yes | The input plugin reads data from the Elasticsearch cluster. |
| hosts | Yes | Source cluster node addresses. You can configure multiple IP addresses. Value format: `["<Node IP address 1>:<Port number>", "<Node IP address 2>:<Port number>"]` |
| user | No | Username for accessing the source cluster. Mandatory for a security-mode cluster. |
| password | No | Password for accessing the source cluster. Mandatory for a security-mode cluster. |
| index | Yes | Name of the source index to export. Single index: Enter the index name, for example, `my_index`. Multiple indexes: Enter multiple index names separated by commas (,), for example, `my_index1,my_index2`. Wildcard: Use the wildcard (`*`) to match multiple indexes. For example, `myindex*` indicates all indexes whose name starts with myindex. |
| query | No | Elasticsearch query DSL used to set filters. To export all data, do not set this parameter. To export only part of the data, use this parameter to set filters, and make sure the fields used in the filters exist in the source indexes. Otherwise, no data will be matched. In the example above, data of the last 5 minutes is fetched: `now-5m` indicates 5 minutes before the current time, and `now/m` indicates the current time rounded down to the minute. |
| docinfo | No | Whether to include document metadata (such as index names and IDs). true: Include document metadata, which can be useful for subsequent association analysis. false: Do not include document metadata. If you only need the document content, set this parameter to false to reduce the output size. |
| slices | No | Number of parallel slices. The read task is split into multiple slices that run in parallel to improve throughput. Default value: 1. Set this parameter based on the cluster load, because too many parallel slices increase the cluster load. If the cluster has sufficient resources, increase this value to improve performance. The recommended value is 2 to 8. |
| size | No | Number of documents read from Elasticsearch for each request. A larger value indicates higher efficiency, but at the cost of higher memory usage. The maximum value is constrained by the Elasticsearch cluster. You are advised to set the value (for example, between 1000 and 10000) based on the ECS performance and Elasticsearch cluster load. |
| ssl | No | Whether SSL is enabled for the source cluster. true: Uses HTTPS to transmit data. false: Uses HTTP to transmit data. |
| ca_file | No | Path of the CA certificate used to verify the source cluster. Value format: `<certificate path><certificate name>`, for example, `/opt/logstash/extend/certs`. Obtain the source cluster's security certificate, upload it to the Logstash installation path, and enter the absolute path of the certificate. |
| filter | No | The filter plugin used for data filtering. In this example, the filter plugin only deletes fields automatically added by Logstash to avoid exporting too much redundant data. |
| output | Yes | The output plugin specifies the output file path and file naming rule. You are advised to specify an absolute path to ensure file accessibility. |
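The example above exports only the last 5 minutes of data. As a contrasting sketch, the helper below writes a full-export configuration: query is omitted, so the plugin falls back to its default query, which matches every document in the index. The index name my_index, the output location /tmp, the values of slices and size, and the helper name write_full_export_conf are all assumptions chosen for illustration.

```shell
# Hypothetical helper: write a full-export pipeline configuration to the given path.
# Usage: write_full_export_conf <path-to-conf>
write_full_export_conf() {
    cat > "$1" <<'EOF'
input {
    elasticsearch {
        # Placeholder address and index name; replace them with your own values.
        hosts => ["xxx.xxx.xxx.xxx:9200"]
        index => "my_index"
        # The query option is omitted so that all documents are read.
        docinfo => false
        slices => 2
        size => 1000
    }
}

output {
    file {
        # Absolute path; /tmp is only an example location.
        path => "/tmp/test_all-%{+YYYY-MM-dd}.json"
    }
}
EOF
}
```

Calling `write_full_export_conf config/es2file_all.conf` would create the file. For a security-mode cluster, the user, password, ssl, and ca_file options described in Table 1 would still need to be added.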

Execute the Logstash configuration file to start the export task.
```
cd logstash-7.10.2
bin/logstash -f config/es2file_all.conf
```
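Before a long export, it can help to validate the configuration first. The sketch below wraps two invocations: one with --config.test_and_exit, which parses the configuration and exits, and one with -b (short for --pipeline.batch.size), which sets the batch size explicitly for this run. The variable and function names are hypothetical, and the paths assume the current directory is the Logstash installation directory.

```shell
# Assumed locations, relative to the Logstash installation directory.
LOGSTASH_BIN="${LOGSTASH_BIN:-bin/logstash}"
EXPORT_CONF="${EXPORT_CONF:-config/es2file_all.conf}"

# Parse the configuration and exit without starting the pipeline.
check_export_conf() {
    "$LOGSTASH_BIN" -f "$EXPORT_CONF" --config.test_and_exit
}

# Start the export with an explicit batch size of 5000 events.
run_export() {
    "$LOGSTASH_BIN" -f "$EXPORT_CONF" -b 5000
}
```

A typical sequence would be `check_export_conf && run_export`, so that the export starts only if the configuration parses cleanly.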
Verify the export result.
Go to the output file path, and open the JSON file (for example, test_inc-2025-06-05.json). Check that the exported data meets your expectations.
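A quick sanity check is to count the exported documents. This sketch assumes the file output plugin's default json_lines codec, which writes one JSON document per line. The helper name count_exported_docs is hypothetical.

```shell
# Hypothetical helper: print the number of documents in an exported file.
# Usage: count_exported_docs <path-to-exported-json-file>
count_exported_docs() {
    # One document per line, so the line count equals the document count.
    wc -l < "$1" | tr -d ' '
}
```

You can compare the result with the document count reported by the cluster, for example from `curl "http://<CSS_IP>:9200/<index>/_count"`, bearing in mind that a query filter narrows what is exported. Because the file output plugin appends to an existing file by default, rerunning an export into the same file would inflate this count.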
