Updated on 2024-10-12 GMT+08:00

Using Logstash for Data Migration

Logstash can be used to collect, transform, clean, and parse logs. This section offers an example of using a Logstash cluster to migrate data between different Elasticsearch clusters. Through this example, you can learn how to use the Logstash service, including creating clusters, importing and exporting data, and configuring tasks.

Procedure

The following describes how to use a Logstash cluster to migrate data from one Elasticsearch cluster to another.

Before starting to migrate data, make the necessary preparations. For details, see Preparations.

  1. Step 1: Obtaining Elasticsearch Cluster Information: Obtain the addresses of the source and destination Elasticsearch clusters.
  2. Step 2: Creating a Logstash Cluster: Create a Logstash cluster for migrating data between Elasticsearch clusters.
  3. Step 3: Configuring a Data Migration Task: Configure an Elasticsearch cluster migration task for the Logstash cluster.
  4. Step 4: Starting the Migration Task: Start the migration task in the Logstash cluster.
  5. Step 5: Stopping the Task: After data migration is complete, stop the migration task.
  6. Step 6: Deleting the Cluster: Delete clusters that you no longer need to reclaim resources.

Preparations

  • You have registered with Huawei Cloud and performed real-name authentication. Make sure your account is not frozen or in arrears.
    If you do not have a Huawei Cloud account, perform the following operations to create one:
    1. Visit the Huawei Cloud official website.
    2. In the upper right corner of the page, click Register and complete the registration as prompted.
    3. Select the service agreement and click Enable.
    4. Perform real-name authentication.
  • The source Elasticsearch cluster (Source-ES) and destination Elasticsearch cluster (Dest-ES) are ready. Both clusters are single-node non-security mode clusters.

Step 1: Obtaining Elasticsearch Cluster Information

Obtain the addresses of the source and destination Elasticsearch clusters. For security-mode clusters, contact the administrator to obtain their usernames and passwords.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Elasticsearch.
  3. In the cluster list, obtain the addresses of the Elasticsearch clusters from the Private Network Address column. Generally, the address format is <host>:<port> or <host>:<port>,<host>:<port>.

    In this example, the address of the source Elasticsearch cluster (Source-ES) is 10.62.179.32:9200, and that of the destination Elasticsearch cluster (Dest-ES) is 10.62.179.33:9200.

    Figure 1 Obtaining IP addresses
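
The address formats above can be turned into the URL form that the configuration file uses later in this example. The following is a minimal sketch, not part of the CSS console: the function name to_logstash_hosts is hypothetical, and the http scheme is an assumption that matches the non-security mode clusters used here.

```python
from __future__ import annotations


def to_logstash_hosts(address: str, scheme: str = "http") -> list[str]:
    """Split a Private Network Address value into one URL per node."""
    hosts = []
    for part in address.split(","):
        part = part.strip()
        if not part:
            continue
        # Split on the last colon so the port is always the final field.
        host, sep, port = part.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <host>:<port>, got {part!r}")
        hosts.append(f"{scheme}://{host}:{port}")
    return hosts


print(to_logstash_hosts("10.62.179.32:9200"))
# ['http://10.62.179.32:9200']
```

A multi-node value such as <host>:<port>,<host>:<port> yields one URL per node, in the same order.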

Step 2: Creating a Logstash Cluster

Create a Logstash cluster for migrating data between Elasticsearch clusters.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Logstash.
  3. Click Create Cluster in the upper right corner. The Create Cluster page is displayed.
  4. Configure Billing Mode and AZ for the cluster.
    Table 1 Billing mode and AZ parameters

    Parameter | Description | Example Value
    Billing Mode | Select Yearly/Monthly or Pay-per-use. Yearly/Monthly: You pay for the cluster in advance, by year or month. The service duration ranges from one month to three years. If you plan to use a cluster for more than nine months, you are advised to purchase a yearly package for a better price. A yearly package costs the same as 10 monthly packages. Pay-per-use: You are billed for the actual duration of use, with a billing cycle of one hour. For example, 58 minutes of usage is rounded up to one hour and billed. | Pay-per-use
    Region | Select the region where the cluster is located. ECSs in different regions cannot communicate with each other over an intranet. For lower network latency and quicker resource access, select the nearest region. | Hong Kong, China
    AZ | Select AZs associated with the cluster region. A maximum of three AZs can be configured. | AZ 1

  5. Configure basic cluster information.
    Figure 2 Configuring cluster information
    Table 2 Basic configuration parameters

    Parameter | Description | Example Value
    Cluster Type | Select Logstash. | Logstash
    Version | Select a cluster version from the drop-down list box. | 7.10.0
    Name | Cluster name, which contains 4 to 32 characters. Only letters, numbers, hyphens (-), and underscores (_) are allowed and the value must start with a letter. | Sample-Logstash

  6. Configure the cluster's node specifications.
    Figure 3 Configuring the cluster's node specifications
    Table 3 Specification parameters

    Parameter | Description | Example Value
    Nodes | Number of nodes in a cluster, in the range 1 to 100. | 1
    CPU Architecture | Only x86 is supported. | x86
    Node Specifications | Select the specifications of cluster nodes. | ess.spec-4u8g
    Node Storage Type | Select the storage type of cluster nodes. | High I/O
    Node Storage Capacity | Set the storage capacity of a single cluster node. The default value is 40 GB. | 40 GB

  7. Set the enterprise project.

    When creating a CSS cluster, you can bind an enterprise project to the cluster if you have enabled the enterprise project function. In this example, the default enterprise project (default) is selected.

  8. Click Next: Network to configure the cluster network.
    Figure 4 Configuring networking
    Table 4 Network configuration parameters

    Parameter | Description | Example Value
    VPC | Specify a VPC to isolate the cluster's network. Select the VPC used by the Elasticsearch clusters. NOTE: The VPC must contain CIDRs. Otherwise, cluster creation will fail. By default, a VPC will contain CIDRs. | vpc-default
    Subnet | A subnet provides dedicated network resources that are isolated from other networks, improving network security. | subnet-default
    Security Group | A security group serves as a virtual firewall that provides access control policies for clusters. NOTE: To enable cluster access, ensure that the security group allows port 9200. | default

  9. Click Next: Configure Advanced Settings.

    This cluster is used only for getting started. There is no need to enable advanced settings.

  10. Click Next: Confirm. Check the configuration and click Next to create a cluster.
  11. Click Back to Cluster List to switch to the Clusters page. The cluster you created is now in the cluster list and its status is Creating. When the cluster is successfully created, its status changes to Available.
    Figure 5 Creating a cluster
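
The Name rule in Table 2 can be checked before you fill in the form. The following is a minimal sketch of that rule as a regular expression. The function name is hypothetical, and it assumes that "letters" means ASCII letters, which the source does not state.

```python
import re

# Name rule from Table 2: a letter first, then 3 to 31 more letters, digits,
# hyphens, or underscores, for 4 to 32 characters in total.
# Assumption: only ASCII letters count as letters.
CLUSTER_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]{3,31}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the Table 2 naming rule."""
    return CLUSTER_NAME.fullmatch(name) is not None


print(is_valid_cluster_name("Sample-Logstash"))  # True
print(is_valid_cluster_name("es"))               # False: shorter than 4 characters
print(is_valid_cluster_name("1-logstash"))       # False: does not start with a letter
```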

Step 3: Configuring a Data Migration Task

Configure an Elasticsearch cluster migration task for the Logstash cluster.

  1. On the Logstash cluster management page, select the created Sample-Logstash cluster. The Cluster Information page is displayed.
  2. Click Configuration Center on the right.
    Figure 6 Logstash Configuration Center
  3. On the Configuration Center page, click Create in the upper right corner. On the Create Configuration File page that is displayed, edit the configuration file.
    Figure 7 Create Configuration File
    Table 5 Parameters for creating a configuration file

    Parameter | Description | Example Value
    Name | User-defined configuration file name. It can contain only letters, digits, hyphens (-), and underscores (_), and must start with a letter. The minimum length is 4 characters. | es-es
    Configuration File Content | Expand System Templates, find elasticsearch, and click Apply in the Operation column. In the Configuration File Content area, configure the configuration file based on comments in the template. | See Table 6 for key configuration items. Use the default settings for others.
    Hidden Content | For items that you enter in this box, the corresponding strings will be replaced with *** in the configurations. Enter sensitive strings that you want to hide, and press Enter. You can enter a maximum of 20 strings, each with a maximum length of 512 bytes. | N/A

    Table 6 Configuration item description

    Configuration Item | Description | Example Value
    hosts | Enter the addresses of the source and destination Elasticsearch clusters in input and output, respectively. For details about how to obtain the cluster addresses, see Step 1: Obtaining Elasticsearch Cluster Information. | input hosts: http://10.62.179.32:9200; output hosts: http://10.62.179.33:9200
    user | Username for accessing the Elasticsearch cluster. This parameter is required for security-mode clusters. For non-security mode clusters, use # to comment out this parameter. | Use # to comment it out.
    password | Password for accessing the Elasticsearch cluster. This parameter is required for security-mode clusters. For non-security mode clusters, use # to comment out this parameter. | Use # to comment it out.
    index | Specifies indexes that need to be migrated. You can use a wildcard. | index*

  4. Click Next to configure Logstash pipeline parameters.
    Figure 8 Configuring pipeline parameters
    Table 7 Pipeline parameters

    Parameter | Description | Example Value
    pipeline.workers | Number of worker threads that execute the filter and output stages of the pipeline in parallel. | 4
    pipeline.batch.size | Maximum number of events that a worker thread collects from inputs before attempting to execute its filters and outputs. A larger value is generally more efficient but increases memory overhead. | 125
    pipeline.batch.delay | Maximum time, in milliseconds, to wait for each new event before dispatching an undersized batch to the pipeline worker threads. | 50
    queue.type | Internal queue model for event buffering. memory indicates a traditional memory-based queue. persisted indicates a disk-based ACKed persistent queue. | memory

  5. Click Create. The system automatically verifies the configuration file. When the configuration file status changes to Available, the creation is successful.
    Figure 9 Configuration file verification
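
The values in Table 6 and Table 7 can be put together as follows. This is a rough sketch, not the CSS system template, which is not reproduced in this section and may contain more settings. The template text uses the generic option names of the Logstash elasticsearch input and output plugins (hosts, user, password, index, docinfo, document_id), and it assumes the 7.x plugin behavior where docinfo stores the original index name and document ID under @metadata. The helper names render_pipeline and in_flight_events are hypothetical.

```python
# Generic Logstash plugin syntax; the CSS system template may differ.
PIPELINE_TEMPLATE = """\
input {{
  elasticsearch {{
    hosts => [{src}]
    # user and password stay commented out for non-security mode clusters
    # user => "<username>"
    # password => "<password>"
    index => "{index}"
    # keep each document's original index name and ID in @metadata
    docinfo => true
  }}
}}

output {{
  elasticsearch {{
    hosts => [{dst}]
    # user => "<username>"
    # password => "<password>"
    index => "%{{[@metadata][_index]}}"
    document_id => "%{{[@metadata][_id]}}"
  }}
}}
"""


def quote_hosts(hosts):
    """Render a Python list of URLs as quoted Logstash array items."""
    return ", ".join('"{}"'.format(h) for h in hosts)


def render_pipeline(src_hosts, dst_hosts, index_pattern):
    """Fill the template with the Table 6 values (hypothetical helper)."""
    return PIPELINE_TEMPLATE.format(
        src=quote_hosts(src_hosts), dst=quote_hosts(dst_hosts), index=index_pattern
    )


def in_flight_events(workers, batch_size):
    """Upper bound on in-flight events: pipeline.workers * pipeline.batch.size."""
    return workers * batch_size


config = render_pipeline(
    ["http://10.62.179.32:9200"], ["http://10.62.179.33:9200"], "index*"
)
print(config)
print(in_flight_events(4, 125))  # 500
```

With the example values in Table 7, up to 500 events can be in flight at once, which is why a larger pipeline.batch.size increases memory overhead.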

Step 4: Starting the Migration Task

Start the configured migration task in the Logstash cluster.

  1. On the Logstash cluster management page, select the created Sample-Logstash cluster. The Cluster Information page is displayed.
  2. Click Configuration Center on the right.
  3. Select a configuration file whose status is Available, and click Start. In the pipeline list, the Events column shows the number of events processed by each stage of the pipeline.
    Figure 10 Starting a task
  4. After the data migration is complete, check the data consistency between the source and destination Elasticsearch clusters. For example, run the _cat/indices command in the source and destination clusters, separately, to check whether their indexes are consistent.
    1. On the Elasticsearch cluster management page, select the source Elasticsearch cluster Source-ES or the destination Elasticsearch cluster Dest-ES, and click Access Kibana in the Operation column to access the Kibana console.
    2. In the Kibana navigation pane on the left, choose Dev Tools.
    3. On the Console page, run the following command to view index information:
      GET _cat/indices
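
When there are many indexes, comparing the two outputs by eye is error-prone. The following is a minimal sketch that compares document counts from two saved _cat/indices outputs. It assumes the default column order (health, status, index, uuid, pri, rep, docs.count, ...) and open indexes only. The sample rows are made up for illustration and are not output from the clusters in this example.

```python
from __future__ import annotations


def doc_counts(cat_indices_text: str) -> dict[str, int]:
    """Map index name to docs.count from default _cat/indices output."""
    counts = {}
    for line in cat_indices_text.splitlines():
        cols = line.split()
        # Skip blank lines, short rows (such as closed indexes), and the
        # header row that appears when the ?v parameter is used.
        if len(cols) < 7 or cols[0] == "health":
            continue
        counts[cols[2]] = int(cols[6])
    return counts


def diff_counts(source: dict[str, int], dest: dict[str, int]) -> dict:
    """Return {index: (source_count, dest_count)} for every mismatch."""
    return {
        name: (count, dest.get(name))
        for name, count in source.items()
        if dest.get(name) != count
    }


# Made-up sample output, for illustration only.
SOURCE = """\
green open index1 aaaaaaaaaaaaaaaaaaaaaa 1 0 100 0 50kb 50kb
green open index2 bbbbbbbbbbbbbbbbbbbbbb 1 0 250 0 90kb 90kb
"""
DEST = """\
green open index1 cccccccccccccccccccccc 1 0 100 0 50kb 50kb
green open index2 dddddddddddddddddddddd 1 0 240 0 88kb 88kb
"""

print(diff_counts(doc_counts(SOURCE), doc_counts(DEST)))
# {'index2': (250, 240)}
```

An empty result means every source index exists in the destination with the same document count. An index that is missing from the destination is reported with None as its destination count.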

Step 5: Stopping the Task

After data migration is complete, stop the migration task.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Logstash.
  3. In the cluster list, select the Sample-Logstash cluster, and click Configuration Center in the Operation column.
    Figure 11 Configuration Center
  4. Select the name of the pipeline that has been started, and click Stop All to stop all running tasks. Wait until all pipeline tasks are stopped.
    Figure 12 Stopping the task

Step 6: Deleting the Cluster

After data migration is complete, you can delete clusters that you no longer need to reclaim resources.

Before deleting a cluster, stop all running pipeline tasks and back up any files you need.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Logstash.
  3. In the cluster list, locate the Sample-Logstash cluster, and choose More > Delete in the Operation column.
  4. In the confirmation dialog box, type in DELETE, and click OK.