Starting Migration Tasks
In large-scale data processing and real-time log ingestion scenarios, developers often need to pull data from various data sources, such as Kafka and MySQL, to Elasticsearch or OpenSearch. Key challenges in such efforts include network connectivity, task interruptions, and the complexity of multi-pipeline management. CSS provides a web-based configuration center for Logstash clusters, where you can test network connectivity between Logstash and both the data source and destination in one click, as well as start or hot-start pipelines. To ensure high availability and stability in complex scenarios, CSS also deploys a Keepalive mechanism supported by a daemon process.
- Pipeline: A core Logstash component for data processing, comprising three stages: input, filter, and output.
- Keepalive: A high availability mechanism. When enabled, a daemon process monitors Logstash and, if Logstash fails, automatically attempts to restore it.
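The Keepalive idea can be illustrated with a minimal supervisor loop. This is only a sketch and not the daemon that CSS actually runs: the function name `supervise`, the restart cap, and the choice to restart on every exit regardless of exit code are all assumptions made for illustration.

```python
import subprocess
import sys
import time
from typing import Callable, List


def supervise(start: Callable[[], int], max_restarts: int = 3, delay_s: float = 0.0) -> List[int]:
    """Run a process and restart it whenever it exits (hypothetical sketch).

    `start` launches the process, blocks until it exits, and returns its exit
    code. The loop restarts it at most `max_restarts` times and returns the
    exit code of every run, in order.
    """
    exit_codes = []
    restarts = 0
    while True:
        exit_codes.append(start())
        if restarts >= max_restarts:
            break  # give up once the restart budget is used
        restarts += 1
        time.sleep(delay_s)  # wait before the next restart attempt
    return exit_codes


def run_short_task() -> int:
    # Stand-in for a process that exits right away with a failure code.
    return subprocess.run([sys.executable, "-c", "raise SystemExit(1)"]).returncode


print(supervise(run_short_task, max_restarts=2))
```

A process that exits immediately is restarted until the budget runs out, which is the kind of repeated restart a keepalive mechanism can produce for a short-lived task.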
Comparing Pipeline Start Options
| Option | Start | Hot Start |
|---|---|---|
| How It Works | When the cluster has no running pipelines, the system initializes the runtime environment and loads the configuration. | When the cluster already has running pipelines, a new pipeline is dynamically loaded without restarting the entire Logstash process, ensuring zero service interruptions. |
| Batch Processing | Up to 50 configuration files can be started at once. | Only one configuration file can be started at a time. |
| Keepalive | You can choose to enable or disable Keepalive. | The Keepalive setting is inherited from existing pipelines in the cluster. It cannot be modified. |
| When to Use | The cluster has no Running pipelines. | The cluster already has Running pipelines. You use this option to add new pipelines without interrupting existing pipelines. |
| Constraint | N/A | |
Constraints
There can be up to 50 Running configuration files in the pipeline list of a Logstash cluster.
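The rules in the comparison table and the constraint above can be summarized as a small decision function. This is a sketch for reasoning about which option applies; `choose_start_option` and `MAX_RUNNING` are hypothetical names, not part of any CSS API.

```python
MAX_RUNNING = 50  # maximum number of Running configuration files per cluster


def choose_start_option(running: int, selected: int) -> str:
    """Return "start" or "hot start" for a request, or raise ValueError.

    running:  number of configuration files currently in the Running state
    selected: number of configuration files the user wants to start now
    """
    if selected < 1:
        raise ValueError("select at least one configuration file")
    if running + selected > MAX_RUNNING:
        raise ValueError("a cluster can have at most 50 Running configuration files")
    if running == 0:
        # No running pipelines: a regular start, which accepts up to 50 files.
        return "start"
    if selected != 1:
        # Running pipelines exist: hot start accepts one file at a time.
        raise ValueError("hot start accepts only one configuration file at a time")
    return "hot start"


print(choose_start_option(running=0, selected=50))
print(choose_start_option(running=3, selected=1))
```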
Preparations: Testing Connectivity
Before starting a migration task, test the connectivity between Logstash and both the data source and destination. This helps prevent task failures caused by connectivity issues.
How a connectivity test works:
- IP layer test: Run the fping command to check whether the target IP address is reachable.
- Transport layer test: Run the telnet command to check whether the TCP port associated with the target address is reachable.
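The transport layer check can be approximated outside the console with a plain TCP connection attempt. The sketch below uses Python's standard `socket` module rather than telnet, and it does not cover the IP layer (ICMP) test. The function name `tcp_port_reachable` and the default timeout are assumptions. The demo targets a throwaway local listener so that it runs without any external host.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Demo: open a local listener on a free port, test it, then test it again after closing.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

print(tcp_port_reachable("127.0.0.1", demo_port))  # listener is open
listener.close()
print(tcp_port_reachable("127.0.0.1", demo_port))  # listener is closed
```

To check a real data source or destination, replace the host and port with the peer address, for example a Kafka broker or an Elasticsearch node, and run the function from a machine in the same network as the Logstash cluster.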
Prerequisites for the connectivity test
Depending on the locations of the Logstash cluster, data source, and destination, the following conditions must be met:
- If they are in the same VPC: The security group used by the Logstash cluster allows outbound traffic for all protocols, and inbound traffic for the ICMP protocol as well as the TCP ports associated with the peer addresses. For details, see Adding a Security Group Rule.
- If they are in different VPCs: The security group used by the Logstash cluster allows outbound traffic for all protocols, and inbound traffic for the ICMP protocol as well as the TCP ports associated with the peer addresses. In addition, you must configure a route between the Logstash cluster and the peer VPCs. For details, see Configuring Routes for a Logstash Cluster.
- If they are in different LANs (they must communicate across the public network): Configure a public IP address for the Logstash cluster to ensure it can be accessed from the Internet. For details, see Configuring Public Network Access.
- Go to the Configuration Center page.
  - Log in to the CSS management console.
  - In the navigation pane on the left, choose Clusters > Logstash.
  - In the cluster list, click the name of the target cluster. The cluster information page is displayed.
  - Click the Configuration Center tab.
- On the Configuration Center page, click Test Connectivity.
- In the Test Connectivity dialog box, enter the IP addresses and port numbers of the data source and destination, and click Test.
You can test up to 10 IP addresses at a time. To test several addresses together, click Add for each additional address, then click Test at the bottom.
Figure 1 Test Connectivity
If Available is returned, the network is connected. If Unavailable is returned, troubleshoot by following the steps in What Should I Do If the Result of a Logstash Connectivity Test Is Unavailable?
Starting a Migration Task
Start a Logstash pipeline to migrate data from the source to the destination.
- Check whether the pipeline list contains any pipelines whose status is Running. If it does not, start a configuration file. If it does, hot-start a configuration file.
- Start a configuration file (no pipelines are Running).
  - In the configuration file list, select 1 to 50 configuration files and click Start Logstash above the list.
  - In the displayed dialog box, choose whether to select Keepalive.
    - Select Keepalive for long-term data streams in production environments. If the Logstash service stops due to an unexpected error, the system automatically tries to restart it.
    - Deselect Keepalive for one-time offline data migration tasks. If the source has no data, enabling Keepalive may cause the pipeline to restart repeatedly, which may lead to errors.
  - Click OK to start the Logstash pipelines.
- Hot-start a configuration file (pipelines are already Running).
  - In the configuration file list, select one configuration file and click Hot Start above the list.
  - In the displayed dialog box, confirm the Keepalive setting. The setting is inherited from the existing pipelines in the cluster and cannot be modified.
  - Click OK to start the Logstash pipeline.
- Verify the result. When the status of the new pipeline changes to Running in the pipeline list, the data migration task has started and is running properly. The Events column will be dynamically updated to show the number of processed documents.
Monitoring Pipelines
After starting data migration pipelines, you can monitor them in the following ways:
- Check their metrics.
In the pipeline list, locate the target pipeline, and click Metric Monitoring in the Operation column to go to the Cloud Eye console, where you can check the metrics to evaluate its status and performance.
When the Events data of a pipeline changes dynamically, the monitoring data changes accordingly. When a pipeline is being started or stopped, or the Events data is stable, the monitoring data remains unchanged.
For details about the metrics supported, see Logstash Pipeline Monitoring Metrics. For how to configure alarms, see Using Cloud Eye to Monitor Clusters.
- Check run logs.
Above the pipeline list, click Run Logs to check the logs of the Logstash process, which you can use to troubleshoot data parsing errors.
- Check operation records.
Above the pipeline list, click View to check the start and stop records of the pipelines. This helps you track historical operations and facilitates troubleshooting.