Creating a Configuration File
Enterprise data integration often involves complex requirements for cleaning and aggregating heterogeneous data from multiple sources, such as databases, logs, and message queues. Writing Logstash configuration files manually can be challenging because of their complex syntax and the difficulty of debugging them. It can also introduce security risks, for example, when database passwords or other credentials are stored in plaintext in configuration files. CSS Logstash addresses these challenges with a visual configuration center that enables you to build secure, efficient, and easy-to-manage data processing pipelines. The configuration center allows you to quickly generate ETL logic using custom templates and convenient developer-assistance features. A built-in sensitive information masking feature helps protect user credentials and other confidential data. You can also fine-tune pipeline parameters such as concurrency and buffering to meet diverse performance and throughput requirements.
How the Feature Works
A Logstash configuration file defines a pipeline consisting of three stages:
- Input: Data is ingested from one or more sources.
- Filter: Data is parsed, cleaned, transformed, and enriched. This is typically the most CPU-intensive stage.
- Output: Data is sent to a destination, such as Elasticsearch.
For more information about these stages and the available plugins, see the official Logstash Plugins documentation.
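As a minimal illustration of the three stages, the sketch below shows a pipeline that reads from Kafka, parses JSON, and writes to Elasticsearch. The hostnames, topic, index name, and field names are placeholders for illustration only, not values from this document:

```
input {
  kafka {
    bootstrap_servers => "kafka-host:9092"   # placeholder address
    topics => ["app-logs"]                   # placeholder topic
  }
}

filter {
  # Parse each message body as JSON. The filter stage is
  # typically the most CPU-intensive part of the pipeline.
  json {
    source => "message"
  }
}

output {
  elasticsearch {
    hosts => ["http://es-host:9200"]         # placeholder address
    index => "app-logs-%{+yyyy.MM.dd}"       # daily index pattern
  }
}
```

In a real configuration, credentials such as the Elasticsearch password would appear here as well; the Hidden Content feature described below can mask them in the console.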
Constraints
- A maximum of 50 configuration files can be created for a Logstash cluster.
- A maximum of five configuration files can be in the verifying state at the same time.
- The size of a single configuration file cannot exceed 100 KB.
Prerequisites
- You have obtained the connection information for both the data source (such as Elasticsearch, MySQL, and Kafka) and destination (such as Elasticsearch), including IP addresses, port numbers, accounts, and passwords.
- The Logstash cluster's VPC and security group rules allow communication with both the data source and destination.
Creating a Configuration File
- Go to the Configuration Center page.
  - Log in to the CSS management console.
  - In the navigation pane on the left, choose Clusters > Logstash.
  - In the cluster list, click the name of the target cluster. The cluster information page is displayed.
  - Click the Configuration Center tab.
- On the Configuration Center page, click Create in the upper-right corner. The Create Configuration File page is displayed.
- Edit the configuration file to define the data collection and processing workflow.
Table 1 Parameters for creating a configuration file

| Parameter | Description |
|---|---|
| Name | User-defined configuration file name. It can contain only letters, digits, hyphens (-), and underscores (_), and must start with a letter. The minimum length is 4 characters. You are advised to include a description of the data's purpose in the configuration file name to facilitate management. |
| Configuration File Content | Define the input, filter, and output logic. To use a configuration file template, expand System Templates or Custom Templates, select a template based on its description, and click Apply in the Operation column to copy the content of the template. System templates are preset configuration file templates of CSS that cover various types of input sources, such as JDBC, Redis, DIS, and Beats; for details, see Configuration File Templates. Custom templates are templates added by users. The size of each configuration file cannot exceed 100 KB. |
| Hidden Content | Strings to hide in the configuration file, such as passwords and access keys. Press Enter after each string. These strings will be displayed as *** in the configuration file. You can configure a maximum of 20 strings to hide, each with a maximum length of 512 bytes. |
| Description | A description of the configuration file for easy identification. The value can contain 0 to 128 characters. |
- Click Next to configure pipeline parameters.
- Configure pipeline runtime parameters based on the data volume and reliability requirements.
Table 2 Pipeline configuration parameters

| Parameter | Category | Default Value | Description |
|---|---|---|---|
| pipeline.workers | Concurrency control | Number of CPU cores | Number of worker threads that execute the filter and output stages of the pipeline in parallel. For an I/O-intensive pipeline (such as simple forwarding), set this parameter to the number of CPU cores. For a CPU-intensive pipeline (such as complex regular expression parsing), set it to the number of CPU cores or a slightly lower value. Do not exceed the number of CPU cores on a single node; otherwise, increased context-switching overhead will degrade performance. |
| pipeline.batch.size | Throughput | 125 | Maximum number of events that a worker thread collects from inputs before attempting to execute its filters and outputs. When processing a large number of small documents, increasing this value (for example, to between 500 and 3000) can significantly improve throughput. However, because JVM heap memory is limited, setting this value too high can lead to out-of-memory (OOM) errors. |
| pipeline.batch.delay | Latency control | 50 | Maximum amount of time, in milliseconds, that a pipeline worker waits for each new event while its current batch is not yet full. Reduce this value when real-time performance takes priority. |
| queue.type | Data reliability | memory | Type of internal queue used for event buffering. memory: a traditional memory-based queue, which delivers high performance but risks data loss if the process fails. persisted: a disk-based persistent queue, which prevents data loss through persistent storage and supports resumable data transfer; when selecting this mode, monitor storage usage carefully to avoid exhaustion. |
| queue.checkpoint.writes | Checkpoint | 1024 (recommended) | Maximum number of written events before a forced checkpoint. Set this parameter only when queue.type is set to persisted. |
| queue.max_bytes | Disk-based queue | 1024 (recommended) | Total capacity of the persistent queue, in MB. Set this parameter only when queue.type is set to persisted. Ensure sufficient disk space; Logstash stops receiving new data when the disk is full. |
- After the configuration is complete, click Create.
- Return to the configuration file list. If the Status of the new configuration file changes to Available, it is created successfully.
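For reference, the pipeline parameters in Table 2 correspond to standard Logstash settings that, in open-source Logstash, are defined in logstash.yml (in CSS they are set through the console instead). A sketch of a reliability-oriented tuning; the values are illustrative assumptions, not recommendations from this document:

```
# logstash.yml equivalent of the console's pipeline parameters (illustrative values)
pipeline.workers: 4            # match the node's CPU core count
pipeline.batch.size: 1000      # larger batches help with many small documents
pipeline.batch.delay: 50       # ms to wait for a batch to fill before flushing
queue.type: persisted          # disk-backed queue to survive process failures
queue.max_bytes: 1024mb        # cap the persistent queue size; monitor disk usage
queue.checkpoint.writes: 1024  # force a checkpoint every 1024 written events
```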
Managing Configuration Files
You can edit existing configuration files, set a configuration file as a custom template, and delete configuration files.
| Operation | Constraints | Operation Guide |
|---|---|---|
| Modifying a configuration file | A configuration file that has an ongoing pipeline task cannot be modified. | In the configuration file list, find the row that contains the configuration file you want to edit, and click Edit in the Operation column. Modify the file to adapt to new requirements or correct errors. |
| Setting a configuration file as a template | N/A | In the configuration file list, click Add to Custom Template in the Operation column. In the displayed dialog box, set the template name, description, and configuration file content, and click OK. The template can then be reused when creating configuration files. |
| Backing up configuration files | N/A | Click |
| Deleting a configuration file | N/A | In the configuration file list, find the row that contains the configuration file you want to delete, and click Delete in the Operation column. In the displayed dialog box, type DELETE and then click OK to confirm the deletion. |
