Creating a Configuration File
Enterprise data integration often involves complex requirements for cleaning and aggregating heterogeneous data from multiple sources, such as databases, logs, and message queues. Writing Logstash configuration files manually can be challenging because of their complex syntax and the difficulty of debugging them. It can also introduce security risks, for example, when database passwords or other credentials are stored in plaintext in configuration files. CSS Logstash addresses these challenges with a visual configuration center that enables you to build secure, efficient, and easy-to-manage data processing pipelines. The configuration center allows you to quickly generate ETL logic using custom templates and convenient developer-assistance features. A built-in sensitive information masking feature helps protect user credentials and other confidential data. You can also fine-tune pipeline parameters such as concurrency and buffering to meet diverse performance and throughput requirements.
How the Feature Works
A Logstash configuration file defines a pipeline consisting of three stages:
- Input: Data is ingested from one or more sources.
- Filter: Data is parsed, cleaned, transformed, and enriched. This is typically the most CPU-intensive stage.
- Output: Data is sent to a destination, such as Elasticsearch.
For more information about these stages and the available plugins, see the official Logstash Plugins documentation.
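As a minimal illustration of the three stages, the sketch below shows a pipeline that reads from Kafka, parses JSON, and writes to Elasticsearch. The hostnames, topic, index name, and field names are placeholders for illustration only, not values from this document:

```
input {
  kafka {
    bootstrap_servers => "kafka-host:9092"   # placeholder address
    topics => ["app-logs"]                   # placeholder topic
  }
}

filter {
  # Parse each message body as JSON. The filter stage is
  # typically the most CPU-intensive part of the pipeline.
  json {
    source => "message"
  }
}

output {
  elasticsearch {
    hosts => ["http://es-host:9200"]         # placeholder address
    index => "app-logs-%{+yyyy.MM.dd}"       # daily index pattern
  }
}
```

In a real configuration, credentials such as the Elasticsearch password would appear here as well; the Hidden Content feature described below can mask them in the console.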
Constraints
- A maximum of 50 configuration files can be created for a Logstash cluster.
- A maximum of five configuration files can be in the verifying state at the same time.
- The size of a single configuration file cannot exceed 100 KB.
Prerequisites
- You have obtained the connection information for both the data source (such as Elasticsearch, MySQL, and Kafka) and destination (such as Elasticsearch), including IP addresses, port numbers, accounts, and passwords.
- The Logstash cluster's VPC and security group rules allow communication with both the data source and destination.
Creating a Configuration File
- Go to the Configuration Center page.
  - Log in to the CSS management console.
  - In the navigation pane on the left, choose Clusters > Logstash.
  - In the cluster list, click the name of the target cluster. The cluster information page is displayed.
  - Click the Configuration Center tab.
- On the Configuration Center page, click Create in the upper-right corner. The Create Configuration File page is displayed.
- Edit the configuration file to define the data collection and processing workflow.
Table 1 Parameters for creating a configuration file

| Parameter | Description |
|---|---|
| Name | User-defined configuration file name. It can contain only letters, digits, hyphens (-), and underscores (_), and must start with a letter. The minimum length is 4 characters. You are advised to include a description of the data's purpose in the configuration file name to facilitate management. |
| Configuration File Content | Define the input, filter, and output logic. To use a configuration file template, expand System Templates or Custom Templates, select a template based on its description, and click Apply in the Operation column to copy the content of the template. System templates are preset configuration file templates of CSS that cover various types of input sources, such as JDBC, Redis, DIS, and Beats; for details, see Configuration File Templates. Custom templates are templates added by users. The size of each configuration file cannot exceed 100 KB. |
| Hidden Content | Strings to hide in the configuration file, such as passwords and access keys. Press Enter after each string. These strings will be displayed as *** in the configuration file. You can configure a maximum of 20 strings to hide, each with a maximum length of 512 bytes. |
| Description | A description of the configuration file for easy identification. The value can contain 0 to 128 characters. |
- Click Next to configure pipeline parameters.
- Configure pipeline runtime parameters based on the data volume and reliability requirements.
Table 2 Pipeline configuration parameters

| Parameter | Category | Default Value | Description |
|---|---|---|---|
| pipeline.workers | Concurrency control | Number of CPU cores | Number of worker threads that execute the filter and output stages of the pipeline in parallel. For an I/O-intensive pipeline (such as simple forwarding), set this parameter to the number of CPU cores. For a CPU-intensive pipeline (such as complex regular expression parsing), set it to the number of CPU cores or a slightly lower value. Do not exceed the number of CPU cores on a single node; otherwise, increased context-switching overhead will degrade performance. |
| pipeline.batch.size | Throughput | 125 | Maximum number of events that a worker thread collects from inputs before attempting to execute its filters and outputs. When processing a large number of small documents, increasing this value (for example, to between 500 and 3000) can significantly improve throughput. However, because JVM heap memory is limited, setting this value too high can lead to out-of-memory (OOM) errors. |
| pipeline.batch.delay | Latency control | 50 | Maximum amount of time, in milliseconds, that a pipeline worker waits for each new event while its current batch is not yet full. Reduce this value when real-time performance takes priority. |
| queue.type | Data reliability | memory | Type of internal queue used for event buffering. memory: a traditional memory-based queue, which delivers high performance but risks data loss if the process fails. persisted: a disk-based persistent queue, which prevents data loss through persistent storage and supports resumable data transfer; when selecting this mode, monitor storage usage carefully to avoid exhaustion. |
| queue.checkpoint.writes | Checkpoint | 1024 (recommended) | Maximum number of written events before a forced checkpoint. Set this parameter only when queue.type is set to persisted. |
| queue.max_bytes | Disk-based queue | 1024 (recommended) | Total capacity of the persistent queue, in MB. Set this parameter only when queue.type is set to persisted. Ensure sufficient disk space; Logstash stops receiving new data when the disk is full. |
- After the configuration is complete, click Create.
- Return to the configuration file list. If the Status of the new configuration file changes to Available, it is created successfully.
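For reference, the pipeline parameters in Table 2 correspond to standard Logstash settings that, in open-source Logstash, are defined in logstash.yml (in CSS they are set through the console instead). A sketch of a reliability-oriented tuning; the values are illustrative assumptions, not recommendations from this document:

```
# logstash.yml equivalent of the console's pipeline parameters (illustrative values)
pipeline.workers: 4            # match the node's CPU core count
pipeline.batch.size: 1000      # larger batches help with many small documents
pipeline.batch.delay: 50       # ms to wait for a batch to fill before flushing
queue.type: persisted          # disk-backed queue to survive process failures
queue.max_bytes: 1024mb        # cap the persistent queue size; monitor disk usage
queue.checkpoint.writes: 1024  # force a checkpoint every 1024 written events
```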
Managing Configuration Files
You can edit existing configuration files, set a configuration file as a custom template, and delete configuration files.
| Operation | Constraints | Operation Guide |
|---|---|---|
| Modifying a configuration file | A configuration file that has an ongoing pipeline task cannot be modified. | In the configuration file list, find the row that contains the configuration file you want to edit, and click Edit in the Operation column. Modify the file to adapt to new requirements or correct errors. |
| Setting a configuration file as a template | N/A | In the configuration file list, click Add to Custom Template in the Operation column. In the displayed dialog box, set the template name, description, and configuration file content, and click OK. The template can then be reused when creating configuration files. |
| Backing up configuration files | N/A | Click |
| Deleting a configuration file | N/A | In the configuration file list, find the row that contains the configuration file you want to delete, and click Delete in the Operation column. In the displayed dialog box, type DELETE and then click OK to confirm the deletion. |
