Using Logstash to Ingest Data from OBS into Elasticsearch
You can use CSS Logstash to ingest data stored in OBS into Elasticsearch for efficient data migration, search, and analytics.
Scenario
OBS is used to store massive amounts of data. When you need to perform quick search and analysis on such data, you can use CSS Logstash to ingest the data from OBS into an Elasticsearch cluster. Common scenarios include:
- Data search and analytics: Synchronizes data stored in OBS (such as logs and service data) to Elasticsearch, periodically or in real time, for fast full-text search, aggregation, and visualization.
- OBS bucket operation analysis: Synchronizes OBS bucket access logs to Elasticsearch for auditing, behavior analysis, and anomaly detection.
Solution Architecture

Logstash can ingest data from a variety of data sources. For data stored in OBS, you can use Logstash's logstash-input-s3 plugin to read objects from OBS and then use the logstash-output-elasticsearch plugin to synchronize the processed data to the destination Elasticsearch cluster.
- Input: The logstash-input-s3 plugin connects to an OBS bucket, monitors specified files, and reads their data.
- Processing: Filters can be configured in the Logstash configuration file to clean, transform, and structure the data (for example, use the Grok filter to parse log formats).
- Output: The logstash-output-elasticsearch plugin connects to the destination Elasticsearch cluster and indexes the data.
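The following minimal sketch illustrates the shape of such a pipeline. All values here (bucket, endpoint, host, and index names) are placeholders; a complete, annotated example is provided in the Procedure section below.

input {
  s3 {
    bucket   => "example-obs-bucket"        # OBS bucket to read from (placeholder)
    endpoint => "https://obs.example.com"   # OBS access address (placeholder)
  }
}
filter {
  # (Optional) Clean, transform, and structure events here.
}
output {
  elasticsearch {
    hosts => ["192.168.0.1:9200"]           # Destination Elasticsearch address (placeholder)
    index => "example-index"                # Destination index (placeholder)
  }
}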
Highlights
- High compatibility: Supports multiple data formats, such as JSON, CSV, and OBS logs.
- High scalability: Filters can be configured to clean data, extract specific fields, and convert data formats.
- High flexibility: Supports multiple destinations, such as CSS Elasticsearch, self-built Elasticsearch, and third-party Elasticsearch services.
Prerequisites
- The target data has been uploaded to the OBS bucket, which is in the same region as the destination Elasticsearch cluster. The following information about the OBS bucket has been obtained: bucket name, endpoint, and region. For details, see Accessing OBS.
- The destination CSS Elasticsearch cluster has been created, and its access address (host) has been obtained, along with the username and password if it is a security-mode cluster (not required for non-security-mode clusters).
- A CSS Logstash cluster has been created. It is in the same VPC as the destination Elasticsearch cluster and the two are connected.
Procedure
- Access the CSS Logstash cluster.
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Logstash.
- Test network connectivity. If Logstash and Elasticsearch are deployed in the same VPC, skip this step.
- In the Logstash cluster list, find the target cluster and click Configuration Center in the Operation column.
- On the Configuration Center page, click Test Connectivity.
- In the Test Connectivity dialog box, enter the IP address and port number of the destination Elasticsearch, and click Test.
If Available is displayed, the network is connected.
- If Logstash and Elasticsearch are on the same internal network but are not connected, connect them by referring to Configuring Routes for a Logstash Cluster.
- If Elasticsearch is on an external network and cannot be reached, connect the two by referring to Configuring Public Network Access for a Logstash Cluster.
- Prepare the Logstash configuration file.
- In the Logstash cluster list, find the target cluster and click Configuration Center in the Operation column.
- On the Configuration Center page, click Create in the upper-right corner, edit the configuration file, and save the change.
An example of the configuration file content (modify it based on service requirements):
input {
  s3 {
    access_key_id => "YOUR_AK"                    # Access Key ID of the account
    secret_access_key => "YOUR_SK"                # Secret Access Key of the account
    bucket => "log_obs_access"                    # OBS bucket name
    prefix => "access_log"                        # File prefix (optional)
    endpoint => "https://OBS_Endpoint"            # OBS access address
    region => "REGION"                            # Region where OBS is located
    watch_for_new_files => true                   # Monitor new files
    temporary_directory => "/opt/data/tmp/"       # Temporary storage directory
    backup_to_bucket => "backup_log_obs_access"   # Backup bucket (optional)
    backup_add_prefix => "backup-"                # Backup file prefix (optional)
  }
}
filter {
  # (Optional) Example: parsing OBS bucket logs
  grok {
    match => { "message" => '%{OBS_ACCESS_LOG_PATTERN}' }
  }
  mutate {
    remove_field => ["@version", "message"]       # Remove redundant fields
  }
}
output {
  elasticsearch {
    user => "USERNAME"                            # Elasticsearch username (security-mode cluster only)
    password => "YOUR_PASSWORD"                   # Elasticsearch password (security-mode cluster only)
    hosts => ["192.168.0.xxx:9200"]               # Elasticsearch cluster address
    index => "delivery_events_log_alias"          # Index name
    manage_template => false                      # Do not manage index templates
    ilm_enabled => false                          # Disable the ILM policy
  }
}
Replace OBS_ACCESS_LOG_PATTERN with the actual Grok pattern used for log parsing. For details, see the FAQ below: How Do I Check OBS Bucket Access Logs?
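Note that manage_template is set to false, so Logstash will not create or update index templates. If the destination index does not exist, Elasticsearch by default auto-creates one named after the index setting. If you instead want delivery_events_log_alias to be an alias over a concrete index, you can create both in Kibana Dev Tools before starting the pipeline. A minimal sketch, using a hypothetical backing index name:

# Hypothetical backing index; the alias matches the "index" setting in the pipeline above
PUT delivery_events_log-000001
{
  "aliases": {
    "delivery_events_log_alias": {}
  }
}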
- Start a Logstash pipeline task.
- On the Configuration Center page, select the newly created configuration file, and click Start in the upper-left corner.
- In the Start Logstash dialog box, select Keepalive to ensure that the Logstash pipeline keeps running across service restarts and failures, so that data processing is not interrupted.
When Keepalive is enabled, a daemon process is configured on each node. If the Logstash service becomes faulty, the daemon process will try to rectify the fault and restart the service, ensuring that the Logstash pipelines run efficiently and reliably.
- Click OK to start the configuration file.
In the pipeline list, you can check the configuration file status and monitor data migration.
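While the task is running, you can also confirm in Kibana Dev Tools that the destination index exists and is receiving documents. For example, assuming the index name from the configuration file above:

GET _cat/indices/delivery_events_log_alias?v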
- Verify data synchronization.
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the displayed cluster list, find the target cluster, and click Access Kibana in the Operation column to log in to the Kibana console.
- In the navigation pane on the left, choose Dev Tools.
- Run the following index query command. Check whether the expected number of records is returned for the target index.
GET delivery_events_log_alias/_count
{
  "query": { "match_all": {} }
}
If the value of count in the result is not 0, data has been synchronized.
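To inspect the synchronized documents themselves, you can also run a search against the same index, for example:

GET delivery_events_log_alias/_search
{
  "size": 3,
  "query": { "match_all": {} }
}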
FAQ: How Do I Check OBS Bucket Access Logs?
If logging is enabled for an OBS bucket, Logstash can read the OBS bucket's access logs and write the log data to Elasticsearch, where you can then analyze operations performed on the OBS bucket.
A sample of the OBS bucket log:
15e02840b2784ffb9dcac293afc01a75 genean-hot-test [02/Feb/2024:07:51:17 +0000] 58.250.177.72 15e02840b2784ffb9dcac293afc01a75 0000018D68CCEF1AD3296F98BB7B4255 REST.GET.OBJECT 00002c9d-172a-495d-b0a3-4aca8c1f86cc "GET /genean-hot-test/00002c9d-172a-495d-b0a3-4aca8c1f86cc?AWSAccessKeyId=H0N1CQY4D1ZBSHABB3NF&Expires=1706860576&response-content-disposition=attachment&response-content-type=application/octet-stream&x-amz-security-token=*****&Signature=DeC2lkZVFB4CjDjKe147ZOl8sKY%3D HTTP/1.1" 200 - 82509 82509 84 84 "https://console.example.com/" "Mozilla/5.0 (xx.xx.xx; xx; xx) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0" - - STANDARD - "-" efe6d963779e4475bb81c5cb1f5f5368
For more information, see Using Logging to Record OBS Logs.
You can add filters to the Logstash configuration file to parse OBS bucket logs into structured fields. See the following example:
filter {
  grok {
    match => {
      "message" => '(?<BucketOwner>[^ ]+) (?<Bucket>[^ ]+) \[%{HTTPDATE:Time}\] (?<RemoteIP>[^ ]+) (?<Requester>[^ ]+) (?<RequestID>[^ ]+) (?<Operation>[^ ]+) (?<Key>[^ ]+) \"(?<RequestURI>[^"]+)\" (?<HTTPStatus>[^ ]+) (?<ErrorCode>[^ ]+) (?<BytesSent>[^ ]+) (?<ObjectSize>[^ ]+) (?<TotalTime>[^ ]+) (?<Turn-AroundTime>[^ ]+) \"(?<Referer>[^"]+)\" \"(?<User-Agent>[^"]+)\" (?<VersionID>[^ ]+) (?<STSLogUrn>[^ ]+) (?<StorageClass>[^ ]+) (?<TargetStorageClass>[^ ]+) \"(?<DentryName>[^ ]+)\" (?<IAMUserID>[^ ]+)'
    }
    timeout_millis => 3000
    timeout_scope => "event"
  }
  mutate {
    remove_field => ["@version", "message"]
  }
}
Once OBS bucket logs are ingested into Elasticsearch, you can search and analyze the data in Elasticsearch.
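For example, once the fields extracted by the Grok filter above are indexed, you can run aggregations on them in Kibana Dev Tools. The following sketch counts the most frequent bucket operations, assuming the index name from the earlier example and that dynamic mapping created an Operation.keyword subfield:

GET delivery_events_log_alias/_search
{
  "size": 0,
  "aggs": {
    "top_operations": {
      "terms": { "field": "Operation.keyword", "size": 10 }
    }
  }
}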