Updated on 2024-09-13 GMT+08:00

Using Elasticsearch, In-House Built Logstash, and Kibana to Build a Log Management Platform

A unified log management platform built on a CSS Elasticsearch cluster lets you manage logs centrally and in real time, enabling log-driven O&M and improving service management efficiency.

Scenarios

This document uses Elasticsearch, Filebeat, Logstash, and Kibana to show how to build a unified log management platform. Filebeat collects ECS logs and forwards them to Logstash for data processing. The processed data is stored in Elasticsearch and can be queried, analyzed, and visualized using Kibana. This solution is applicable in the following scenarios:

Log Management: Centrally manage application and system logs to quickly identify faults.
Security Monitoring: Detect and respond to security threats, detect intrusions, and analyze abnormal behaviors.
Service Analysis: Analyze user behaviors to optimize products and services.
Performance Monitoring: Monitor system and application performance in real-time to detect bottlenecks.

Overview

Elasticsearch, Logstash, Kibana, and Beats (ELKB) together provide a complete log solution and are widely used as a log system.

Elasticsearch is an open-source, distributed search and analytics engine used to store, search, and analyze large volumes of data.
Logstash is a server-side data pipeline that collects, parses, and enriches data before sending it to Elasticsearch.
Kibana provides an open-source data analysis and visualization platform for Elasticsearch, enabling users to search, view, and interact with the data stored in Elasticsearch.
Beats, such as Filebeat and Metricbeat, are lightweight data collectors installed on servers to collect and forward data to Logstash.

Figure 1 shows the architecture of the log management platform using Elasticsearch and Logstash.

Figure 1 ELKB architecture

Collection
- As a data collector, Beats gather data from various sources and send it to Logstash.
- Logstash can independently collect data or receive it from Beats to filter, transform, and enhance the data.
Data processing
Before sending data to Elasticsearch, Logstash performs necessary processing, such as parsing structured logs and filtering out irrelevant information.
Data storage
As a core storage component, Elasticsearch indexes and stores data from Logstash, providing quick search and data retrieval functions.
Data analysis and visualization
Kibana is used to analyze and visualize data in Elasticsearch, allowing users to create dashboards and reports to visualize the data.

For details about the version compatibility of ELKB components, see https://www.elastic.co/support/matrix#matrix_compatibility.

Advantages

Real-time: Provide real-time data collection and analysis capabilities.
Flexibility: Support various data sources and flexible data processing flows.
Ease of use: The user-friendly interface simplifies data operations and visualization.
Scalability: Offer strong horizontal expansion capabilities, enabling the processing of petabyte-level data.

Prerequisites

You have created an Elasticsearch cluster in non-security mode. For details, see Creating an Elasticsearch Cluster.
You have purchased an ECS and installed the Java environment on it. For details about how to purchase an ECS, see Purchasing and Logging In to a Linux ECS.

Procedure

Log in to the ECS, then deploy and configure Filebeat.
1. Download Filebeat. The recommended version is 7.6.2. Download it at https://www.elastic.co/downloads/past-releases#filebeat-oss.
2. Configure the Filebeat configuration file filebeat.yml.
  For example, to collect all files whose names end with .log in the /root/ directory, configure the filebeat.yml file as follows:
```
filebeat.inputs:
- type: log
  enabled: true
  # Path of the collected log file
  paths:
    - /root/*.log

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
# Logstash hosts information
output.logstash:
  hosts: ["192.168.0.126:5044"]

processors:
```
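As a rough illustration of which files the /root/*.log path above would pick up, the following Python sketch applies the same pattern with pathlib. It is only an approximation: Filebeat evaluates the pattern with its own glob implementation, and the candidate file names other than /root/tmp.log are hypothetical.

```python
from pathlib import PurePosixPath

# Same glob as the `paths` entry in filebeat.yml.
PATTERN = "/root/*.log"


def is_collected(path: str) -> bool:
    """Return True if `path` matches PATTERN; `*` does not cross a `/`."""
    return PurePosixPath(path).match(PATTERN)


# Hypothetical candidates: only files directly under /root/ ending in .log match.
for candidate in ["/root/tmp.log", "/root/app/error.log", "/root/notes.txt"]:
    print(candidate, is_collected(candidate))
```

Files in subdirectories of /root/ are not matched by this pattern, so they would need their own entry under paths.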
Deploy and configure Logstash in-house.

To achieve better performance, you are advised to set the JVM heap size of the in-house built Logstash (the -Xms and -Xmx parameters in config/jvm.options) to half of the memory of the ECS or Docker container.
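The "half of the memory" recommendation can be turned into concrete JVM flags with a small calculation. The sketch below is illustrative only: heap_flags is a hypothetical helper, and the 4096 MiB figure is an assumed machine size, not a value from this document.

```python
from typing import List


def heap_flags(total_memory_mib: int) -> List[str]:
    """Return -Xms/-Xmx flags that set the heap to half of the given memory."""
    half = total_memory_mib // 2
    # Setting both flags to the same value avoids heap resizing at runtime.
    return [f"-Xms{half}m", f"-Xmx{half}m"]


# Assumed example: an ECS with 4096 MiB of memory.
print(" ".join(heap_flags(4096)))
```

The resulting two flags would replace the existing -Xms and -Xmx lines in the jvm.options file of the Logstash installation.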
1. Download Logstash. The recommended version is 7.6.2. Download it at https://www.elastic.co/downloads/past-releases#logstash-oss.
2. Ensure that Logstash and the CSS cluster are connected. Run the curl http://{ip}:{port} command on the VM to test the connectivity between the VM and the Elasticsearch cluster, where {ip} and {port} are the access address and port of the cluster. If the HTTP status code 200 is returned, they are connected.
3. Configure the Logstash configuration file logstash-sample.conf.
  The content of the logstash-sample.conf file is as follows:
```
input {
  beats {
    port => 5044
  }
}
# Split data.
filter {
    grok {
        match => {
            "message" => '\[%{GREEDYDATA:timemaybe}\] \[%{WORD:level}\] %{GREEDYDATA:content}'
        }
    }
    mutate {
      remove_field => ["@version","tags","source","input","prospector","beat"]
    }
}
# CSS cluster information
output {
  elasticsearch {
    hosts => ["http://192.168.0.4:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    #user => "xxx"
    #password => "xxx"
  }
}
```
  You can use Grok Debugger (https://grokdebugger.com/) to test and adjust the grok match pattern used in the Logstash filter.
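To see what the grok pattern in the filter extracts, it can be approximated with a plain regular expression. The sketch below assumes the standard grok definitions GREEDYDATA = `.*` and WORD = `\b\w+\b`, and uses the sample log line from the test data step of this procedure. It imitates the filter in Python and is not how Logstash itself runs.

```python
import re

# Regex approximation of:
#   \[%{GREEDYDATA:timemaybe}\] \[%{WORD:level}\] %{GREEDYDATA:content}
GROK_AS_REGEX = re.compile(
    r"\[(?P<timemaybe>.*)\] \[(?P<level>\b\w+\b)\] (?P<content>.*)"
)

SAMPLE = "[Thu Feb 13 14:01:16 CST 2020] [info] this is the test message"


def parse(line: str):
    """Return the extracted fields, or None if the line does not match."""
    match = GROK_AS_REGEX.search(line)
    return match.groupdict() if match else None


print(parse(SAMPLE))
```

Each named group corresponds to one field that Logstash adds to the event: timemaybe, level, and content.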

Configure the index template of the Elasticsearch cluster.

Log in to the CSS management console.
In the navigation pane on the left, choose Clusters > Elasticsearch.
In the cluster list, locate the target cluster and click Kibana in the Operation column to log in to Kibana.
Click Dev Tools in the navigation tree on the left.

Create an index template.

For example, create an index template in which each matching index uses three shards and no replicas, and that defines the fields @timestamp, content, host.name, level, log.file.path, message, and timemaybe.

PUT _template/filebeat
{
  "index_patterns": ["filebeat*"],
  "settings": {
    # Define the number of shards.
    "number_of_shards": 3,
    # Define the number of replicas.
    "number_of_replicas": 0,
    "refresh_interval": "5s"
  },
  # Define a field.
  "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "content": {
            "type": "text"
          },
          "host": {
            "properties": {
              "name": {
                "type": "text"
              }
            }
          },
          "level": {
            "type": "keyword"
          },
          "log": {
            "properties": {
              "file": {
                "properties": {
                  "path": {
                    "type": "text"
                  }
                }
              }
            }
          },
          "message": {
            "type": "text"
          },
          "timemaybe": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time||epoch_millis||EEE MMM dd HH:mm:ss zzz yyyy"
          }
        }
    }
}
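The template applies only to indexes whose names match filebeat*. The following sketch shows how the index option in the Logstash output, %{[@metadata][beat]}-%{+YYYY.MM.dd}, would resolve for one event and checks the result against the template pattern. It assumes that the beat name is filebeat and that the date part is taken from the event's @timestamp in UTC. The timestamp itself is an illustrative value.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatch


def index_name(beat: str, timestamp: datetime) -> str:
    """Build '<beat>-YYYY.MM.dd' from the event time converted to UTC."""
    utc = timestamp.astimezone(timezone.utc)
    return f"{beat}-{utc.strftime('%Y.%m.%d')}"


# Illustrative event time: 2020-02-13 14:01:16 at UTC+8.
cst = timezone(timedelta(hours=8))
event_time = datetime(2020, 2, 13, 14, 1, 16, tzinfo=cst)

name = index_name("filebeat", event_time)
print(name, fnmatch(name, "filebeat*"))
```

Because a new index is created each day, the template is what keeps the shard, replica, and field settings consistent across all of them.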

Prepare test data on ECS.

Run the following command to generate test data and write the data to /root/tmp.log:

bash -c 'while true; do echo "[$(date)] [info] this is the test message"; sleep 1; done;' >> /root/tmp.log &

The following is an example of the generated test data:

[Thu Feb 13 14:01:16 CST 2020] [info] this is the test message

Run the following command to start Logstash. Replace the configuration file path with the actual path in your Logstash installation directory:

nohup ./bin/logstash -f /opt/pht/logstash-6.8.6/logstash-sample.conf &

Run the following command to start Filebeat:
```
./filebeat
```
Use Kibana to query data and create reports.
1. Go to the Kibana page of the Elasticsearch cluster.
2. Click Discover in the navigation tree on the left, as shown in Figure 2.
  Figure 2 Discover page
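Besides the Discover page, the ingested logs can also be checked directly through the Elasticsearch search API. The sketch below is a minimal example under several assumptions: the cluster address http://192.168.0.4:9200 is the one used in the Logstash output, the cluster is in non-security mode, and the indexes follow the filebeat-* naming. The helper names build_search_request and search are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def build_search_request(base_url, index_pattern, level, size=5):
    """Build a _search request for the newest documents with the given level."""
    body = {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        # `level` is mapped as keyword in the index template, so an exact
        # term query applies.
        "query": {"term": {"level": level}},
    }
    return Request(
        url=f"{base_url.rstrip('/')}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(request, timeout=10):
    """Send the request and return the _source of each hit."""
    with urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return [hit["_source"] for hit in result["hits"]["hits"]]
```

Calling search(build_search_request("http://192.168.0.4:9200", "filebeat-*", "info")) from a machine that can reach the cluster should return the most recent test messages once Filebeat and Logstash are running.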