Updated on 2024-12-28 GMT+08:00

Data Collection Overview

You can enable access to third-party (non-Huawei Cloud) logs in SecMaster. SecMaster uses Logstash to collect logs from many types of sources. Logs are comprehensively collected for historical data analysis, associated data analysis, and unknown threat detection.
Figure 1 Data Collection

Data Collection Principles

The basic principle of data collection is as follows: SecMaster uses a component controller (isap-agent) that is installed on your ECSs to manage the collection component Logstash, and Logstash transfer security data in your organization or between you and SecMaster.

Figure 2 Functional architecture of the collection system

Description

  • Collector: custom Logstash. A collector node is a custom combination of Logstash+ component controller (isap-agent).
  • Node: If you install SecMaster component controller isap-agent on an ECS and use IAM to authorize SecMaster to manage the ECS, the ECS is called a node. You need to deliver data collection engine Logstash to managed nodes on the Components page.
  • Component: A component is a custom Logstash that works as a data aggregation engine to receive and send security log data.
  • Connector: A connector is a basic element for Logstash. It defines the way Logstash receives source data and the standards it follows during the process. Each connector has a source end and a destination end. Source ends and destination ends are used for data inputs and outputs, respectively. The SecMaster pipeline is used for log data transmission between SecMaster and your devices.
  • Parser: A parser is a basic element for configuring custom Logstash. Parsers mainly work as filters in Logstash. SecMaster preconfigures varied types of filters and provides them as parsers. In just a few clicks on the SecMaster console, you can use parsers to generate native scripts to set complex filters for Logstash. In doing this, you can convert raw logs into the format you need.
  • Collection channel: A collection channel is equivalent to a Logstash pipeline. Multiple pipelines can be configured in Logstash. Each pipeline consists of the input, filter, and output parts. Pipelines work independently and do not affect each other. You can deploy a pipeline for multiple nodes. A pipeline is considered one collection channel no matter how many nodes it is configured for.

Limitations and Constraints

  • Currently, the data collection component controller can run on ECSs running the Linux x86_64 or Arm64 architecture.
  • Only IAM users can be used to install component controller and check details on the console. The IAM user can have only the minimum permissions assigned. For details, see Preparations.

Collector Specifications

The following table describes the specifications of the ECSs that are selected as nodes in collection management.

Table 1 Collector Specifications

vCPUs

Memory

System Disk

Data Disk

Referenced Processing Capability

4 vCPUs

8 GiB

50 GiB

100 GiB

2,000 EPS @ 1 KB

4,000 EPS @ 500 B

8 vCPUs

16 GiB

50 GiB

100 GiB

5,000 EPS @ 1 KB

10,000 EPS @ 500 B

16 vCPUs

32 GiB

50 GiB

100 GiB

10,000 EPS @ 1 KB

20,000 EPS @ 500 B

32 vCPUs

64 GiB

50 GiB

100 GiB

20,000 EPS @ 1 KB

40,000 EPS @ 500 B

64 vCPUs

128 GiB

50 GiB

100 GiB

40,000 EPS @ 1 KB

80,000 EPS @ 500 B

NOTE:

The ECS must have at least two vCPUs and 4 GB of memory. A disk of at least 100 GB must be attached as the directory disk.

The log volume usually increases in proportion to the server specifications. Generally, you are advised to increase the log volume based on the specifications in the table. If there is huge pressure on a collector, you can deploy multiple collectors and manage them in a unified manner through collection channels. This can distribute the log forwarding pressure across collectors.

Before installing the component controller, you are advised to mount a disk and use the disk partitioning script to allocate the disk. To ensure the installation and running of Logstash, the directory partition must have more than 100 GB of free space.

Log Source Limit

You can add as many as log sources you need to the collectors as long as your cloud resources can accommodate those logs. You can scale cloud resources anytime to meet your needs.

Data Collection Process

Figure 3 Data collection process
Table 2 Description of the data collection process

No.

Step

Description

1

Managing Nodes

Select or purchase an ECS and install the component controller on the ECS to complete node management.

2

Installing Components

Install data collection engine Logstash on the Components tab to complete component installation.

3

Configuring Connectors

Configure the source and destination connectors. Select a connector as required and set parameters.

4

(Optional) Configuring a Parser

Configure codeless parsers on the console based on your needs.

5

Configuring a Collection Channel

Configure the connection channels, associate it with a node, and deliver the Logstash pipeline configuration to complete the data collection configuration.

6

Verifying the Collection Result

After the collection channel is configured, check whether data is collected.

If logs are sent to the SecMaster pipeline, you can query the result on the SecMaster Security Analysis page.

Data Collection Configuration Removal Process

Figure 4 Data collection configuration removal process
Table 3 Description of the data collection configuration removal process

No.

Step

Description

1

Deleting a collection channel

On the Collection Channels page, stop and delete the Logstash pipeline configuration.

Note: All collection channels on related nodes must be stopped and deleted first.

2

(Optional) Deleting a parser

If a parser is configured, delete it on the Parsers tab.

3

(Optional) Deleting a data connection

If a data connection is added, delete the source and destination connectors on the Connections tab.

4

Removing a component

Delete the collection engine Logstash installed on the node and remove the component.

5

Deregistering a node

Remove the component controller to complete node deregistration.

Note: Deregistering a node does not delete the ECS and endpoint resources. If the data collection function is no longer used, you need to manually release the resources.