Collecting Multi-Line Container Logs

Logs are collected line by line by default. If a log is printed on multiple lines (for example, a Java program log), each collected entry may contain only part of that log.

You can enable multi-line log collection and match logs using regular expressions to collect complete logs. This section describes how to configure a multi-line log collection policy and policy configuration notes for different log types and container runtimes.

You can configure multi-line log collection using either of the modes listed in the following table.

Multi-Line Log Collection Mode	Description	Pro	Con
Collecting Logs in Multiline Text Mode	The first-line regular expression is used to match the first line of a log. Then, subsequent lines are treated as part of that log.	The configuration is simple.	The configuration notes differ by log type and container runtime. In some scenarios, there may be performance bottlenecks and many constraints.
Collecting Logs Using Multiline Parsing Mode	A log is parsed and then all the lines are concatenated.	The process is simpler, performance is better, and constraints are fewer when a stdout log is processed.	The configuration is complex.

Prerequisites

The Cloud Native Log Collection add-on has been installed in the cluster, and Logging has been enabled. For details, see Collecting Container Logs Using the Cloud Native Log Collection Add-on.

Collecting Logs in Multiline Text Mode

In multiline text mode, the Cloud Native Log Collection add-on uses the first-line regular expression to match the first line of a log and then treats subsequent lines as part of that log. The add-on stores the log content in the content field and does not extract fields from the log. The time of each log is the system time of the node where the log is collected.
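The grouping behavior described above can be sketched in a few lines of Python. This is a simplified model, not the add-on's implementation: the function name group_by_first_line is hypothetical, the last sample line is invented for illustration, and Python's re module stands in for the add-on's regular expression engine.

```python
import re

def group_by_first_line(lines, first_line_regex):
    """Group raw lines into logs: a line matching the regex starts a new log."""
    pattern = re.compile(first_line_regex)
    logs, current = [], None
    for line in lines:
        if pattern.search(line):
            if current is not None:
                logs.append({"content": "\n".join(current)})
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines seen before any first line are dropped in this sketch.
    if current is not None:
        logs.append({"content": "\n".join(current)})
    return logs

sample = [
    'time=2025-04-01 15:24:30.199 level=info msg=Exception in thread "main"',
    "at com.myproject.module.MyProject.badMethod(MyProject.java:22)",
    "at com.myproject.module.MyProject.main(MyProject.java:6)",
    "time=2025-04-01 15:24:31.002 level=info msg=done",  # hypothetical second log
]
logs = group_by_first_line(sample, r"time=\d+-\d+-\d+ \d+:\d+:\d+.*")
for log in logs:
    print(repr(log["content"]))
```

Each returned entry carries only a content field, mirroring the statement that no fields are extracted from the log.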

The following are notes on configuring multi-line log collection policies in different scenarios.

Policy Configuration Notes for Container File Logs or Node File Logs

Container file logs or node file logs are collected from their raw log files. If a regular expression can match the first line of a container file log or node file log, the log can be collected in most cases.

Note 1: If the first-line regular expression matches a log starting from the middle of a line, the content before the match is lost. For example, if you enter only \d+-\d+-\d+ \d+:\d+:\d+.* for a log that starts with time=, the time= prefix is lost from the collected log content.

Note 2: If a regular expression is too simple and also matches content in non-first lines, the log may be truncated. For example, the regular expression contains only \d+-\d+-\d+ \d+:\d+:\d+.* for matching the logging time, and a later line of the log contains a time in the same format.

time=2025-04-01 16:33:06.254 level=info msg=Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
at com.myproject.module.MyProject.badMethod(MyProject.java:22)
at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
at com.myproject.module.MyProject.someMethod(MyProject.java:10) 2025-04-01 15:24:30.199
at com.myproject.module.MyProject.someMethod(MyProject.java:10)(MyProject.java:10).java:10)
at com.myproject.module.MyProject.main(MyProject.java:6) func=main.writeLog file=D:/cia-tools/cmd/benchmark/log-tool/log.go:96 inputNumber=1

The reported log is truncated, and the log content before the fifth line is lost.
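Both notes can be checked before a policy is created. The following sketch uses Python's re module as a stand-in for the add-on's regular expression engine (an assumption), with shortened sample lines. It shows which prefix falls outside the match and which lines each pattern would treat as the first line of a log.

```python
import re

too_simple = re.compile(r"\d+-\d+-\d+ \d+:\d+:\d+.*")
with_prefix = re.compile(r"time=\d+-\d+-\d+ \d+:\d+:\d+.*")

lines = [
    'time=2025-04-01 16:33:06.254 level=info msg=Exception in thread "main"',
    "at com.myproject.module.MyProject.badMethod(MyProject.java:22)",
    "at com.myproject.module.MyProject.someMethod(MyProject.java:10) 2025-04-01 15:24:30.199",
    "at com.myproject.module.MyProject.main(MyProject.java:6)",
]

# Note 1: the match starts after "time=", so that prefix is outside the match.
first = too_simple.search(lines[0])
lost_prefix = lines[0][:first.start()]

# Note 2: indexes of the lines each pattern would treat as the start of a log.
starts_simple = [i for i, line in enumerate(lines) if too_simple.search(line)]
starts_prefixed = [i for i, line in enumerate(lines) if with_prefix.search(line)]

print(repr(lost_prefix), starts_simple, starts_prefixed)
```

Including the fixed time= prefix in the pattern avoids both problems for this sample, because the stray timestamp in the stack trace is not preceded by time=.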


Policy Configuration Notes for Stdout Logs on Nodes with Docker Installed

If the container runtime of a node is Docker, stdout logs are saved on the node in JSON format and need to be specially processed. As a result, some regular expressions seem to be correct but do not take effect.

For the example log, the raw log file on the node contains the following:

{"log":"time=2025-03-30 23:02:57.355 level=info msg=Exception in thread \"main\" java.lang.RuntimeException: Something has gone wrong, aborting!\n","stream":"stdout","time":"2025-03-30T15:02:57.355429354Z"}
{"log":"at com.myproject.module.MyProject.badMethod(MyProject.java:22)\n","stream":"stdout","time":"2025-03-30T15:02:57.356272973Z"}
{"log":"at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)\n","stream":"stdout","time":"2025-03-30T15:02:57.35628203Z"}
{"log":"at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)\n","stream":"stdout","time":"2025-03-30T15:02:57.356286679Z"}
{"log":"at com.myproject.module.MyProject.someMethod(MyProject.java:10)\n","stream":"stdout","time":"2025-03-30T15:02:57.356290997Z"}
{"log":"at com.myproject.module.MyProject.someMethod(MyProject.java:10)(MyProject.java:10).java:10)\n","stream":"stdout","time":"2025-03-30T15:02:57.356294964Z"}
{"log":"at com.myproject.module.MyProject.main(MyProject.java:6) func=main.writeLog file=D:/cia-tools/cmd/benchmark/log-tool/log.go:96 inputNumber=44486\n","stream":"stdout","time":"2025-03-30T15:02:57.356298511Z"}

If you configure the regular expression time=\d+-\d+-\d+ \d+:\d+:\d+.*, the regular expression that takes effect in the configuration is (^{"log":"time=\d+-\d+-\d+ \d+:\d+:\d+.*"). Ensure that actual regular expressions can match the raw logs in JSON format.

Note 1: A regular expression must match the log content from its first character. If the regular expression omits time= and starts matching at the time, it does not take effect. If the beginning of a log cannot be described by a fixed pattern, you can start the regular expression with .*, for example, .*\d+-\d+-\d+ \d+:\d+:\d+.*.
Note 2: Do not use the ^ and $ characters. If they are used, the first line of a log cannot be matched.
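To see why these notes apply, a candidate expression can be wrapped the way the text above describes and tested against a raw JSON line. This is a rough check only: the helper name effective_docker_regex is hypothetical, the wrapping simply mirrors the form quoted above, and Python's re module may differ in details from the engine used by the add-on.

```python
import re

def effective_docker_regex(user_regex):
    # Assumption: mirrors the wrapped form quoted above, (^{"log":"<regex>").
    return r'^\{"log":"' + user_regex + '"'

# First raw line of the Docker example, kept byte for byte via raw strings.
raw_first_line = (
    r'{"log":"time=2025-03-30 23:02:57.355 level=info msg=Exception in thread \"main\" '
    r'java.lang.RuntimeException: Something has gone wrong, aborting!\n",'
    r'"stream":"stdout","time":"2025-03-30T15:02:57.355429354Z"}'
)

candidates = [
    r"time=\d+-\d+-\d+ \d+:\d+:\d+.*",    # matches from the first character
    r"\d+-\d+-\d+ \d+:\d+:\d+.*",         # skips time=
    r".*\d+-\d+-\d+ \d+:\d+:\d+.*",       # leading .* absorbs the prefix
    r"^time=\d+-\d+-\d+ \d+:\d+:\d+.*",   # uses ^
    r"time=\d+-\d+-\d+ \d+:\d+:\d+.*$",   # uses $
]
results = {
    c: bool(re.search(effective_docker_regex(c), raw_first_line))
    for c in candidates
}
for regex, ok in results.items():
    print("takes effect" if ok else "no match   ", regex)
```

Only the first and third candidates match the wrapped form, which is consistent with Note 1 and Note 2.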

Policy Configuration Notes for Stdout Logs on Nodes with Containerd Installed

If the container runtime of a node is containerd, each stdout log line contains the user's log content plus additional content added by the runtime, so special processing is also required and you need to check whether your regular expressions take effect. Because processing such stdout logs is relatively complex, multiline text mode performs worse for stdout logs than for other types of logs. For large specifications, the recommended peak throughput of a single node is 10,000 logs per second or 5 MB per second. If this performance cannot meet your log collection requirements, see Collecting Logs Using Multiline Parsing Mode.

For the example log, the raw log file on the node contains the following:

2025-03-28T17:22:44.052300591+08:00 stdout F time=2025-03-28 17:22:44.052 level=info msg=Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
2025-03-28T17:22:44.052327792+08:00 stdout F at com.myproject.module.MyProject.badMethod(MyProject.java:22)
2025-03-28T17:22:44.052330808+08:00 stdout F at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
2025-03-28T17:22:44.052332771+08:00 stdout F at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
2025-03-28T17:22:44.052334906+08:00 stdout F at com.myproject.module.MyProject.someMethod(MyProject.java:10)
2025-03-28T17:22:44.05233719+08:00 stdout F at com.myproject.module.MyProject.someMethod(MyProject.java:10)(MyProject.java:10).java:10)
2025-03-28T17:22:44.052339194+08:00 stdout F at com.myproject.module.MyProject.main(MyProject.java:6) func=main.writeLog file=D:/cia-tools/cmd/benchmark/log-tool/log.go:96 inputNumber=3

Note 1: The regular expression must not match the containerd timestamp. For example, if the regular expression is \d+-\d+-\d+T\d+:\d+:\d+.*, it matches every line, so each line is treated as the first line of a log and multi-line collection does not take effect.
Note 2: Do not use the ^ character. If it is used, the first line of a log cannot be matched.
Note 3: If the first-line regular expression matches a log starting from the middle of a line, the content before the match is lost. For example, if you enter only \d+-\d+-\d+ \d+:\d+:\d+.* for a log that starts with time=, the time= prefix is lost from the collected log content.
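A quick way to spot the problem in Note 1 is to count how many raw lines a candidate expression matches. The sketch below uses the first three raw lines of the containerd example and Python's re module as a stand-in for the add-on's engine (an assumption); the helper name first_line_count is hypothetical.

```python
import re

raw_lines = [
    '2025-03-28T17:22:44.052300591+08:00 stdout F time=2025-03-28 17:22:44.052 '
    'level=info msg=Exception in thread "main" java.lang.RuntimeException: '
    'Something has gone wrong, aborting!',
    "2025-03-28T17:22:44.052327792+08:00 stdout F "
    "at com.myproject.module.MyProject.badMethod(MyProject.java:22)",
    "2025-03-28T17:22:44.052330808+08:00 stdout F "
    "at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)",
]

def first_line_count(regex):
    """Number of raw lines that would be treated as the first line of a log."""
    pattern = re.compile(regex)
    return sum(1 for line in raw_lines if pattern.search(line))

# A pattern shaped like the containerd timestamp matches every raw line.
print(first_line_count(r"\d+-\d+-\d+T\d+:\d+:\d+.*"))
# A pattern tied to the application's own prefix matches only the real first line.
print(first_line_count(r"time=\d+-\d+-\d+ \d+:\d+:\d+.*"))
```

If the count equals the number of raw lines, every line starts its own log and nothing is concatenated.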

To collect logs in multiline text mode, take the following steps:

Log in to the CCE console, click the cluster name to access the cluster console, and choose Logging in the navigation pane.
In the upper right corner, click View Log Policy. Then, click Create Log Collection Policy.
Select Custom Policy. Configure Policy Name, Log Type, Log Source, and other parameters as needed.

In Log Format, select Multi-line and enter a regular expression that matches the first line of a log.

For example, if the first line of the following log is always in the format of time={time} {log-content}, the regular expression can be set to time=\d+-\d+-\d+ \d+:\d+:\d+.*.

time=2025-04-01 15:24:30.199 level=info msg=Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
at com.myproject.module.MyProject.badMethod(MyProject.java:22)
at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
at com.myproject.module.MyProject.anotherMethod(MyProject.java:14)
at com.myproject.module.MyProject.someMethod(MyProject.java:10)
at com.myproject.module.MyProject.someMethod(MyProject.java:10)(MyProject.java:10).java:10)
at com.myproject.module.MyProject.main(MyProject.java:6) func=main.writeLog file=D:/cia-tools/cmd/benchmark/log-tool/log.go:96 inputNumber=56505

time= is fixed, and \d+-\d+-\d+ \d+:\d+:\d+ matches the time, for example, 2025-04-01 15:24:30. .* matches any character following the time.

Select the log group and log stream for reporting logs to LTS and click OK.

You can then view the collected logs on LTS.

Collecting Logs Using Multiline Parsing Mode

The Cloud Native Log Collection add-on version must be 1.7.3 or later.
Multiline parsing does not take effect for the pods of workloads that are scheduled to CCI.

In multiline parsing mode, the configuration of multi-line log collection is complex. However, in this mode, a log is parsed and then all the lines are concatenated. So, when a stdout log is processed, the process is simpler, performance is better, and constraints are fewer.

Basic Principles

The Cloud Native Log Collection add-on is built based on Fluent Bit. Fluent Bit introduced new multi-line log collection capability starting with its open-source version 1.8.0. The following is an example configuration:

[MULTILINE_PARSER]
        name          multiline-regex-test
        type          regex
        flush_timeout 1000
        #
        # Regex rules for multiline parsing
        # ---------------------------------
        #
        # configuration hints:
        #
        #  - first state always has the name: start_state
        #  - every field in the rule must be inside double quotes
        #
        # rules |   state name  | regex pattern                  | next state
        # ------|---------------|--------------------------------------------
        rule      "start_state"   "/time=\d+-\d+-\d+ \d+:\d+:\d+.*/"  "cont"
        rule      "cont"          "/^(?!time=\d+-\d+-\d+ \d+:\d+:\d+.*).*$/" "cont"

In multiline text mode, only the regular expression for the first line of a log is required. In multiline parsing mode, a regular expression is listed for each kind of line in a log, and the rules are chained through state_name and next_state to concatenate the lines of a log.
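The way rules are chained through state names can be modeled as a small state machine. The sketch below is a simplified illustration, not Fluent Bit's implementation: it ignores flush_timeout, uses Python's re module instead of Fluent Bit's regular expression engine, and the function name concatenate and the last sample line are hypothetical.

```python
import re

# Rules copied from the example configuration: (state name, regex, next state).
RULES = [
    ("start_state", r"time=\d+-\d+-\d+ \d+:\d+:\d+.*", "cont"),
    ("cont", r"^(?!time=\d+-\d+-\d+ \d+:\d+:\d+.*).*$", "cont"),
]

def concatenate(lines, rules=RULES):
    compiled = {}
    for state, regex, nxt in rules:
        compiled.setdefault(state, []).append((re.compile(regex), nxt))

    def step(state, line):
        """Return the next state if a rule of `state` matches the line."""
        for pattern, nxt in compiled.get(state, []):
            if pattern.search(line):
                return nxt
        return None

    records, buffer, state = [], [], None
    for line in lines:
        nxt = step(state, line) if state else None
        if nxt is not None:
            buffer.append(line)            # continuation of the open record
            state = nxt
            continue
        if buffer:
            records.append("\n".join(buffer))  # close the open record
            buffer = []
        nxt = step("start_state", line)
        if nxt is not None:
            buffer, state = [line], nxt    # a new record starts here
        else:
            records.append(line)           # unmatched line is emitted alone
            state = None
    if buffer:
        records.append("\n".join(buffer))
    return records

records = concatenate([
    'time=2025-04-01 15:24:30.199 level=info msg=Exception in thread "main"',
    "at com.myproject.module.MyProject.badMethod(MyProject.java:22)",
    "at com.myproject.module.MyProject.main(MyProject.java:6)",
    "time=2025-04-01 15:24:31.002 level=info msg=done",  # hypothetical second log
])
print(len(records))
```

In this model the cont rule keeps a record open as long as a line does not look like a new first line, which is the role of the negative lookahead in the example configuration.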

For stdout logs, multiline parsing provides parsers. A parser first parses each stdout line, and the parsed content of multiple lines is then concatenated. In this way, the regular expressions in the rules are not affected by the stdout format of the container runtime. The following is the final configuration generated for stdout logs, using containerd as an example.

[MULTILINE_PARSER]
        name          multiline-regex-test
        type          regex
        parser        cri
        key_content   log
        flush_timeout 5000
        #
        # Regex rules for multiline parsing
        # ---------------------------------
        #
        # configuration hints:
        #
        #  - first state always has the name: start_state
        #  - every field in the rule must be inside double quotes
        #
        # rules |   state name  | regex pattern                  | next state
        # ------|---------------|--------------------------------------------
        rule      "start_state"   "/time=\d+-\d+-\d+ \d+:\d+:\d+.*/"  "cont"
        rule      "cont"          "/^(?!time=\d+-\d+-\d+ \d+:\d+:\d+.*).*$/" "cont"
[PARSER]
        Name        cri
        Format      regex
        Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
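To illustrate what the cri parser contributes, the sketch below applies the same regular expression to one raw containerd line. It is written for Python's re module, which spells named groups as (?P<name>...) instead of (?<name>...), so the pattern is a translation rather than a copy. Time parsing is omitted.

```python
import re

# Python translation of the cri parser regex from the configuration above.
CRI = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$"
)
FIRST_LINE = re.compile(r"time=\d+-\d+-\d+ \d+:\d+:\d+.*")

raw = ("2025-03-28T17:22:44.052327792+08:00 stdout F "
       "at com.myproject.module.MyProject.badMethod(MyProject.java:22)")
fields = CRI.match(raw).groupdict()

# With key_content set to log, the multiline rules see only this field,
# so the runtime's own timestamp can no longer interfere with them.
is_first_line = bool(FIRST_LINE.search(fields["log"]))
print(fields["stream"], fields["logtag"], is_first_line)
print(fields["log"])
```

Because the rules are evaluated against the log field only, the containerd timestamp constraint from multiline text mode does not arise in this mode.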

To collect a log in multiline parsing mode, take the following steps:

Use kubectl to access the cluster. For details, see Accessing a Cluster Using kubectl.

Create a YAML file named log-config.yaml. You can change the file name.

vi log-config.yaml

The following is an example YAML file for collecting container file logs of a specified workload:

apiVersion: logging.openvessel.io/v1
kind: LogConfig
metadata:
  name: test-log-02  # Change the rule name as needed.
  namespace: kube-system  # Namespace of the collection rule. The value is fixed at kube-system.
spec:
  inputDetail: # Input configuration
    type: container_file   # Input type. container_file indicates container file logs.
    containerFile:    # Container file log configuration. This parameter is valid only when type is set to container_file.
      workloads:        # Modify the workload information as needed.
      - namespace: monitoring  # Namespace that the workload belongs to
        kind: Deployment  # Workload type. The value can be Deployment, DaemonSet, StatefulSet, Job, or CronJob.
        name: prometheus-lightweight  # Workload name
        container: prometheus  # Container name
        files:
        - logPath: "/var/log"  # Log directory, which is an absolute path.
          filePattern: "*.log"  # Log file name, which supports wildcard characters.
    processors:    # Multi-line log definition
      type: multiline_parser  # Multiline type, which is fixed at multiline_parser.
      multilineParsers: 
      - type: regex               # The value is fixed at regex.
        flushTimeout: 5000   # Timeout interval for refreshing the multi-line buffer, in milliseconds. The default value is 5000.
        rules:
        - stateName: start_state # Name of the first multi-line rule, which must be start_state.
          regex: /time=\d+-\d+-\d+ \d+:\d+:\d+.*/  # Regular expression of the first line, which starts and ends with a slash (/).
          nextState: cont                # Name of the state that matches continuation lines. You can change the name as needed, but it must be the stateName of a rule defined under rules.
        - stateName: cont
          regex: /^(?!time=\d+-\d+-\d+ \d+:\d+:\d+.*).*$/  # Regular expression of a non-first line
          nextState: cont
  outputDetail:  # Output configuration
    type: LTS    # Output type. The value is fixed at LTS.
    LTS:
      ltsGroupID: abf5f0ad-627e-41cc-8d3f-61c9e1f57f5a      # ID of the log group for reporting logs to LTS. The specified ID must be valid.
      ltsStreamID: f7ed71e9-6b9d-4ba3-86e4-b1b9d22ef4fb     # ID of the log stream for reporting logs to LTS. The specified ID must be valid.

If the content of non-first lines is complex and you are not sure which regular expression to use, you can use /^(?!{first-line-regular-expression}).*$/. This regular expression matches most non-first lines. If it does not match the non-first lines of your logs, adjust it as needed.
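The template above can be filled in mechanically and checked against sample lines before it is written to the YAML file. In this sketch the helper names continuation_regex and as_rule_value are hypothetical, and Python's re module stands in for the add-on's regular expression engine.

```python
import re

def continuation_regex(first_line_regex):
    """Fill the template ^(?!{first-line-regular-expression}).*$."""
    return "^(?!" + first_line_regex + ").*$"

def as_rule_value(regex):
    """Wrap a regex in slashes, as the regex fields of the example rules do."""
    return "/" + regex + "/"

first = r"time=\d+-\d+-\d+ \d+:\d+:\d+.*"
cont = continuation_regex(first)

stack_line = "at com.myproject.module.MyProject.badMethod(MyProject.java:22)"
new_log = "time=2025-04-01 15:24:30.199 level=info msg=Exception in thread"

print(as_rule_value(cont))
print(bool(re.search(cont, stack_line)))  # continuation line
print(bool(re.search(cont, new_log)))     # looks like a new first line
```

Note that the lookahead is anchored at the start of the line, so a continuation line that merely contains time= further along is still treated as a continuation.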

Create a LogConfig.
```
kubectl create -f log-config.yaml
```
If information similar to the following is displayed, the LogConfig has been created:
```
logconfig.logging.openvessel.io/test-log-xx created
```
Check the created LogConfig.
```
kubectl get LogConfig -n kube-system
```
If information similar to the following is displayed, the log collection policy has been created:
```
NAME                AGE
test-log-xx         30s
```