Cloud Native Log Collection
Description
The Cloud Native Log Collection plug-in (formerly log-agent) is built on Fluent Bit and OpenTelemetry and collects logs and Kubernetes events. The plug-in can collect the standard output logs of training and inference instances in a cluster and report them to LTS.
Log Collection Reliability
The main purpose of the log system is to record data from every stage of a service component's lifecycle, including startup, initialization, exit, runtime details, and exceptions. It is primarily used in O&M scenarios for tasks such as checking component status and analyzing fault causes.
Standard streams (stdout and stderr) use non-persistent storage, so data integrity may be compromised by the following risks:
- Log rotation and compression potentially deleting old files
- Temporary storage volumes being cleared when Kubernetes pods end
- Automatic OS cleanup triggered by limited node storage space
While Cloud Native Log Collection employs techniques like multi-level buffering, priority queues, and resumable uploads to enhance log collection reliability, logs could still be lost in the following situations:
- The service log throughput surpasses the collector's processing capacity.
- The service pod is abruptly terminated and reclaimed by CCE.
- The log collector pod experiences exceptions.
Based on the best practices of cloud native log collection, the following suggestions are provided:
- Use dedicated, highly reliable streams to record critical service data (for example, financial transactions) and store the data in persistent storage.
- Do not store sensitive information such as customer details, payment credentials, and session tokens in logs.
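One way to follow the second suggestion is to mask sensitive values in the application before they reach stdout, where the plug-in would collect them. The following is a minimal sketch using Python's standard `logging` module. The key names in `SENSITIVE_KEYS` and the helper names `redact`, `RedactingFilter`, and `build_logger` are hypothetical and not part of the plug-in; adapt the pattern to the fields your service actually logs.

```python
import logging
import re
import sys

# Hypothetical key names; replace them with the sensitive fields your service handles.
SENSITIVE_KEYS = re.compile(r"(?i)(token|password|card_number)=\S+")


def redact(text: str) -> str:
    """Replace the value of each sensitive key=value pair with ***."""
    return SENSITIVE_KEYS.sub(r"\1=***", text)


class RedactingFilter(logging.Filter):
    """Logging filter that masks sensitive values before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True


def build_logger(name: str = "service") -> logging.Logger:
    """Create a logger that writes redacted records to stdout."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.addFilter(RedactingFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    build_logger().info("login ok token=%s user=%s", "abc123", "bob")
```

Attaching the filter to the handler means every record written to stdout passes through `redact`, regardless of which part of the service logged it.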
Constraints
You are advised to install version 1.7.3 or later.
Supported CCE versions: v1.21 to v1.32
Plug-in Performance Specifications
Performance Item | Description | Remarks |
---|---|---|
Size of a log | A single log cannot be larger than 512 KB. If multi-line logs are collected, the size of each line is calculated separately. | N/A |
Maximum number of collected files | On a single node, all log collection rules combined can monitor no more than 4,095 files. | N/A |
Configuration update | Configuration updates take effect in 1 to 3 minutes. | N/A |
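To check a service's output against the 512 KB limit before deployment, each line can be measured on its own, as the limit is applied per line for multi-line logs. The sketch below makes two assumptions that the table does not state: 1 KB is taken as 1,024 bytes, and size is measured on the UTF-8 encoded line. The function name `oversized_lines` is hypothetical.

```python
from typing import List

# Assumption: 1 KB = 1,024 bytes, measured on the UTF-8 encoded line.
MAX_LOG_BYTES = 512 * 1024


def oversized_lines(text: str, limit: int = MAX_LOG_BYTES) -> List[int]:
    """Return the zero-based indexes of lines whose encoded size exceeds the limit."""
    return [
        index
        for index, line in enumerate(text.splitlines())
        if len(line.encode("utf-8")) > limit
    ]
```

How the plug-in handles a log that exceeds the limit is not described here, so treat any line this check reports as at risk and split it in the application.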
Installing a Plug-in
Install the specified plug-in in the resource pool.
- Log in to the ModelArts console. In the navigation pane on the left, choose Standard Cluster.
- Click the resource pool to access its details page.
- On the resource pool details page, click the Plug-ins tab.
- Locate the plug-in to be installed in the list and click Install.
- In the displayed dialog box, configure the parameters.
Table 1 Parameters for configuring Cloud Native Log Collection

Parameter | Sub-Parameter | Description |
---|---|---|
Specifications | Plug-in Version | Version of Cloud Native Log Collection to be deployed. Version 1.7.3 is supported. |
Specifications | Plug-in Specifications | Preset: Select Small or Large. Small supports a maximum of 5,000 logs per second in a cluster. Large supports a maximum of 10,000 logs per second in a cluster. Custom: You can adjust the number of plug-in instances and resource quotas as required. High availability is not possible with a single instance. If an error occurs on the node where the plug-in instance runs, the plug-in will fail. |
Specifications | Configuration List | Detailed configurations of the selected specifications. |
Parameter Configuration | Log Group | Select a log group from the drop-down list. A log group is the basic unit for LTS to manage logs. |
Parameter Configuration | Log Stream | Select a log stream from the drop-down list. A log stream is the basic unit for log reads and writes. If there are many logs to collect, you are advised to separate logs into different log streams based on log types, and name log streams in an easily identifiable way. |
Parameter Configuration | Collect Logical Subpool Logs | Logs for logical subpools are not collected by default. Once this function is enabled, you can collect logs for each logical subpool and set the collection policy. Click Add Logical Pool, then select a created logical pool and the corresponding log group and log stream. |
- Read "Usage Notes" and select I have read and understand the preceding information.
- Click OK.
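When choosing Plug-in Specifications, the preset thresholds in Table 1 can be expressed as a small helper. This is only a sketch: the function name `choose_preset` is hypothetical, and falling back to Custom above 10,000 logs per second is an assumption, since the table states only the maximum for each preset.

```python
SMALL_MAX_LOGS_PER_SECOND = 5_000   # from Table 1
LARGE_MAX_LOGS_PER_SECOND = 10_000  # from Table 1


def choose_preset(expected_logs_per_second: int) -> str:
    """Suggest a plug-in specification for an expected cluster-wide log rate."""
    if expected_logs_per_second < 0:
        raise ValueError("expected_logs_per_second must not be negative")
    if expected_logs_per_second <= SMALL_MAX_LOGS_PER_SECOND:
        return "Small"
    if expected_logs_per_second <= LARGE_MAX_LOGS_PER_SECOND:
        return "Large"
    # Assumption: rates beyond the Large preset call for custom sizing.
    return "Custom"
```

For example, `choose_preset(3000)` suggests Small, while `choose_preset(12000)` suggests Custom with instance counts and quotas sized by hand.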
Components
Component | Description | Resource Type |
---|---|---|
fluent-bit | Lightweight log collector and forwarder deployed on each node to collect logs. In 1.5.0 and later versions, logs are reported directly to LTS. | DaemonSet |
cop-logs | Generates soft links for collected files and runs in the same pod as fluent-bit. | DaemonSet |
log-operator | Generates internal configuration files. | Deployment |
otel-collector | Collects Kubernetes events and reports them to LTS and AOM, and receives logs and reports them to LTS. The log reporting scope depends on the plug-in version. In 1.5.1 and later versions, this component reports only the logs of workloads that are scaled to CCI. | Deployment |
Related Operations
For details, see Viewing the Plug-ins of a Standard Resource Pool on the Resource Pool Details Page.
Change History
Plug-in Version | Supported CCE Cluster Versions | New Feature |
---|---|---|
1.7.3 | v1.21, v1.23, v1.25, v1.27, v1.28, v1.29, v1.30, v1.31 | Collecting standard output logs of containers is supported. |
1.7.2 | v1.21, v1.23, v1.25, v1.27, v1.28, v1.29, v1.30, v1.31 | Logs can be compressed in gzip format and sent to LTS. |