Updated on 2025-09-05 GMT+08:00

Configuring Metrics and Alarms for Log Collection Status

Enterprise applications usually have log collection configured. However, collection can be interrupted and data can be lost. To ensure log integrity and availability, you are advised to configure metrics and alarms for the log collection status (such as the number of error logs and the number of written logs). If an exception occurs, an alarm notification is sent by email or SMS so that you can respond to and handle the problem quickly.

Configuring Metrics for Log Collection Status

The Cloud Native Log Collection add-on uses the fluent-bit component to collect logs. You can configure metrics for the fluent-bit component to monitor the log collection status and detect problems in a timely manner.

  1. Enable the metric collection of fluent-bit.

    fluent-bit uses non-basic metrics. After metric collection is enabled, you will be billed for monitoring these metrics. For details, see AOM Pricing Details.

    1. Log in to the CCE console and click the cluster name to access the cluster console.
    2. In the navigation pane, choose Settings. On the Monitoring tab, enable Preset Policies.

    3. Click Manage under Preset Policies in Collection Settings.

    4. Enable preset policies for fluent-bit.

    5. Edit the trustlist to add the fluentbit_input_ingestion_paused metric. (If the Cloud Native Cluster Monitoring add-on version is 3.12.1 or later, skip this step.)

  2. Configure a dashboard on AOM or Grafana.

    You are advised to create a dashboard on AOM and add the node-level parameters listed in the following table. Each parameter uses a PromQL statement to collect log collection statistics for each node, and these statistics can then be used for alarms.
    Table 1 Node-level parameters

    | Parameter | PromQL Statement | Setting Note |
    | --- | --- | --- |
    | Bytes Written per Second | sum(irate(fluentbit_input_bytes_total[2m])) by (pod) | Small specification: < 5 MB/s; large specification: < 10 MB/s |
    | Logs Written per Second | sum(irate(fluentbit_input_records_total[2m])) by (pod) | Small specification: < 10,000/s; large specification: < 20,000/s |
    | Input Storage Limit Exceeded | sum(fluentbit_input_storage_overlimit) by (pod) | The value should not stay above 0 for an extended period. Occasional non-zero values have no impact. |
    | Input Paused | sum(fluentbit_input_ingestion_paused) by (pod) | The value should not stay above 0 for an extended period. Occasional non-zero values have no impact. |
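
    The statements in Table 1 can also be evaluated outside the console. The following Python sketch is illustrative only: it builds an instant-query URL in the standard Prometheus HTTP API format (/api/v1/query) and compares a sampled value against the limits in Table 1. The base URL, the function names, and the reading of 1 MByte as 1024 * 1024 bytes are assumptions and are not taken from this guide. Whether your AOM or Prometheus instance exposes this API at a given address must be checked separately.

```python
from urllib.parse import urlencode

# PromQL statements copied from Table 1, keyed by a short parameter name.
QUERIES = {
    "bytes_per_second": "sum(irate(fluentbit_input_bytes_total[2m])) by (pod)",
    "records_per_second": "sum(irate(fluentbit_input_records_total[2m])) by (pod)",
    "storage_overlimit": "sum(fluentbit_input_storage_overlimit) by (pod)",
    "ingestion_paused": "sum(fluentbit_input_ingestion_paused) by (pod)",
}

# Upper bounds from Table 1. Assumption: 1 MByte = 1024 * 1024 bytes.
MIB = 1024 * 1024
LIMITS = {
    "small": {"bytes_per_second": 5 * MIB, "records_per_second": 10_000},
    "large": {"bytes_per_second": 10 * MIB, "records_per_second": 20_000},
}


def build_query_url(base_url, parameter):
    """Return an instant-query URL in the Prometheus HTTP API format."""
    query = urlencode({"query": QUERIES[parameter]})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def within_limit(parameter, value, spec="small"):
    """Return True if a sampled value is below the Table 1 limit for the spec."""
    return value < LIMITS[spec][parameter]


if __name__ == "__main__":
    # Hypothetical endpoint, used only to show the resulting URL shape.
    print(build_query_url("http://prometheus.example:9090", "bytes_per_second"))
    print(within_limit("records_per_second", 8_500, "small"))
```

    Only the throughput parameters have numeric limits here. The two "should not stay above 0" parameters depend on how long the value remains non-zero, so a single sample is not enough to judge them.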

Configuring Alarms for Log Collection Status

You can configure alarms for log collection status on AOM so that you can respond to and handle problems in a timely manner.
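
For the Input Storage Limit Exceeded and Input Paused parameters, an alarm should react to sustained non-zero values and ignore occasional ones. The following Python sketch shows one way to express that rule over a list of periodic samples. The function name and the default of five consecutive samples are hypothetical. This guide does not define how long "a long time" is, so choose a value that matches your sampling interval and tolerance.

```python
def sustained_nonzero(samples, min_consecutive=5):
    """Return True if at least `min_consecutive` samples in a row are above 0.

    `samples` is a chronologically ordered list of metric values, for example
    periodic readings of sum(fluentbit_input_ingestion_paused) by (pod) for
    one pod. The default threshold of 5 is an illustrative assumption.
    """
    run = 0  # length of the current streak of non-zero samples
    for value in samples:
        run = run + 1 if value > 0 else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    print(sustained_nonzero([0, 1, 0, 0, 2, 0]))   # isolated spikes only
    print(sustained_nonzero([0, 1, 1, 1, 1, 1]))   # five non-zero samples in a row
```

An alarm rule configured on AOM would normally express the same idea through its own trigger settings, such as a number of consecutive periods. The sketch only illustrates the logic.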