Configuring Metrics and Alarms for Log Collection Status
Enterprise applications usually have log collection configured. However, log collection may be interrupted, which can cause data loss. To ensure log integrity and availability, you are advised to configure metrics and alarms for the log collection status (such as the number of error logs and the number of written logs). When an exception occurs, an alarm notification is sent by email or SMS so that you can respond to and handle the problem quickly.
Configuring Metrics for Log Collection Status
The Cloud Native Log Collection add-on uses the fluent-bit component to collect logs. You can configure metrics for the fluent-bit component to monitor the log collection status and detect problems in a timely manner.
- Enable the metric collection of fluent-bit.
  Note: fluent-bit uses non-basic metrics. After metric collection is enabled, you will be billed for monitoring these metrics. For details, see AOM Pricing Details.
  - Log in to the CCE console and click the cluster name to access the cluster console.
  - In the navigation pane, choose Settings. On the Monitoring tab, enable Preset Policies.
  - Click Manage under Preset Policies in Collection Settings.
  - Enable preset policies for fluent-bit.
  - Edit the trustlist to add the fluentbit_input_ingestion_paused metric. (If the Cloud Native Cluster Monitoring add-on version is 3.12.1 or later, skip this step.)
- Configure a dashboard on AOM or Grafana.
You are advised to create a dashboard on AOM and add the node-level parameters listed in the following table. The PromQL statements collect log collection statistics for each node, which you can then use for alarms.
Table 1 Node-level parameters

| Parameter | PromQL Statement | Setting Note |
|---|---|---|
| Bytes Written per Second | sum(irate(fluentbit_input_bytes_total[2m])) by (pod) | Small specification: < 5 MByte/s; large specification: < 10 MByte/s |
| Logs Written per Second | sum(irate(fluentbit_input_records_total[2m])) by (pod) | Small specification: < 10,000/s; large specification: < 20,000/s |
| Input Storage Limit Exceeded | sum(fluentbit_input_storage_overlimit) by (pod) | The value should not stay above 0 for a long time. Occasional non-zero values have no impact. |
| Input Paused | sum(fluentbit_input_ingestion_paused) by (pod) | The value should not stay above 0 for a long time. Occasional non-zero values have no impact. |
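The rate limits in Table 1 can be checked with a short script once the two rate values have been read from a dashboard or query result. The following is a minimal sketch, not part of the add-on: the names MBYTE, LIMITS, and exceeded_limits are hypothetical, and it assumes that 1 MByte equals 1024 * 1024 bytes, which the table does not specify.

```python
# Assumption: 1 MByte = 1024 * 1024 bytes. Change this to 10**6 if your
# dashboards use decimal units; Table 1 does not say which one applies.
MBYTE = 1024 * 1024

# Upper bounds from Table 1, keyed by add-on specification.
LIMITS = {
    "small": {"bytes_per_s": 5 * MBYTE, "records_per_s": 10_000},
    "large": {"bytes_per_s": 10 * MBYTE, "records_per_s": 20_000},
}


def exceeded_limits(spec, bytes_per_s, records_per_s):
    """Return the names of the rate metrics that are at or above their bound.

    The table states the bounds as strict "less than" limits, so a value
    equal to the bound is reported as exceeded.
    """
    limits = LIMITS[spec]
    observed = {"bytes_per_s": bytes_per_s, "records_per_s": records_per_s}
    return [name for name, value in observed.items() if value >= limits[name]]


if __name__ == "__main__":
    # A small-specification pod writing 6 MByte/s and 4,000 logs/s.
    print(exceeded_limits("small", 6 * MBYTE, 4_000))  # ['bytes_per_s']
```

The same throughput on a large specification returns an empty list, because 6 MByte/s is below the 10 MByte/s bound.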
Configuring Alarms for Log Collection Status
You can configure alarms for the log collection status on AOM so that you can respond to and handle problems promptly.
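For the Input Storage Limit Exceeded and Input Paused parameters, an alarm should fire only when the value stays above 0, not on an occasional non-zero sample. The following sketch illustrates that condition on a list of sampled values. It is not how AOM evaluates alarm rules; the function name sustained_nonzero and the min_consecutive parameter are hypothetical, and the number of consecutive samples that counts as "a long time" is an assumption you need to choose for your workload.

```python
def sustained_nonzero(samples, min_consecutive):
    """Return True if `samples` contains at least `min_consecutive`
    back-to-back values greater than 0.

    `samples` is a sequence of metric values in time order, for example
    the per-pod result of sum(fluentbit_input_ingestion_paused) by (pod)
    read at a fixed interval.
    """
    run = 0  # length of the current streak of non-zero samples
    for value in samples:
        run = run + 1 if value > 0 else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Occasional non-zero values: no alarm.
    print(sustained_nonzero([0, 1, 0, 0, 1, 0], 3))  # False
    # Three consecutive non-zero values: alarm.
    print(sustained_nonzero([0, 1, 1, 1, 0], 3))     # True
```

Requiring a streak rather than a single non-zero sample matches the setting note in Table 1, which treats short spikes as harmless.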