Cloud Native Logging
When logging is enabled (see Enabling Logging), Cloud Native Logging is automatically installed in an on-premises cluster. You can also install the add-on manually by following this section. For details about this add-on, see Cloud Native Logging.
Overview
Cloud Native Logging is based on Fluent Bit and OpenTelemetry. It supports CRD-based log collection policies, and it collects and forwards stdout logs, container file logs, node logs, and Kubernetes events from the containers in a cluster. After Cloud Native Logging is installed, stdout logs and Kubernetes events are collected by default. For details about how to use Cloud Native Logging to collect logs, see Collecting Data Plane Logs.
Constraints
- This add-on is only available in clusters v1.21 or later.
- A maximum of 50 log collection rules can be configured for each cluster.
- This add-on cannot collect log files in .gz, .tar, or .zip format.
- If the node storage driver is Device Mapper, container file logs must be collected from the path where the data disk is attached to the node.
- If the container runtime is containerd, a stdout log entry cannot span multiple lines.
- In each cluster, up to 10,000 single-line logs can be collected per second, and up to 2,000 multi-line logs can be collected per second.
- A container must run for more than 1 minute for its logs to be collected. Otherwise, the logs may be deleted before they are collected.
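The numeric limits above lend themselves to a quick pre-flight check. The following is a minimal sketch, not part of the add-on: `LoggingPlan`, `parse_version`, and `check_plan` are hypothetical names introduced here for illustration, and only the limits themselves come from the constraints list.

```python
from dataclasses import dataclass

# Limits copied from the constraints listed above.
MIN_CLUSTER_VERSION = (1, 21)
MAX_COLLECTION_RULES = 50
MAX_SINGLE_LINE_LOGS_PER_SEC = 10_000
MAX_MULTI_LINE_LOGS_PER_SEC = 2_000
UNSUPPORTED_SUFFIXES = (".gz", ".tar", ".zip")


@dataclass
class LoggingPlan:
    """Hypothetical description of a planned setup (not an add-on API)."""
    cluster_version: str            # for example "v1.25"
    rule_count: int                 # number of log collection rules
    single_line_logs_per_sec: int   # expected peak rate
    multi_line_logs_per_sec: int    # expected peak rate
    log_paths: tuple = ()           # container file log paths to collect


def parse_version(version: str) -> tuple:
    """Turn 'v1.25' or '1.25.3' into (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def check_plan(plan: LoggingPlan) -> list:
    """Return a list of human-readable constraint violations."""
    problems = []
    if parse_version(plan.cluster_version) < MIN_CLUSTER_VERSION:
        problems.append("cluster version is earlier than v1.21")
    if plan.rule_count > MAX_COLLECTION_RULES:
        problems.append("more than 50 log collection rules")
    if plan.single_line_logs_per_sec > MAX_SINGLE_LINE_LOGS_PER_SEC:
        problems.append("single-line log rate exceeds 10,000 per second")
    if plan.multi_line_logs_per_sec > MAX_MULTI_LINE_LOGS_PER_SEC:
        problems.append("multi-line log rate exceeds 2,000 per second")
    for path in plan.log_paths:
        if path.endswith(UNSUPPORTED_SUFFIXES):
            problems.append(f"unsupported log file format: {path}")
    return problems
```

An empty list from `check_plan` means the plan stays within the documented limits. It does not cover the constraints that depend on the node storage driver, the container runtime, or container lifetime.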
Permissions
The fluent-bit component of Cloud Native Logging reads and collects the stdout logs on each node, container file logs, and node logs based on the collection configuration.
The following permissions are required for running the fluent-bit component:
- CAP_DAC_OVERRIDE: bypasses the discretionary access control (DAC) restrictions on files.
- CAP_FOWNER: bypasses the restriction that the file owner ID must match the process user ID.
- CAP_DAC_READ_SEARCH: bypasses the DAC restrictions on reading files and on reading and searching directories.
- CAP_SYS_PTRACE: allows all processes to be traced.
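These entries correspond to the Linux capabilities CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_DAC_READ_SEARCH, and CAP_SYS_PTRACE. Kubernetes container specifications write capability names without the `CAP_` prefix. The sketch below shows that name conversion and what a `securityContext` fragment adding these capabilities could look like. How the add-on actually declares them is not stated on this page, so the fragment is an assumption, and `to_k8s_capability` and `security_context` are hypothetical helper names.

```python
import json

# Capabilities required by the fluent-bit component, as Linux capability names.
FLUENT_BIT_CAPABILITIES = [
    "CAP_DAC_OVERRIDE",
    "CAP_FOWNER",
    "CAP_DAC_READ_SEARCH",
    "CAP_SYS_PTRACE",
]


def to_k8s_capability(name: str) -> str:
    """Drop the CAP_ prefix, which Kubernetes manifests omit."""
    upper = name.upper()
    return upper[4:] if upper.startswith("CAP_") else upper


def security_context(capabilities) -> dict:
    """Build a container securityContext fragment that adds the capabilities.

    Assumption: shown only to illustrate the naming; the add-on's real
    manifest is not documented here.
    """
    return {"capabilities": {"add": [to_k8s_capability(c) for c in capabilities]}}


if __name__ == "__main__":
    print(json.dumps(security_context(FLUENT_BIT_CAPABILITIES), indent=2))
```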
Assigning Authorization Before Installing Cloud Native Logging in an On-Premises Cluster
Cloud Native Logging needs to be authenticated before accessing LTS and AOM. This add-on leverages workload identities to allow workloads in an on-premises cluster to impersonate IAM users to access cloud services.
Workload identities allow you to register the public key of an on-premises cluster with an IAM identity provider (IdP) and add a rule that maps a ServiceAccount to an IAM account. During workload deployment, the ServiceAccount token is mounted to the workload and is used to access cloud services. This way, the AK/SK of the IAM account is not required, which reduces security risks.
- Obtain the JSON Web Key Set (JWKS) that corresponds to the private key of the on-premises cluster. The JWKS is used to verify the ServiceAccount tokens issued by this cluster.
- Use kubectl to access the on-premises cluster.
- Run the following command to obtain the public key:
kubectl get --raw /openid/v1/jwks
A JSON string is returned. It contains the public signing key of the cluster, which is used when accessing the IdP.
{
  "keys": [
    {
      "kty": "RSA",
      "e": "AQAB",
      "use": "sig",
      "kid": "Ew29q....",
      "alg": "RS256",
      "n": "peJdm...."
    }
  ]
}
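Before pasting the JWKS into the IdP, it can help to confirm that the output is well formed. The sketch below parses a JWKS document and lists each signing key with its decoded public exponent. It assumes every key carries the fields shown in the sample output above. `summarize_jwks` and `b64url_to_int` are hypothetical helper names, and the values in the usage note are placeholders, not real key material.

```python
import base64
import json

# Fields present in the sample JWKS output above.
REQUIRED_FIELDS = ("kty", "use", "kid", "alg", "n", "e")


def b64url_to_int(value: str) -> int:
    """Decode an unpadded base64url string into a big-endian integer."""
    padded = value + "=" * (-len(value) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


def summarize_jwks(raw: str) -> list:
    """Return (kid, alg, public exponent) for each signing key in a JWKS."""
    keys = json.loads(raw).get("keys", [])
    summary = []
    for key in keys:
        missing = [field for field in REQUIRED_FIELDS if field not in key]
        if missing:
            raise ValueError(f"JWKS key is missing fields: {missing}")
        if key["use"] != "sig":
            continue  # only signing keys are relevant for token verification
        summary.append((key["kid"], key["alg"], b64url_to_int(key["e"])))
    return summary
```

For example, saving the command output to a file and calling `summarize_jwks(open("jwks.json").read())` returns one tuple per signing key. The exponent `AQAB` decodes to 65537, the usual RSA public exponent.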
- Create an IdP for your on-premises cluster in IAM.
- Log in to the IAM console, query the ID of the project that the on-premises cluster belongs to, create an IdP, and select OpenID Connect for Protocol. Enter the IdP name listed for log-agent in Table 1. For details about how to configure permissions for a user group, see User Group Policy Content.
Table 1 log-agent IdP settings
Add-on Name | IdP Name | Client ID | Namespace | ServiceAccount Name | Minimum Permissions on User Groups |
---|---|---|---|---|---|
log-agent | ucs-cluster-identity-{Project ID} | ucs-cluster-identity | monitoring | log-agent-serviceaccount | aom:alarm:*, lts:*:* |
Figure 1 Modifying IdP information
- Click OK and modify the IdP information as described in Table 2. Click Create Rule to create an identity conversion rule.
Figure 2 Modifying IdP information
Table 2 IdP parameters
Parameter | Description |
---|---|
Access Type | Select Programmatic access. |
Configuration Information | |
Identity Conversion Rules | An identity conversion rule maps a ServiceAccount in an on-premises cluster to an IAM user group. Configure the rule as follows: |
- Attribute: sub
- Condition: any_one_of
- Value:
Value format: system:serviceaccount:Namespace:ServiceAccountName.
Change Namespace to the namespace for which the ServiceAccount is to be created, and change ServiceAccountName to the name of the ServiceAccount to be created.
For example, if the value is system:serviceaccount:monitoring:log-agent-serviceaccount, a ServiceAccount named log-agent-serviceaccount is created in the monitoring namespace and mapped to the corresponding user group. The IAM token obtained using this ServiceAccount has the permissions of the user group.
NOTE: The ServiceAccount name and user group permissions are mandatory for running add-ons in an on-premises cluster. For details, see Table 1.
Figure 3 Creating an identity conversion rule
- Click OK.
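The identity conversion rule matches on the `sub` attribute, whose value has the form system:serviceaccount:Namespace:ServiceAccountName. The sketch below builds and parses that value and mimics an `any_one_of` check against the `sub` claim of a ServiceAccount token. It is an illustration only: the function names are hypothetical, the token payload is decoded without verifying the signature, and the real matching is performed by IAM, not by this code.

```python
import base64
import json


def service_account_subject(namespace: str, name: str) -> str:
    """Build the sub value for a ServiceAccount."""
    return f"system:serviceaccount:{namespace}:{name}"


def parse_subject(subject: str) -> tuple:
    """Split a sub value into (namespace, ServiceAccount name)."""
    parts = subject.split(":")
    if len(parts) != 4 or parts[:2] != ["system", "serviceaccount"]:
        raise ValueError(f"not a ServiceAccount subject: {subject!r}")
    return parts[2], parts[3]


def token_subject(jwt: str) -> str:
    """Read the sub claim from a JWT payload WITHOUT verifying the signature."""
    payload = jwt.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["sub"]


def rule_matches(jwt: str, allowed_subjects) -> bool:
    """Mimic an any_one_of condition on the sub attribute."""
    return token_subject(jwt) in allowed_subjects
```

With the values from Table 1, `service_account_subject("monitoring", "log-agent-serviceaccount")` yields the string to enter in the Value field of the rule.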
Installing log-agent in an On-Premises Cluster
- Log in to the UCS console and choose Fleets. Then, click the cluster name to access the cluster console. In the navigation pane, choose Add-ons. Locate Cloud Native Logging on the right and click Install.
- On the Install Add-on page, configure the specifications.
Table 3 Add-on specifications
Parameter | Description |
---|---|
Add-on Specifications | The add-on specifications can be of the Low, High, or custom-resources type. |
Pods | Number of pods that will be created to match the selected add-on specifications. If you select custom-resources, you can adjust the number of pods as required. |
Containers | The log-agent add-on contains the following containers, whose specifications can be adjusted as required: fluent-bit (the log collector, installed on each node as a DaemonSet); cop-logs (generates and updates configuration files on the collection side); log-operator (parses and updates log collection rules); otel-collector (forwards logs collected by fluent-bit to LTS in a centralized manner). |
- Configure the parameters in Parameters.
Interconnection with AOM: If this option is enabled, Kubernetes events will be collected and reported to AOM. You can configure alarm rules on AOM.
- Configure the network for reporting add-on instance logs.
- Public network: This option features flexibility, cost-effectiveness, and easy access. It is only available for clusters that can access the public network.
- Direct Connect or VPN: After you connect an on-premises data center to a VPC over Direct Connect or VPN, you can use a VPC endpoint to access CIA over the private network. This option features high speed, low latency, and high security. For details, see Using Direct Connect or VPN to Report Logs of On-Premises Clusters.
- Click Install.
log-agent Components
Component | Description | Resource Type |
---|---|---|
fluent-bit | Lightweight log collector and forwarder deployed on each node to collect logs | DaemonSet |
cop-logs | Used to generate soft links for collected files and run in the same pod as fluent-bit | DaemonSet |
log-operator | Used to generate internal configuration files | Deployment |
otel-collector | Used to collect logs from applications and services and report the logs to LTS | Deployment |
Change History
Add-on Version | Supported Cluster Version | New Feature |
---|---|---|
1.4.1 | v1.21, v1.22, v1.23, v1.24, v1.25, v1.26, v1.27, v1.28, v1.29 | This is the first official release. It can be installed in on-premises clusters. |
Reporting Custom Events to AOM
The log-agent add-on reports all warning events and some normal events to AOM. You can also set the events to be reported as required.
- Use kubectl on the cluster to edit the LogConfig named default-event-aom in the kube-system namespace, which holds the event collection settings.
- Modify the event collection settings as required.
apiVersion: logging.openvessel.io/v1
kind: LogConfig
metadata:
  annotations:
    helm.sh/resource-policy: keep
  name: default-event-aom
  namespace: kube-system
spec:
  inputDetail:             # Settings on UCS from which events are collected
    type: event            # Type of logs to be collected. Do not change the value.
    event:
      normalEvents:        # Used to configure normal events
        enable: true       # Whether to enable normal event collection
        includeNames:      # Names of events to be collected. If this parameter is not specified, all events will be collected.
        - NotTriggerScaleUp
        excludeNames:      # Names of events that are not collected. If this parameter is not specified, all events will be collected.
        - NotTriggerScaleUp
      warningEvents:       # Used to configure warning events
        enable: true       # Whether to enable warning event collection
        includeNames:      # Names of events to be collected. If this parameter is not specified, all events will be collected.
        - NotTriggerScaleUp
        excludeNames:      # Names of events that are not collected. If this parameter is not specified, all events will be collected.
        - NotTriggerScaleUp
  outputDetail:
    type: AOM              # Type of the system that receives the events. Do not change the value.
    AOM:
      events:
      - name: DeleteNodeWithNoServer   # Event name. This parameter is mandatory.
        resourceType: Namespace        # Type of the resource that operations are performed on.
        severity: Major                # Event severity after an event is reported to AOM, which can be Critical, Major, Minor, or Info. The default value is Major.
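The comments on the `outputDetail.AOM.events` entries state three rules: `name` is mandatory, `severity` must be Critical, Major, Minor, or Info, and `severity` defaults to Major. The sketch below applies those rules to a list of event entries before they are written into the LogConfig. It is a local sanity check under those stated rules only. `normalize_aom_events` is a hypothetical helper, and the add-on may perform additional validation that is not described here.

```python
ALLOWED_SEVERITIES = ("Critical", "Major", "Minor", "Info")
DEFAULT_SEVERITY = "Major"


def normalize_aom_events(events: list) -> list:
    """Validate AOM event entries and fill in the default severity."""
    normalized = []
    for index, event in enumerate(events):
        name = event.get("name")
        if not name:
            raise ValueError(f"events[{index}]: name is mandatory")
        severity = event.get("severity", DEFAULT_SEVERITY)
        if severity not in ALLOWED_SEVERITIES:
            raise ValueError(
                f"events[{index}]: severity must be one of {ALLOWED_SEVERITIES}"
            )
        entry = {"name": name, "severity": severity}
        if "resourceType" in event:
            # Copied through unchanged; valid resource types are not listed here.
            entry["resourceType"] = event["resourceType"]
        normalized.append(entry)
    return normalized
```

For example, `normalize_aom_events([{"name": "DeleteNodeWithNoServer", "resourceType": "Namespace"}])` returns the same entry with `severity` set to `Major`.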
log-agent Events
During log-agent installation and operation, the log-operator component reports events. You can use these events to check whether log-agent was installed successfully and to identify the cause of a fault. For details, see Table 6.
Table 6 log-agent events
Event Name | Description |
---|---|
InitLTSFailed | Failed to initialize the log streams in the LTS log group. |
WatchAKSKFailed | Failed to listen to the AK/SK. |
WatchAKSKSuccessful | AK/SK listened. |
RequestLTSFailed | Failed to request the LTS interface. |
InitLTSSuccessful | Log streams in the LTS log group initialized. |
CreateWebhookConfigFailed | Failed to create MutatingWebhookConfiguration. |
CreateWebhookConfigSuccessful | MutatingWebhookConfiguration created. |
StartServerSuccessful | Listening enabled. |
StartServerFailed | Failed to enable listening. |
StartManagerFailed | Failed to enable CRD listening. |
InjectAnnotationFailed | Failed to inject annotations. |
InjectAnnotationSuccessful | Annotations injected. |
UpdateLogConfigFailed | Failed to update the logconfig information. |
GetConfigListFailed | Failed to obtain the CR list. |
GenerateConfigFailed | Failed to generate the fluent-bit and otel settings. |
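When triaging an installation, the event names can be filtered by their `Failed` suffix to surface only the problems. The sketch below encodes the events listed above and returns the failures from a list of observed event names. How the names are obtained from the cluster is left out, and `is_failure` and `failures` are hypothetical helper names.

```python
# Event names and descriptions copied from the events list above.
LOG_AGENT_EVENTS = {
    "InitLTSFailed": "Failed to initialize the log streams in the LTS log group.",
    "WatchAKSKFailed": "Failed to listen to the AK/SK.",
    "WatchAKSKSuccessful": "AK/SK listened.",
    "RequestLTSFailed": "Failed to request the LTS interface.",
    "InitLTSSuccessful": "Log streams in the LTS log group initialized.",
    "CreateWebhookConfigFailed": "Failed to create MutatingWebhookConfiguration.",
    "CreateWebhookConfigSuccessful": "MutatingWebhookConfiguration created.",
    "StartServerSuccessful": "Listening enabled.",
    "StartServerFailed": "Failed to enable listening.",
    "StartManagerFailed": "Failed to enable CRD listening.",
    "InjectAnnotationFailed": "Failed to inject annotations.",
    "InjectAnnotationSuccessful": "Annotations injected.",
    "UpdateLogConfigFailed": "Failed to update the logconfig information.",
    "GetConfigListFailed": "Failed to obtain the CR list.",
    "GenerateConfigFailed": "Failed to generate the fluent-bit and otel settings.",
}


def is_failure(event_name: str) -> bool:
    """Failure events in the list above all end with 'Failed'."""
    return event_name.endswith("Failed")


def failures(observed_names) -> list:
    """Return (name, description) for each observed failure event, in order."""
    return [
        (name, LOG_AGENT_EVENTS.get(name, "Unknown event"))
        for name in observed_names
        if is_failure(name)
    ]
```

For example, `failures(["InitLTSSuccessful", "StartServerFailed"])` returns only the `StartServerFailed` entry with its description.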
log-agent Metrics
The log-operator, fluent-bit, and otel-collector components of the log-agent add-on expose metrics. You can use AOM or Prometheus to monitor these metrics and track the running status of the add-on. For details, see Monitoring Custom Metrics Using AOM or Monitoring Custom Metrics Using Prometheus. The metrics are listed below:
- log-operator (only for Huawei Cloud clusters)
Address: /metrics
Protocol: HTTPS
Table 7 Metrics
Metric | Description | Type |
---|---|---|
log_operator_aksk_latest_update_times | Last update time of the AK/SK | Gauge |
log_operator_aksk_update_total | Cumulative number of AK/SK updates | Counter |
log_operator_send_request_total | Cumulative number of requests that have been sent | Counter |
log_operator_webhook_listen_status | Webhook listening status | Gauge |
log_operator_http_request_duration_seconds | HTTP request latency | Histogram |
log_operator_http_request_total | Cumulative number of HTTP requests | Counter |
log_operator_webhook_request_total | Cumulative number of webhook requests | Counter |
- fluent-bit
Address: /api/v1/metrics/prometheus
Protocol: HTTP
Table 8 Metrics
Metric | Description | Type |
---|---|---|
fluentbit_filter_add_records_total | Cumulative number of records added by the Fluent Bit filter plugin | Counter |
fluentbit_filter_drop_records_total | Cumulative number of records dropped by the Fluent Bit filter plugin | Counter |
fluentbit_input_bytes_total | Number of input bytes | Counter |
fluentbit_input_files_closed_total | Cumulative number of files closed by the Fluent Bit input plugin | Counter |
fluentbit_input_files_opened_total | Cumulative number of files opened by the Fluent Bit input plugin | Counter |
fluentbit_input_files_rotated_total | Cumulative number of files rotated by the Fluent Bit input plugin | Counter |
fluentbit_input_records_total | Number of input records | Counter |
fluentbit_output_dropped_records_total | Number of dropped records | Counter |
fluentbit_output_errors_total | Number of output errors | Counter |
fluentbit_output_proc_bytes_total | Number of processed output bytes | Counter |
fluentbit_output_proc_records_total | Number of processed output records | Counter |
fluentbit_output_retried_records_total | Number of retried records | Counter |
fluentbit_output_retries_total | Number of output retries | Counter |
fluentbit_uptime | Number of seconds that Fluent Bit has been running | Counter |
fluentbit_build_info | Build version information | Gauge |
- otel-collector
Address: /metrics
Protocol: HTTP
Table 9 Metrics
Metric | Description | Type |
---|---|---|
otelcol_exporter_enqueue_failed_log_records | Number of log records that failed to be added to the sending queue | Counter |
otelcol_exporter_enqueue_failed_metric_points | Number of metric points that failed to be added to the sending queue | Counter |
otelcol_exporter_enqueue_failed_spans | Number of spans that failed to be added to the sending queue | Counter |
otelcol_exporter_send_failed_log_records | Number of log records that failed to be sent | Counter |
otelcol_exporter_sent_log_records | Number of log records that have been sent | Counter |
otelcol_process_cpu_seconds | Total CPU user and system time in seconds | Counter |
otelcol_process_memory_rss | Total physical memory (resident set size) | Gauge |
otelcol_process_runtime_heap_alloc_bytes | Bytes of allocated heap objects | Gauge |
otelcol_process_runtime_total_alloc_bytes | Cumulative bytes allocated for heap objects | Counter |
otelcol_process_runtime_total_sys_memory_bytes | Total bytes of memory obtained from the OS | Gauge |
otelcol_process_uptime | Uptime of the process in seconds | Counter |
otelcol_receiver_accepted_log_records | Number of log records received and processed by the OpenTelemetry receiver | Counter |
otelcol_receiver_refused_log_records | Number of log records rejected by the OpenTelemetry receiver | Counter |
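To check these metrics outside AOM or Prometheus, the text returned by a metrics address can be parsed directly. The sketch below sums sample values per metric name from Prometheus text exposition output and derives the share of log records that otel-collector failed to send. It assumes the metric names appear exactly as listed in the tables. The pod address and port are not given on this page, so fetching the text is left out, and `parse_metrics` and `send_failure_ratio` are hypothetical helper names.

```python
def parse_metrics(text: str) -> dict:
    """Sum sample values per metric name from Prometheus text format.

    Series that differ only in labels are added together.
    """
    totals = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # name{labels} value [timestamp]
            name = line[:line.index("{")]
            fields = line[line.rfind("}") + 1:].split()
        else:
            # name value [timestamp]
            name, *fields = line.split()
        if not fields:
            continue
        totals[name] = totals.get(name, 0.0) + float(fields[0])
    return totals


def send_failure_ratio(totals: dict) -> float:
    """Share of log records that otel-collector failed to send."""
    sent = totals.get("otelcol_exporter_sent_log_records", 0.0)
    failed = totals.get("otelcol_exporter_send_failed_log_records", 0.0)
    attempted = sent + failed
    return failed / attempted if attempted else 0.0
```

Because most of these metrics are cumulative counters, a ratio computed from a single scrape covers the whole process lifetime. Comparing two scrapes taken some time apart gives a more current picture.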