Updated on 2024-06-17 GMT+08:00

Cloud Native Logging Add-on

When logging is enabled (see Enabling Logging), log-agent is automatically installed in an on-premises cluster. You can also install the add-on manually by following this section. For details about the add-on, see Cloud Native Logging.

Overview

log-agent is an add-on based on Fluent Bit and OpenTelemetry for cloud native logging. It supports CRD-based log collection policies and can collect and forward standard output logs, container file logs, node logs, and Kubernetes events of containers in a cluster. After the add-on is installed, standard output logs and Kubernetes events are collected by default. For details about how to use log-agent to collect logs, see Collecting Data Plane Logs.

Constraints

The following are constraints on using log-agent:
  • log-agent is available only in clusters v1.21 or later.
  • A maximum of 50 log collection rules can be configured for each cluster.
  • log-agent cannot collect log files in .gz, .tar, or .zip format.
  • If the node storage driver is Device Mapper, container file logs must be collected from the path where the data disk is mounted on the node.
  • If the container runtime is containerd, each standard output log entry must be on a single line. Multi-line standard output logs are not supported.
  • In each cluster, up to 10,000 single-line logs or up to 2,000 multi-line logs can be collected per second.
  • A container must run for longer than 1 minute for its logs to be collected. Otherwise, the logs may be deleted before they can be collected.
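
Some of the constraints above can be checked before a collection rule is created. The following is a minimal sketch of such a pre-check, not part of log-agent itself. The function names and the shape of the rule_paths mapping are hypothetical; only the limit of 50 rules and the three unsupported extensions come from the list above.

```python
from pathlib import PurePosixPath

# Limits copied from the constraints listed above.
MAX_RULES_PER_CLUSTER = 50
UNSUPPORTED_SUFFIXES = {".gz", ".tar", ".zip"}


def is_collectible(log_path: str) -> bool:
    """Return False if the final file extension is one log-agent cannot collect."""
    return PurePosixPath(log_path).suffix.lower() not in UNSUPPORTED_SUFFIXES


def check_rules(rule_paths: dict[str, list[str]]) -> list[str]:
    """Return a list of problems found in a {rule name: [log paths]} mapping."""
    problems = []
    if len(rule_paths) > MAX_RULES_PER_CLUSTER:
        problems.append(
            f"{len(rule_paths)} rules exceed the limit of {MAX_RULES_PER_CLUSTER}"
        )
    for rule, paths in rule_paths.items():
        for path in paths:
            if not is_collectible(path):
                problems.append(f"{rule}: {path} has an unsupported archive extension")
    return problems


# Hypothetical rule set used only to exercise the checks.
print(check_rules({"app-logs": ["/var/log/app/app.log", "/var/log/app/old.log.gz"]}))
```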

Permissions

The fluent-bit component of the log-agent add-on reads and collects the standard output logs on each node, file logs in pods, and node logs based on the collection configuration.

The following permissions are required for running the fluent-bit component:

  • CAP_DAC_OVERRIDE: bypasses discretionary access control (DAC) permission checks on files.
  • CAP_FOWNER: bypasses permission checks that require the file owner ID to match the process user ID.
  • CAP_DAC_READ_SEARCH: bypasses DAC permission checks for reading files and for reading and searching directories.
  • CAP_SYS_PTRACE: allows any process to be traced.

Assigning Authorization for log-agent in Your On-Premises Cluster

The log-agent add-on needs to be authenticated before accessing LTS and AOM. This add-on uses Workload Identity to allow workloads in your on-premises cluster to impersonate IAM service accounts to access cloud services.

Workload Identity allows you to configure the public key of your cluster for the IAM IdP and add a mapping rule to map a ServiceAccount to an IAM service account. During workload deployment, the token of the ServiceAccount is mounted to the workload. This token is used to access cloud services. This way, the AK/SK of the IAM service account is not required, reducing security risks.
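
To make the token flow more concrete, the sketch below decodes the payload of a ServiceAccount token, which is a JSON Web Token (JWT). The token is built locally and unsigned, purely for illustration. The issuer URL, namespace, and ServiceAccount name are taken from the IdP settings later on this page. Using the client ID as the aud claim is an assumption, and the helper names are hypothetical.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Illustrative claims only. A real token is issued and signed by the cluster.
demo_claims = {
    "iss": "https://kubernetes.default.svc.cluster.local",
    "sub": "system:serviceaccount:monitoring:log-agent-serviceaccount",
    "aud": ["ucs-cluster-identity"],  # assumption: audience mirrors the client ID
}
demo_token = ".".join([_b64url({"alg": "RS256"}), _b64url(demo_claims), "signature"])

claims = decode_jwt_payload(demo_token)
print(claims["sub"])  # system:serviceaccount:monitoring:log-agent-serviceaccount
```

The sub claim is what an identity conversion rule matches on, which is why no AK/SK needs to be stored in the workload.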

  1. Obtain the JSON Web Key Set (JWKS) of the on-premises cluster, which is used to verify the ServiceAccount token issued by ClusterIssuer.

    1. Use kubectl to access the on-premises cluster.
    2. Run the following command to obtain the public key:

      kubectl get --raw /openid/v1/jwks

      A JSON string is returned. It contains the public signing key of the cluster, which is used for accessing the IdP.

      {
          "keys": [
              {
                  "kty": "RSA",
                  "e": "AQAB",
                  "use": "sig",
                  "kid": "Ew29q....",
                  "alg": "RS256",
                  "n": "peJdm...."
              }
          ]
      }

  2. Configure an IdP entity for your on-premises cluster on IAM.

    1. Log in to the IAM console, query the ID of the project that the on-premises cluster belongs to, create an identity provider, and select OpenID Connect for Protocol. Enter the IdP name for log-agent. For details, see Table 1.
      Table 1 log-agent IdP settings

      | Add-on Name | IdP Name | Client ID | Namespace | ServiceAccount Name | Minimum Permissions on User Groups |
      |---|---|---|---|---|---|
      | log-agent | ucs-cluster-identity-{Project ID} | ucs-cluster-identity | monitoring | log-agent-serviceaccount | aom:alarm:*, lts:*:* |

      Figure 1 Modifying identity provider information
    2. Click OK and modify the IdP information as described in Table 2. Click Create Rule to create an identity conversion rule.
      Figure 2 Modifying identity provider information
      Table 2 IdP parameters

      | Parameter | Description |
      |---|---|
      | Access Type | Select Programmatic access. |
      | Configuration Information | Identity Provider URL: Enter https://kubernetes.default.svc.cluster.local. Client ID: Enter the client ID of log-agent. For details, see Table 1. Signing Key: Enter the JWKS of the on-premises cluster obtained in step 1. |
      | Identity Conversion Rules | An identity conversion rule maps a ServiceAccount in an on-premises cluster to an IAM user group. For example, create a ServiceAccount in namespace default of the cluster and map it to user group demo. If you use the IAM token obtained by the ServiceAccount to access cloud services, you have the permissions of the demo user group. In a mapping rule, the attribute must be sub. The value format is system:serviceaccount:Namespace:ServiceAccountName. For the ServiceAccount name and user group permissions that log-agent requires to run in an on-premises cluster, see Table 1. |

      Figure 3 Creating an identity conversion rule
    3. Click OK.
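
Before entering the values on the IAM console, it can help to sanity-check the JWKS and assemble the strings from Table 1. The following is a minimal sketch with hypothetical helper names. The set of required key fields is an assumption based on the sample output in step 1, and the project ID and key values are placeholders.

```python
import json

# Fields expected in each key, based on the sample JWKS output shown in step 1.
REQUIRED_JWK_FIELDS = {"kty", "kid", "n", "e"}


def check_jwks(jwks_text: str) -> list[str]:
    """Return the key IDs in a JWKS document, raising if a key looks incomplete."""
    keys = json.loads(jwks_text).get("keys", [])
    if not keys:
        raise ValueError("JWKS contains no keys")
    for key in keys:
        missing = REQUIRED_JWK_FIELDS - key.keys()
        if missing:
            raise ValueError(f"key is missing fields: {sorted(missing)}")
    return [key["kid"] for key in keys]


def idp_name(project_id: str) -> str:
    """IdP name in the format given in Table 1."""
    return f"ucs-cluster-identity-{project_id}"


def subject(namespace: str, service_account: str) -> str:
    """Value of the sub attribute used in an identity conversion rule."""
    return f"system:serviceaccount:{namespace}:{service_account}"


# Placeholder key material; paste the real output of the kubectl command instead.
sample_jwks = '{"keys": [{"kty": "RSA", "e": "AQAB", "use": "sig", "kid": "example-kid", "alg": "RS256", "n": "example-modulus"}]}'
print(check_jwks(sample_jwks))
print(idp_name("example-project-id"))
print(subject("monitoring", "log-agent-serviceaccount"))
```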

Installing log-agent in an On-Premises Cluster

  1. Log in to the UCS console and choose Fleets. Then, click the cluster name to access the cluster console. In the navigation pane on the left, choose Add-ons. Locate Cloud Native Logging on the right and click Install.
  2. On the Install Add-on page, configure the specifications.

    Table 3 Add-on specifications

    | Parameter | Description |
    |---|---|
    | Add-on Specifications | The add-on specifications can be Low, High, or custom-resources. |
    | Pods | Number of pods that will be created to match the selected add-on specifications. If you select custom-resources, you can adjust the number of pods as required. |
    | Containers | The log-agent add-on contains the following containers, whose specifications can be adjusted as required: fluent-bit (the log collector, installed on each node as a DaemonSet); cop-logs (generates and updates configuration files on the collection side); log-operator (parses and updates log collection rules); otel-collector (forwards logs collected by fluent-bit to LTS in a centralized manner). |

  3. Configure the parameters in Parameters.

    Interconnection with AOM: If this option is enabled, Kubernetes events will be collected and reported to AOM. You can configure alarm rules on AOM.

  4. Configure the network for reporting add-on instance logs.

    • Public network: This option features flexibility, cost-effectiveness, and easy access. It is only available for clusters that can access the public network.
    • Direct Connect or VPN: After you connect the on-premises network to the cloud network over Direct Connect or VPN, you can use a VPC endpoint to access CIA over the private network. This option features high speed, low latency, and security. For details, see Using Direct Connect or VPN to Report Logs of On-Premises Clusters.

  5. Click Install.

log-agent Components

Table 4 log-agent components

| Component | Description | Resource Type |
|---|---|---|
| fluent-bit | Lightweight log collector and forwarder deployed on each node to collect logs | DaemonSet |
| cop-logs | Generates soft links for collected files. Runs in the same pod as fluent-bit | DaemonSet |
| log-operator | Generates internal configuration files | Deployment |
| otel-collector | Collects logs from applications and services and reports the logs to LTS | Deployment |

Change History

Table 5 Release history

| Add-on Version | Supported Cluster Version | New Feature |
|---|---|---|
| 1.4.1 | v1.21, v1.22, v1.23, v1.24, v1.25, v1.26, v1.27, v1.28 | This is the first official release. It can be installed in the on-premises clusters. |

Reporting Custom Events to AOM

The log-agent add-on reports all warning events and some normal events to AOM. You can also set the events to be reported as required.

  1. Run the following command on the cluster to modify the event collection settings:

    kubectl edit logconfig -n kube-system default-event-aom

  2. Modify the event collection settings as required.
    apiVersion: logging.openvessel.io/v1
    kind: LogConfig
    metadata:
      annotations:
        helm.sh/resource-policy: keep
      name: default-event-aom
      namespace: kube-system
    spec:
      inputDetail:    # Settings on UCS from which events are collected
        type: event    # Type of logs to be collected. Do not change the value.
        event:
          normalEvents:    # Used to configure normal events
            enable: true    # Whether to enable normal event collection
            includeNames:    # Names of events to be collected. If this parameter is not specified, all events will be collected.
            - NotTriggerScaleUp
            excludeNames:    # Names of events that are not collected. If this parameter is not specified, all events will be collected.
            - NotTriggerScaleUp
          warningEvents:    # Used to configure warning events
            enable: true    # Whether to enable warning event collection
            includeNames:    # Names of events to be collected. If this parameter is not specified, all events will be collected.
            - NotTriggerScaleUp
            excludeNames:    # Names of events that are not collected. If this parameter is not specified, all events will be collected.
            - NotTriggerScaleUp
      outputDetail:
        type: AOM    # Type of the system that receives the events. Do not change the value.
        AOM:
          events:
          - name: DeleteNodeWithNoServer    # Event name. This parameter is mandatory.
            resourceType: Namespace    # Type of the resource that operations are performed on.
            severity: Major    # Event severity after an event is reported to AOM, which can be Critical, Major, Minor, or Info. The default value is Major.
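
The effect of enable, includeNames, and excludeNames can be modeled as a small filter. This is a sketch of one plausible reading of the comments in the configuration above, not the add-on's actual implementation. In particular, letting an exclusion take precedence over an inclusion is an assumption, and the function name is hypothetical.

```python
def should_collect(event_name, enable=True, include_names=None, exclude_names=None):
    """Decide whether an event would be collected under the modeled rules."""
    if not enable:
        # Collection for this event class (normal or warning) is switched off.
        return False
    if exclude_names and event_name in exclude_names:
        # Assumption: an excluded name is dropped even if it is also included.
        return False
    if include_names:
        # With an include list, only the listed names are collected.
        return event_name in include_names
    # Neither list is set, so all events are collected.
    return True


print(should_collect("NotTriggerScaleUp"))
print(should_collect("OtherEvent", include_names=["NotTriggerScaleUp"]))
```

Under this reading, listing the same event name in both includeNames and excludeNames, as the sample configuration does for illustration, would result in the event not being collected.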

log-agent Events

During log-agent installation and running, the log-operator component reports events. You can use these events to check whether log-agent has been installed and to locate the cause of faults. For details, see Table 6.

Table 6 log-agent events

| Event Name | Description |
|---|---|
| InitLTSFailed | Failed to initialize the log streams in the LTS log group. |
| WatchAKSKFailed | Failed to listen to the AK/SK. |
| WatchAKSKSuccessful | AK/SK listened. |
| RequestLTSFailed | Failed to request the LTS interface. |
| InitLTSSuccessful | Log streams in the LTS log group initialized. |
| CreateWebhookConfigFailed | Failed to create MutatingWebhookConfiguration. |
| CreateWebhookConfigSuccessful | MutatingWebhookConfiguration created. |
| StartServerSuccessful | Listening enabled. |
| StartServerFailed | Failed to enable listening. |
| StartManagerFailed | Failed to enable CRD listening. |
| InjectAnnotationFailed | Failed to inject annotations. |
| InjectAnnotationSuccessful | Annotations injected. |
| UpdateLogConfigFailed | Failed to update the logconfig information. |
| GetConfigListFailed | Failed to obtain the CR list. |
| GenerateConfigFailed | Failed to generate the fluent-bit and otel settings. |

log-agent Metrics

The log-operator, fluent-bit, and otel-collector components of the log-agent add-on expose the metrics listed below. You can use AOM or Prometheus to monitor these metrics and track the running status of the add-on. For details, see Monitoring Custom Metrics Using AOM or Monitoring Custom Metrics Using Prometheus.

  • log-operator (only for Huawei Cloud clusters)

    Port: 8443

    Address: /metrics

    Protocol: HTTPS

    Table 7 Metrics

    | Metric | Description | Type |
    |---|---|---|
    | log_operator_aksk_latest_update_times | Last update time of the AK/SK | Gauge |
    | log_operator_aksk_update_total | Cumulative count of AK/SK update times | Counter |
    | log_operator_send_request_total | Cumulative count of requests that have been sent | Counter |
    | log_operator_webhook_listen_status | Webhook listening status | Gauge |
    | log_operator_http_request_duration_seconds | HTTP request latency | Histogram |
    | log_operator_http_request_total | Cumulative count of HTTP requests | Counter |
    | log_operator_webhook_request_total | Cumulative count of webhook requests | Counter |

  • fluent-bit

    Port: 2020

    Address: /api/v1/metrics/prometheus

    Protocol: HTTP

    Table 8 Metrics

    | Metric | Description | Type |
    |---|---|---|
    | fluentbit_filter_add_records_total | Cumulative count of records added by the Fluent Bit filter plugin | Counter |
    | fluentbit_filter_drop_records_total | Cumulative count of records dropped by the Fluent Bit filter plugin | Counter |
    | fluentbit_input_bytes_total | Number of input bytes | Counter |
    | fluentbit_input_files_closed_total | Cumulative count of files closed by the Fluent Bit input plugin | Counter |
    | fluentbit_input_files_opened_total | Cumulative count of files opened by the Fluent Bit input plugin | Counter |
    | fluentbit_input_files_rotated_total | Cumulative count of files rotated by the Fluent Bit input plugin | Counter |
    | fluentbit_input_records_total | Number of input records | Counter |
    | fluentbit_output_dropped_records_total | Number of dropped records | Counter |
    | fluentbit_output_errors_total | Number of output errors | Counter |
    | fluentbit_output_proc_bytes_total | Number of processed output bytes | Counter |
    | fluentbit_output_proc_records_total | Number of processed output records | Counter |
    | fluentbit_output_retried_records_total | Number of retried records | Counter |
    | fluentbit_output_retries_total | Number of output retries | Counter |
    | fluentbit_uptime | Number of seconds that Fluent Bit has been running | Counter |
    | fluentbit_build_info | Build version information | Gauge |

  • otel-collector

    Port: 8888

    Address: /metrics

    Protocol: HTTP

    Table 9 Metrics

    | Metric | Description | Type |
    |---|---|---|
    | otelcol_exporter_enqueue_failed_log_records | Number of log records failed to be added to the sending queue | Counter |
    | otelcol_exporter_enqueue_failed_metric_points | Number of metric points failed to be added to the sending queue | Counter |
    | otelcol_exporter_enqueue_failed_spans | Number of spans failed to be added to the sending queue | Counter |
    | otelcol_exporter_send_failed_log_records | Number of log records failed to be sent | Counter |
    | otelcol_exporter_sent_log_records | Number of log records that have been sent | Counter |
    | otelcol_process_cpu_seconds | Total CPU user and system time in seconds | Counter |
    | otelcol_process_memory_rss | Total physical memory (resident set size) | Gauge |
    | otelcol_process_runtime_heap_alloc_bytes | Bytes of allocated heap objects | Gauge |
    | otelcol_process_runtime_total_alloc_bytes | Cumulative bytes allocated for heap objects | Counter |
    | otelcol_process_runtime_total_sys_memory_bytes | Total bytes of memory obtained from the OS | Gauge |
    | otelcol_process_uptime | Uptime of the process in seconds | Counter |
    | otelcol_receiver_accepted_log_records | Number of log records received and processed by the OpenTelemetry receiver | Counter |
    | otelcol_receiver_refused_log_records | Number of log records rejected by the OpenTelemetry receiver | Counter |
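
The metric endpoints listed above return data in the Prometheus text exposition format, so the metrics can also be inspected with a short script. The sketch below parses a sample response and derives a send failure ratio from two otel-collector counters. The sample values and the exporter label are made up for illustration, the function names are hypothetical, and the parser assumes that sample lines carry no trailing timestamp.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Sum sample values per metric name from Prometheus text exposition format."""
    totals: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is "<name>{labels} <value>" or "<name> <value>".
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def send_failure_ratio(totals: dict[str, float]) -> float:
    """Share of log records that otel-collector failed to send."""
    sent = totals.get("otelcol_exporter_sent_log_records", 0.0)
    failed = totals.get("otelcol_exporter_send_failed_log_records", 0.0)
    attempted = sent + failed
    return failed / attempted if attempted else 0.0


# Made-up sample response; real data comes from port 8888, path /metrics.
SAMPLE = """\
# TYPE otelcol_exporter_sent_log_records counter
otelcol_exporter_sent_log_records{exporter="example"} 980
# TYPE otelcol_exporter_send_failed_log_records counter
otelcol_exporter_send_failed_log_records{exporter="example"} 20
"""

print(send_failure_ratio(parse_metrics(SAMPLE)))
```

For continuous monitoring and alerting, scraping these endpoints with AOM or Prometheus as described above remains the more suitable approach. A script like this is mainly useful for a quick manual check.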