Cloud Native Log Collection

Updated on 2025-02-14 GMT+08:00

When logging is enabled (see Enabling Logging), the Cloud Native Log Collection add-on is automatically installed in an on-premises cluster. You can also install the add-on manually by following this section. For details about this add-on, see Cloud Native Log Collection.

Introduction

The Cloud Native Log Collection add-on (log-agent) is built on Fluent Bit and OpenTelemetry. It supports CRD-based log collection policies, and it collects and forwards stdout logs, container file logs, node logs, and Kubernetes events from the containers in a cluster. After the add-on is installed, stdout logs and Kubernetes events are collected by default. For details about how to use the add-on to collect logs, see Collecting Data Plane Logs.

Constraints

The following are constraints on using the Cloud Native Log Collection add-on:
  • This add-on is available only in clusters of v1.21 or later.
  • A maximum of 50 log collection rules can be configured for each cluster.
  • This add-on cannot collect .gz, .tar, or .zip logs.
  • If the node storage driver is Device Mapper, container file logs must be collected from the path where the data disk is attached to the node.
  • If the container runtime is containerd, a stdout log entry cannot span multiple lines.
  • Each cluster can collect up to 10,000 single-line logs per second and up to 2,000 multi-line logs per second.
  • A container must run for longer than 1 minute for its logs to be collected. Otherwise, the logs may be deleted before they are collected.
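
The version, rule count, and file type limits above can be checked before a collection policy is rolled out. The following is a minimal sketch, not part of the add-on: the function names and the sample paths are hypothetical, and only the three limits that can be checked statically are covered.

```python
import re

# Limits taken from the constraints listed above.
MIN_CLUSTER_VERSION = (1, 21)
MAX_RULES_PER_CLUSTER = 50
UNSUPPORTED_SUFFIXES = (".gz", ".tar", ".zip")


def parse_cluster_version(version):
    """Extract (major, minor) from a string such as 'v1.28' or '1.28.3'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized cluster version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_constraints(cluster_version, rule_count, log_paths):
    """Return a list of constraint violations (empty if there are none)."""
    problems = []
    if parse_cluster_version(cluster_version) < MIN_CLUSTER_VERSION:
        problems.append(f"cluster {cluster_version} is earlier than v1.21")
    if rule_count > MAX_RULES_PER_CLUSTER:
        problems.append(
            f"{rule_count} rules exceed the limit of {MAX_RULES_PER_CLUSTER}"
        )
    for path in log_paths:
        # str.endswith accepts a tuple of suffixes.
        if path.endswith(UNSUPPORTED_SUFFIXES):
            problems.append(f"{path} has an unsupported file type")
    return problems


# Hypothetical inputs for illustration only.
for problem in check_constraints(
    "v1.20", 51, ["/var/log/app.log", "/var/log/old.tar.gz"]
):
    print(problem)
```

The runtime limits (log throughput and container running time) depend on live behavior and are not covered by this check.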

Permissions

The fluent-bit component of the Cloud Native Log Collection add-on reads and collects stdout logs, container file logs, and node logs on each node based on the collection configuration.

The following Linux capabilities are required for running the fluent-bit component:

  • CAP_DAC_OVERRIDE: bypasses the discretionary access control (DAC) restrictions on files.
  • CAP_FOWNER: bypasses the restriction that the file owner ID must match the process user ID.
  • CAP_DAC_READ_SEARCH: bypasses the DAC restrictions on file reading and directory searching.
  • CAP_SYS_PTRACE: allows arbitrary processes to be traced.

Assigning Authorization Before Installing the Cloud Native Log Collection Add-on in an On-Premises Cluster

The Cloud Native Log Collection add-on needs to be authenticated before accessing LTS and AOM. This add-on leverages workload identities to allow workloads in an on-premises cluster to impersonate IAM users to access cloud services.

Workload identities allow you to add the public key of an on-premises cluster for an IAM IdP and add a rule to map a ServiceAccount to an IAM account. During workload deployment, the token of the ServiceAccount is mounted to the workload. This token is used to access cloud services. This way, the AK/SK of the IAM account is not required, reducing security risks.

  1. Obtain the JSON Web Key Set (JWKS) of the on-premises cluster. The JWKS contains the public key that matches the private key the cluster uses for signing, and it is used to verify the ServiceAccount tokens issued by this cluster.

    1. Use kubectl to access the on-premises cluster.
    2. Run the following command to obtain the public key:

      kubectl get --raw /openid/v1/jwks

      A JSON string containing the public signing key of the cluster is returned. This key is later configured in the IdP.

      {
          "keys": [
              {
                  "kty": "RSA",
                  "e": "AQAB",
                  "use": "sig",
                  "kid": "Ew29q....",
                  "alg": "RS256",
                  "n": "peJdm...."
              }
          ]
      }

  2. Create an IdP for your on-premises cluster in IAM.

    1. Log in to the IAM console, query the ID of the project that the on-premises cluster belongs to, create an IdP, and select OpenID Connect for Protocol. Enter the IdP name for log-agent. For details, see Table 1. For details about how to configure permissions for a user group, see User Group Policy Content.
      Table 1 log-agent IdP settings

      | Add-on Name | IdP Name | Client ID | Namespace | ServiceAccount Name | Minimum Permissions on User Groups |
      | --- | --- | --- | --- | --- | --- |
      | log-agent | ucs-cluster-identity-{Project ID} | ucs-cluster-identity | monitoring | log-agent-serviceaccount | aom:alarm:*, lts:*:* |

      Figure 1 Modifying IdP information
    2. Click OK and modify the IdP information as described in Table 2. Click Create Rule to create an identity conversion rule.
      Figure 2 Modifying IdP information
      Table 2 IdP parameters

      Parameter: Access Type
      Description: Select Programmatic access.

      Parameter: Configuration Information
      Description:
      • Identity Provider URL: Enter https://kubernetes.default.svc.cluster.local.
      • Client ID: Enter the client ID of log-agent. For details, see Table 1.
      • Signing Key: Enter the JWKS of the on-premises cluster obtained in step 1. If multiple clusters are involved, use commas (,) to separate their keys.

      Parameter: Identity Conversion Rules
      Description: An identity conversion rule maps a ServiceAccount in an on-premises cluster to an IAM user group.
      • Attribute: sub
      • Condition: any_one_of
      • Value:

        Value format: system:serviceaccount:Namespace:ServiceAccountName.

        Replace Namespace with the namespace in which the ServiceAccount is to be created, and replace ServiceAccountName with the name of the ServiceAccount to be created.

        For example, if the value is system:serviceaccount:monitoring:log-agent-serviceaccount, a ServiceAccount named log-agent-serviceaccount is created in the monitoring namespace and mapped to the corresponding user group. The IAM token obtained using this ServiceAccount has the permissions of the user group.

        NOTE:

        The ServiceAccount name and user group permissions are mandatory for running add-ons in an on-premises cluster. For details, see Table 1.

      Figure 3 Creating an identity conversion rule
    3. Click OK.
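
The values entered in the steps above follow fixed patterns, so they can be assembled and sanity-checked before they are pasted into the IAM console. The following is a minimal sketch under stated assumptions: the helper names and the project ID are hypothetical, and the JWKS sample repeats the truncated output shown in step 1. It does not call IAM or the cluster.

```python
import json


def signing_keys(jwks_text):
    """Return the keys in a JWKS document that are marked for signature use."""
    jwks = json.loads(jwks_text)
    return [key for key in jwks.get("keys", []) if key.get("use") == "sig"]


def service_account_subject(namespace, name):
    """Build the 'sub' value expected by the identity conversion rule."""
    return f"system:serviceaccount:{namespace}:{name}"


def idp_name(project_id):
    """Build the IdP name from Table 1 for a given project ID."""
    return f"ucs-cluster-identity-{project_id}"


# Same shape as the output of `kubectl get --raw /openid/v1/jwks`;
# the kid and n values are truncated exactly as in the sample above.
sample_jwks = """
{
    "keys": [
        {"kty": "RSA", "e": "AQAB", "use": "sig",
         "kid": "Ew29q....", "alg": "RS256", "n": "peJdm...."}
    ]
}
"""

keys = signing_keys(sample_jwks)
subject = service_account_subject("monitoring", "log-agent-serviceaccount")

print("signing keys found:", len(keys))
print("rule value:", subject)
# "example-project-id" is a placeholder, not a real project ID.
print("IdP name:", idp_name("example-project-id"))
```

Checking that at least one key with use set to sig is present helps catch a truncated or mis-copied JWKS before it is saved as the signing key.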

Installing log-agent in an On-Premises Cluster

  1. Log in to the UCS console and choose Fleets. Then, click the cluster name to access the cluster console. In the navigation pane, choose Add-ons. Locate Cloud Native Log Collection on the right and click Install.
  2. In the Install Add-on window, configure the specifications.

    Table 3 Add-on specifications

    Parameter: Add-on Specifications
    Description: The add-on specifications can be of the Low, High, or custom-resources type.

    Parameter: Pods
    Description: Number of pods that will be created to match the selected add-on specifications.

    If you select custom-resources, you can adjust the number of pods as required.

    Parameter: Containers
    Description: The log-agent add-on contains the following containers, whose specifications can be adjusted as required:

    • fluent-bit: indicates the log collector, which is installed on each node as a DaemonSet.
    • cop-logs: generates and updates configuration files on the collection side.
    • log-operator: parses and updates log collection rules.
    • otel-collector: forwards logs collected by fluent-bit to LTS in a centralized manner.

  3. Configure the parameters in Parameters.

    Interconnection with AOM: If this option is enabled, Kubernetes events will be collected and reported to AOM. You can configure alarm rules on AOM.

  4. Configure the network for reporting add-on instance logs.

    • Public network: This option features flexibility, cost-effectiveness, and easy access. It is only available for clusters that can access the public network.
    • Direct Connect or VPN: After you connect an on-premises data center to a VPC over Direct Connect or VPN, you can use a VPC endpoint to access CIA over the private network. This option features high speed, low latency, and high security. For details, see Using Direct Connect or VPN to Report Logs of On-Premises Clusters.

  5. Click Install.

log-agent Components

Table 4 log-agent components

| Component | Description | Resource Type |
| --- | --- | --- |
| fluent-bit | Lightweight log collector and forwarder deployed on each node to collect logs | DaemonSet |
| cop-logs | Generates soft links for collected files and runs in the same pod as fluent-bit | DaemonSet |
| log-operator | Generates internal configuration files | Deployment |
| otel-collector | Collects logs from applications and services and reports the logs to LTS | Deployment |

Change History

Table 5 Release history

| Add-on Version | Supported Cluster Version | New Feature |
| --- | --- | --- |
| 1.4.1 | v1.21, v1.22, v1.23, v1.24, v1.25, v1.26, v1.27, v1.28, v1.29, v1.30, v1.31 | This is the first official release. It can be installed in the on-premises clusters. |

Reporting Custom Events to AOM

The log-agent add-on reports all warning events and some normal events to AOM by default. You can also customize which events are reported.

  1. Run the following command on the cluster to modify the event collection settings:

    kubectl edit logconfig -n kube-system default-event-aom

  2. Modify the event collection settings as required.
    apiVersion: logging.openvessel.io/v1
    kind: LogConfig
    metadata:
      annotations:
        helm.sh/resource-policy: keep
      name: default-event-aom
      namespace: kube-system
    spec:
      inputDetail:    # Settings on UCS from which events are collected
        type: event    # Type of logs to be collected. Do not change the value.
        event:
          normalEvents:    # Used to configure normal events
            enable: true    # Whether to enable normal event collection
            includeNames:    # Names of events to be collected. If this parameter is not specified, all events will be collected.
            - NotTriggerScaleUp
            excludeNames:    # Names of events that are not collected. If this parameter is not specified, all events will be collected.
            - NotTriggerScaleUp
          warningEvents:    # Used to configure warning events
            enable: true    # Whether to enable warning event collection
            includeNames:    # Names of events to be collected. If this parameter is not specified, all events will be collected.
            - NotTriggerScaleUp
            excludeNames:    # Names of events that are not collected. If this parameter is not specified, all events will be collected.
            - NotTriggerScaleUp
      outputDetail:
        type: AOM    # Type of the system that receives the events. Do not change the value.
        AOM:
          events:
          - name: DeleteNodeWithNoServer    # Event name. This parameter is mandatory.
            resourceType: Namespace    # Type of the resource that operations are performed on.
            severity: Major    # Event severity after an event is reported to AOM, which can be Critical, Major, Minor, or Info. The default value is Major.
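
The effect of enable, includeNames, and excludeNames can be reasoned about with a small model. The following is a minimal sketch, assuming that an empty includeNames list means "collect all events" (as the comments above state) and that excludeNames takes precedence when a name appears in both lists. That precedence is an assumption, not documented behavior, and the EventRule class and should_report function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventRule:
    """Hypothetical model of a normalEvents or warningEvents block."""
    enable: bool = True
    include_names: list = field(default_factory=list)
    exclude_names: list = field(default_factory=list)


def should_report(rule, event_name):
    """Decide whether an event would be reported under the stated assumptions."""
    if not rule.enable:
        return False
    # Assumption: an excluded name is never reported, even if also included.
    if event_name in rule.exclude_names:
        return False
    # An empty include list means that all events are collected.
    if rule.include_names and event_name not in rule.include_names:
        return False
    return True


# Event names below are taken from this page.
only_scale_up = EventRule(include_names=["NotTriggerScaleUp"])
all_but_scale_up = EventRule(exclude_names=["NotTriggerScaleUp"])

print(should_report(only_scale_up, "NotTriggerScaleUp"))
print(should_report(only_scale_up, "DeleteNodeWithNoServer"))
print(should_report(all_but_scale_up, "DeleteNodeWithNoServer"))
```

To confirm how the add-on actually resolves a name that appears in both lists, check the events that arrive in AOM after the LogConfig is saved.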

log-agent Events

While log-agent is being installed and while it is running, the log-operator component reports events. You can use these events to check whether log-agent was installed successfully and to locate the cause of a fault. For details, see Table 6.

Table 6 log-agent events

| Event Name | Description |
| --- | --- |
| InitLTSFailed | Failed to initialize the log streams in the LTS log group. |
| WatchAKSKFailed | Failed to listen to the AK/SK. |
| WatchAKSKSuccessful | Listening to the AK/SK succeeded. |
| RequestLTSFailed | Failed to request the LTS interface. |
| InitLTSSuccessful | Log streams in the LTS log group initialized. |
| CreateWebhookConfigFailed | Failed to create MutatingWebhookConfiguration. |
| CreateWebhookConfigSuccessful | MutatingWebhookConfiguration created. |
| StartServerSuccessful | Listening enabled. |
| StartServerFailed | Failed to enable listening. |
| StartManagerFailed | Failed to enable CRD listening. |
| InjectAnnotationFailed | Failed to inject annotations. |
| InjectAnnotationSuccessful | Annotations injected. |
| UpdateLogConfigFailed | Failed to update the logconfig information. |
| GetConfigListFailed | Failed to obtain the CR list. |
| GenerateConfigFailed | Failed to generate the fluent-bit and otel settings. |

log-agent Metrics

The log-operator, fluent-bit, and otel-collector components of the log-agent add-on each expose a set of metrics. You can use AOM or Prometheus to monitor these metrics and check whether the log-agent add-on is running properly. For details, see Monitoring Custom Metrics Using AOM or Monitoring Custom Metrics Using Prometheus. The metrics are listed below:

  • log-operator (only for Huawei Cloud clusters)

    Port: 8443

    Address: /metrics

    Protocol: HTTPS

    Table 7 Metrics

    | Metric | Description | Type |
    | --- | --- | --- |
    | log_operator_aksk_latest_update_times | Last update time of the AK/SK | Gauge |
    | log_operator_aksk_update_total | Cumulative count of AK/SK update times | Counter |
    | log_operator_send_request_total | Cumulative count of requests that have been sent | Counter |
    | log_operator_webhook_listen_status | Webhook listening status | Gauge |
    | log_operator_http_request_duration_seconds | HTTP request latency | Histogram |
    | log_operator_http_request_total | Cumulative count of HTTP requests | Counter |
    | log_operator_webhook_request_total | Cumulative count of webhook requests | Counter |

  • fluent-bit

    Port: 2020

    Address: /api/v1/metrics/prometheus

    Protocol: HTTP

    Table 8 Metrics

    | Metric | Description | Type |
    | --- | --- | --- |
    | fluentbit_filter_add_records_total | Number of log records that the filter has successfully ingested | Counter |
    | fluentbit_filter_drop_records_total | Number of log records that have been dropped by the filter | Counter |
    | fluentbit_input_bytes_total | Number of bytes of log records that the input instance has successfully ingested | Counter |
    | fluentbit_input_files_closed_total | Total number of files closed by the input instance | Counter |
    | fluentbit_input_files_opened_total | Total number of files opened by the input instance | Counter |
    | fluentbit_input_files_rotated_total | Total number of files rotated by the input instance | Counter |
    | fluentbit_input_records_total | Number of log records the input instance has successfully ingested | Counter |
    | fluentbit_output_dropped_records_total | Number of log records that have been dropped by the output instance | Counter |
    | fluentbit_output_errors_total | Number of chunks that have faced an error | Counter |
    | fluentbit_output_proc_bytes_total | Number of bytes of log records that the output instance has successfully sent | Counter |
    | fluentbit_output_proc_records_total | Number of log records that the output instance has successfully sent | Counter |
    | fluentbit_output_retried_records_total | Number of log records that experienced a retry | Counter |
    | fluentbit_output_retries_total | Number of times the output instance requested a retry for a chunk | Counter |
    | fluentbit_uptime | Number of seconds that Fluent Bit has been running | Counter |
    | fluentbit_build_info | Build and version information of Fluent Bit | Gauge |

  • otel-collector

    Port: 8888

    Address: /metrics

    Protocol: HTTP

    Table 9 Metrics

    | Metric | Description | Type |
    | --- | --- | --- |
    | otelcol_exporter_enqueue_failed_log_records | Number of log records that failed to be added to the sending queue | Counter |
    | otelcol_exporter_enqueue_failed_metric_points | Number of metric points that failed to be added to the sending queue | Counter |
    | otelcol_exporter_enqueue_failed_spans | Number of spans that failed to be added to the sending queue | Counter |
    | otelcol_exporter_send_failed_log_records | Number of log records that failed to be sent | Counter |
    | otelcol_exporter_sent_log_records | Number of log records that have been sent | Counter |
    | otelcol_process_cpu_seconds | Total CPU user and system time in seconds | Counter |
    | otelcol_process_memory_rss | Total physical memory (resident set size) | Gauge |
    | otelcol_process_runtime_heap_alloc_bytes | Bytes of allocated heap objects | Gauge |
    | otelcol_process_runtime_total_alloc_bytes | Cumulative bytes allocated for heap objects | Counter |
    | otelcol_process_runtime_total_sys_memory_bytes | Total bytes of memory obtained from the OS | Gauge |
    | otelcol_process_uptime | Uptime of the process in seconds | Counter |
    | otelcol_receiver_accepted_log_records | Number of log records received and processed by the OpenTelemetry receiver | Counter |
    | otelcol_receiver_refused_log_records | Number of log records rejected by the OpenTelemetry receiver | Counter |
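
The endpoints above return metrics in the Prometheus text format, so a quick health figure can be derived without a full monitoring stack. The following is a minimal sketch that parses a scraped payload and computes the share of records dropped by the fluent-bit output. The sample payload, its label values, and the function name are hypothetical. In practice the text would come from port 2020 at /api/v1/metrics/prometheus on a node.

```python
def parse_prometheus_text(text):
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{labels} value [timestamp]
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        totals[name] = totals.get(name, 0.0) + float(fields[0])
    return totals


# Hypothetical payload for illustration; real label values will differ.
sample = """
# TYPE fluentbit_output_proc_records_total counter
fluentbit_output_proc_records_total{name="out.0"} 950
# TYPE fluentbit_output_dropped_records_total counter
fluentbit_output_dropped_records_total{name="out.0"} 50
"""

totals = parse_prometheus_text(sample)
sent = totals["fluentbit_output_proc_records_total"]
dropped = totals["fluentbit_output_dropped_records_total"]
drop_ratio = dropped / (sent + dropped)
print(f"drop ratio: {drop_ratio:.2%}")
```

Because these metrics are counters, comparing two scrapes taken some time apart gives a rate, which is usually more informative than the cumulative totals used in this sketch.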
