Updated on 2024-09-30 GMT+08:00

Overview

Observability is an approach that engineers use to monitor the infrastructure and applications in a cloud native environment with the help of a variety of tools and techniques. By analyzing the collected metrics, logs, and traces, engineers can gain insights into the applications for easier troubleshooting. This section describes the observability architecture of CCE and its main observability capabilities.

Figure 1 Observability architecture

The observability architecture consists of four parts: compute base, data collection, monitoring and logging, and O&M.

Compute Base

CCE allows you to create CCE Turbo clusters or CCE standard clusters as required. CCE provides a unified data collection solution for different cluster types, which ensures a consistent experience in cloud native observability. For details about CCE clusters, see CCE Service Overview.

Data Collection

Metric collection: An add-on based on Prometheus is provided for cloud native cluster monitoring. This add-on is lightweight and can be used out of the box. For details, see Cloud Native Cluster Monitoring.
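As background for the Prometheus-based add-on: Prometheus-style collectors scrape targets that expose metrics in a line-oriented text format. The following minimal Python sketch is not part of the CCE add-on. It shows how such lines break down into a metric name, labels, and a value. The function name parse_samples, the EXAMPLE payload, and the demo_* metric names are hypothetical and used for illustration only. The parser is deliberately simplified and leaves escape sequences in label values as they are.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse simple text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical scrape output, for illustration only.
EXAMPLE = """\
# HELP demo_requests_total Total requests handled.
# TYPE demo_requests_total counter
demo_requests_total{pod="web-0",code="200"} 1027
demo_requests_total{pod="web-0",code="500"} 3
demo_up 1
"""

if __name__ == "__main__":
    for name, labels, value in parse_samples(EXAMPLE):
        print(name, labels, value)
```

Labels such as pod in this sketch are what make views by cluster, node, workload, or pod possible once the samples are stored.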

Log collection: An add-on based on Fluent Bit and OpenTelemetry is provided for cloud native logging. This add-on features high performance and low resource usage. It also supports CRD-based log collection policies, which are flexible and easy to use. For details, see Cloud Native Logging.
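Container standard output is one of the log sources that CCE collects, so an application only needs to write its logs to stdout for them to be picked up. The following is a minimal Python sketch, under the assumption that one JSON object per line is a convenient shape for later searching. Nothing in it is required by CCE or LTS. The names JsonFormatter, build_logger, and demo-app are hypothetical.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to the stream (stdout by default)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("demo-app")  # hypothetical application name
    log.info("order %s processed", 42)
```

Writing to stdout rather than to a file inside the container keeps the application independent of the collection mechanism.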

Monitoring and Logging

Application Operations Management (AOM) is a one-stop, multi-dimensional O&M management platform for cloud applications. It monitors applications and related cloud resources in real time, analyzes application health, and provides flexible data visualization functions to help you detect faults in a timely manner.

Log Tank Service (LTS) collects log data from hosts and cloud services. LTS can process a massive number of logs efficiently, securely, and in real time, which enables you to gain insights into cloud services and applications and optimize their availability and performance. It also supports real-time decision-making, device O&M management, and service trend analysis.

Cloud Native Observability

CCE provides Health Center, Monitoring Center, Logging, and Alarm Center for cloud native observability.

  • Health Center

    Health diagnosis monitors cluster health, drawing on the experience of container O&M experts to detect cluster faults and identify risks in a timely manner. It also provides rectification suggestions.

  • Monitoring Center

    Monitoring Center provides multi-dimensional data insights and dashboards. It offers monitoring views by cluster, node, workload, and pod, and supports multi-level drill-down and association analysis. Dashboard provides monitoring graphs for items such as the API server, CoreDNS, and PVCs.

  • Logging

    CCE works with LTS to collect logs of control plane components (kube-apiserver, kube-controller-manager, and kube-scheduler), Kubernetes audit logs, Kubernetes events, and container logs (stdout logs, text logs, and node logs).

  • Alarm Center

    Alarm Center works with AOM 2.0 to allow you to create alarm rules and view alarms of clusters and containers.

Resource Permissions

Health Center, Monitoring Center, Logging, and Alarm Center work closely with cloud services for cluster monitoring, alarm reporting, and notification. When you access Health Center, Monitoring Center, Logging, or Alarm Center for the first time, the system will request permissions to access the cloud services in the region where you run your applications.

The following table lists the permissions.

| Assigned To | Permission | Description |
|---|---|---|
| CCE | IAM ReadOnlyAccess | IAM users need to access Monitoring Center and Alarm Center. |
| CCE | Tenant Guest | Monitoring Center and Alarm Center check the configurations of global resources associated with clusters such as OBS and DNS resources to identify incorrect configurations. |
| CCE | CCE Administrator | Monitoring Center and Alarm Center need to access CCE to obtain information about clusters, nodes, and workloads so that they can help ensure resource health. |
| CCE | SWR Administrator | Monitoring Center and Alarm Center need to access SWR to obtain image information. |
| CCE | SMN Administrator | Monitoring Center and Alarm Center need to access SMN to obtain contact group information. |
| CCE | AOM Administrator | Monitoring Center and Alarm Center need to access AOM to obtain metrics. |
| CCE | LTS Administrator | Monitoring Center and Alarm Center need to access LTS to obtain logs. |
| AOM | DMS UserAccess | AOM obtains subscription data from DMS. |
| AOM | ECS CommonOperations | AOM obtains system metrics and logs using UniAgents and ICAgents installed on ECSs. |
| AOM | CES ReadOnlyAccess | AOM synchronizes metrics from Cloud Eye. |
| AOM | CCE FullAccess | AOM synchronizes container metrics from CCE. |
| AOM | RMS ReadOnlyAccess | AOM CMDB manages cloud service instance data. |
| AOM | ECS ReadOnlyAccess | AOM obtains system metrics and logs using UniAgents and ICAgents installed on ECSs. |
| AOM | LTS FullAccess | AOM obtains logs from LTS. |
| AOM | CCI FullAccess | AOM synchronizes container metrics from CCI. |

After you agree to the authorization, agencies are automatically created in IAM to delegate required resource operation permissions in your account to Huawei Cloud CCE and AOM. For details about agencies, see Cloud Service Delegation. The following are agencies automatically created in IAM:

  • cia_admin_trust

    This agency has the Tenant Guest and IAM ReadOnlyAccess permissions in global projects as well as the Tenant Guest, CCE Administrator, and SWR Administrator permissions in regional projects. These permissions are required by Health Center, Monitoring Center, Logging, or Alarm Center to access other cloud services.

    To use Health Center, Monitoring Center, Logging, or Alarm Center in multiple regions, you need to apply for the Tenant Guest, CCE Administrator, and SWR Administrator permissions in each region. (Go to the IAM console, choose Agencies, and click cia_admin_trust to view the authorization records in each region.)

  • aom_admin_trust

    For details about the aom_admin_trust agency, see AOM Cloud Service Authorization.

Health Center, Monitoring Center, Logging, and Alarm Center may fail to run as expected if the required permissions are not assigned. While you are using any of them, do not delete or modify the cia_admin_trust or aom_admin_trust agency.
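The per-region requirement for cia_admin_trust (Tenant Guest, CCE Administrator, and SWR Administrator in each region) can be checked against the authorization records shown on the IAM console. The following is a minimal Python sketch that compares the two. It does not call any Huawei Cloud API. The records are assumed to be copied by hand from the console, and the function name missing_permissions and the region names region-a and region-b are hypothetical placeholders.

```python
# Regional permissions listed above for the cia_admin_trust agency.
REQUIRED_REGIONAL = {"Tenant Guest", "CCE Administrator", "SWR Administrator"}


def missing_permissions(granted_by_region):
    """Map each region to the sorted list of required permissions it lacks."""
    gaps = {}
    for region, granted in granted_by_region.items():
        missing = REQUIRED_REGIONAL - set(granted)
        if missing:
            gaps[region] = sorted(missing)
    return gaps


# Hypothetical authorization records, for illustration only.
# The region names are placeholders, not real region IDs.
EXAMPLE_RECORDS = {
    "region-a": ["Tenant Guest", "CCE Administrator", "SWR Administrator"],
    "region-b": ["Tenant Guest"],
}

if __name__ == "__main__":
    for region, missing in missing_permissions(EXAMPLE_RECORDS).items():
        print(f"{region}: missing {', '.join(missing)}")
```

A region that appears in the result lacks at least one of the permissions that the cia_admin_trust agency needs there.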