Enabling Monitoring for On-premises Clusters

This section describes how to enable monitoring for on-premises clusters.

Prerequisites

An on-premises cluster has been registered with UCS. For details, see Overview.

Preparing the Network Environment

There are two options, public network and private network, for data access of an on-premises cluster.

The public network features flexibility, cost-effectiveness, and easy access. If network quality is not a concern and simpler access is preferred, public network access is a good choice.
This option is only available for clusters that can access the public network.
The private network features high speed, low latency, and security. After you connect the on-premises network to the cloud network over Direct Connect or VPN, you can use a VPC endpoint to access CIA over the private network.
Figure 1 Private network access diagram

Before enabling this function, you need to prepare a VPC and connect the network environment of the on-premises data center to the VPC. The VPC subnet CIDR block cannot overlap with the CIDR block used by the on-premises data center. Otherwise, the cluster cannot be connected. For example, if the VPC subnet used by the on-premises data center is 192.168.1.0/24, the subnet 192.168.1.0/24 cannot be used in the Huawei Cloud VPC.

Use either of the following methods to connect the network:
- VPN: See Connecting an On-Premises Data Center to a VPC Through a VPN.
- Direct Connect: See Accessing a VPC over a Single Connection Through Static Routes or Accessing a VPC over a Single Connection Through BGP Routes.

Enabling Monitoring

Log in to the UCS console. In the navigation pane, choose Container Intelligent Analysis.
Select a fleet or a cluster not in the fleet, and click Enable Monitoring.

Figure 2 Selecting a fleet or a cluster not in the fleet
Select an on-premises cluster.
Click Next: Configure Connection to complete the network settings.
- Data Access: Select Public access or Private access.
- Data Reported To: Select the region where data is reported. The region must be the same as that of the VPC connected to the on-premises cloud network.
- Projects: If the IAM project function is enabled, you also need to select a project.
- Network Settings: This area is mandatory when Data Access is set to Private access.
  VPC Endpoint: You can select an existing VPC endpoint or create a VPC endpoint.
  
  When you create a VPC endpoint in the VPC that has been connected to the on-premises network to connect to the data receiving point of CIA, you can select an existing private network access point. If you create a private network access point, you will be charged ¥0.1/hour for the VPC endpoint.
  
  When you create a private network access point, a VPC endpoint and a DNS private domain name will be generated. Ensure that the Huawei Cloud account has corresponding resource quotas. In addition, ensure that the subnet selected on the page has available IP addresses.
Complete metric collection settings.

Specifications
- Deployment Mode: The Agent and Server modes are supported. The add-on deployed in Agent mode occupies fewer cluster resources and provides Prometheus metric collection for clusters. However, it does not support the HPA and health diagnosis functions based on custom Prometheus statements. The add-on deployed in Server mode provides Prometheus metric collection for clusters and supports the HPA and health diagnosis functions based on custom Prometheus statements. This mode depends on the PVC and consumes a large amount of memory.
- Add-on Specifications: If Deployment Mode is set to Agent, the default add-on specifications are used. If Deployment Mode is set to Server, the add-on specifications include Demo (≤ 100 containers), Small (≤ 2,000 containers), Medium (≤ 5,000 containers), and Large (> 5,000 containers). Different specifications have different requirements on cluster resources, such as CPUs and memory. For details about the resource quotas of different add-on specifications, see Resource Quota Requirements of Different Specifications..
Parameters
- Interconnection Mode: Currently, only AOM can be interconnected.
- AOM Instance: Container monitoring reports metrics to AOM in a unified manner. You need to select an AOM instance of the Prometheus for CCE type. The default metrics are collected for free but custom metrics are billed by AOM. For details, see AOM Billing.
- Collection Period: period for Prometheus to collect and report metrics. The value ranges from 10 to 60 seconds. The default value is 15 seconds.
- Storage: used to temporarily store Prometheus data. This parameter is mandatory when Deployment Mode is set to Server. The on-premises cluster supports the CSI-Local storage type. A local volume represents a local disk of a node that is provided to a pod through a PVC. With local volumes, a pod using a local volume is always scheduled to the same node. Ensure that the scheduling policy of the pod does not conflict with that of the target node.
  - Storage Type: Select CSI-Local.
  - Capacity: capacity specified when the PVC is created. This capacity is for reference only. The actual capacity is the available capacity of the disk where the local directory is located.
  - Node: node to which Prometheus will be scheduled. Ensure that Prometheus can be scheduled to this node.
  - Node Path: directory for storing data on Prometheus. Enter an absolute path. The path will be automatically created on the target node.
For details about the add-on, see kube-prometheus-stack.
Click Confirm. The Clusters tab (Container Insights > Clusters) is displayed. The access status of the cluster is Installing.

After monitoring is enabled for the cluster, metrics such as the CPU usage and CPU allocation rate of the cluster are displayed in the list, indicating that the cluster is monitored by CIA.

If monitoring fails to be enabled for the cluster, rectify the fault by referring to FAQs.