Unified Metric Monitoring

This section describes how to centrally monitor metric data of different accounts.

Scenario

O&M personnel of an e-commerce platform need to monitor metric data of different accounts in real time.

Solution

Create a Prometheus instance for multi-account aggregation and connect accounts, cloud services, and cloud service metrics. On the Metric Browsing page, you can monitor metrics of multiple member accounts and set alarm rules for them. When a metric is abnormal, an alarm is triggered immediately and a notification is sent.

Prerequisites

The monitoring account and the monitored accounts have been added to an organization. The monitoring account must be the organization administrator or a delegated administrator. If it is not the organization administrator, perform Step 2 to set it as a delegated administrator.
For the monitored account, metrics of the following cloud services can be aggregated: FunctionGraph, Elastic Volume Service (EVS), Cloud Backup and Recovery (CBR), Object Storage Service (OBS), Virtual Private Cloud (VPC), Elastic Load Balance (ELB), Direct Connect, NAT Gateway, Distributed Message Service (DMS), Distributed Cache Service (DCS), Relational Database Service (RDS), Document Database Service (DDS), Data Replication Service (DRS), LakeFormation, MapReduce Service (MRS), GaussDB(DWS), Cloud Search Service (CSS), and Web Application Firewall (WAF). Cloud Container Engine (CCE) and Elastic Cloud Server (ECS) metrics collected by ICAgents can also be aggregated.

Step 1: Connecting Cloud Services for a Monitored Account

The following uses FunctionGraph and ECS as examples. The procedure for connecting other cloud services is similar to that for connecting FunctionGraph. The procedure for connecting CCE is similar to that for connecting ECS, except that ICAgents are installed by default when you purchase CCE clusters.

Connecting FunctionGraph
1. Log in to the AOM 2.0 console.
2. In the navigation pane, choose Access Center.
3. Under Cloud Services, click FunctionGraph. In the displayed dialog box, click Connect Now.
Connecting ECS
1. Hover over the username in the upper right corner and choose My Credentials from the drop-down list.
  Figure 1 My credentials
2. On the My Credentials page, click the Access Keys tab.
3. Click Create Access Key and enter a verification code or password.
  Figure 2 Adding an access key
4. Click OK to download the generated AK/SK.
  You can obtain the AK from the access key list and SK from the downloaded CSV file.
5. Return to the AOM 2.0 console page. In the navigation pane, choose Collection Management.
6. In the navigation pane, choose UniAgent > VM Access.
7. On the VM Access page, select the hosts where ICAgents are to be installed and choose Plug-in Batch Operation.
  Figure 3 Installing ICAgents
8. In the displayed dialog box, set Operation to Install, Plug-in to ICAgent, and Version to 5.12.163, and enter the AK/SK obtained in step 4.
9. Click OK to install ICAgents.

Step 2: Enable Access for AOM and Set a Delegated Administrator (Skip This Step If You Are an Organization Administrator)

Log in to the Organizations console as an administrator.
In the navigation pane, choose Services.
In the service list, locate Application Operations Management (AOM) and click Enable Access in the Operation column.
Click Specify Delegated Administrator in the Operation column of AOM, select the desired account, and click OK. As shown in Figure 4, paas_aom is specified as the delegated administrator.

Figure 4 Specifying a delegated administrator

Step 3: Create an Instance for Multi-Account Aggregation

Log in to the AOM 2.0 console as an administrator or delegated administrator.
In the navigation pane, choose Prometheus Monitoring > Instances. On the displayed page, click Add Prometheus Instance.
Enter an instance name and select the Prometheus for Multi-Account Aggregation instance type.
Click OK. As shown in Figure 5, a multi-account aggregation instance named test-aom is created.

Figure 5 Prometheus instance list
In the Prometheus instance list, click the name of the created instance. On the displayed page, select the accounts, cloud services, and cloud service metrics to connect.

For example, connect member accounts paas_apm and paas_aom, and connect cloud services such as FunctionGraph, DCS, and ECS. Then click Add Metric and select the desired metrics in the displayed dialog box.

Figure 6 Connecting accounts

Wait 2 to 3 minutes, and then view the connected metric data on the Metric Browsing page.

Step 4: Configuring Unified Monitoring

Check whether the metrics of the created instance are connected.
1. In the navigation pane, choose Metric Browsing. In the Prometheus Instance drop-down list, select instance test-aom created in step 3.
2. Click All metrics, select a metric, and copy the metric name.
3. Click Prometheus statement and enter sum(metric name) by (aom_source_account_name) to check whether the metric is connected.
  Figure 7 Checking metrics
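The check statement above follows a fixed pattern, so it can be generated rather than typed by hand. The following is a minimal sketch, assuming you only need the query text to paste into the Prometheus statement box. The function name build_account_sum_query is hypothetical and is not part of AOM. The name check uses the standard Prometheus metric name pattern.

```python
import re

# Label used in the check statement above to group series by source account.
ACCOUNT_LABEL = "aom_source_account_name"

# Standard Prometheus metric name pattern.
_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def build_account_sum_query(metric_name: str, label: str = ACCOUNT_LABEL) -> str:
    """Return 'sum(<metric>) by (<label>)' for a copied metric name.

    Hypothetical helper for illustration only; it builds the statement text
    and does not call any AOM or Prometheus API.
    """
    name = metric_name.strip()
    if not _METRIC_NAME.fullmatch(name):
        raise ValueError(f"not a valid Prometheus metric name: {metric_name!r}")
    return f"sum({name}) by ({label})"


if __name__ == "__main__":
    print(build_account_sum_query("aom_node_cpu_usage"))
```

If the statement returns one series per connected account, the metric is likely connected for each of those accounts.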
Click All metrics and select the metric to be monitored. As shown in Figure 8, select the aom_node_cpu_usage metric so that its values and trends under the paas_apm and paas_aom accounts can be monitored in real time.

Figure 8 Checking metrics
Click the alarm rule icon in the upper right corner of the metric list to add an alarm rule for the selected metric.
1. Set the basic information about the alarm rule, such as the rule name.
2. Set the detailed information about the alarm rule.
  1. By default, the rule type, configuration mode, and Prometheus instance in the alarm rule settings are the same as those on the Metric Browsing page.
  2. Set alarm rule details. By default, the metric selected on the Metric Browsing page is automatically displayed.
    Set the statistical period, condition, detection rule, trigger condition, and alarm severity. The detection rule consists of the statistical mode (Avg, Min, Max, Sum, or Samples), the determination criterion (≥, ≤, >, or <), and the threshold value. For example, if Statistical Period is 1 minute, Rule is Avg > 1, Consecutive Periods is 3, and Alarm Severity is Critical, a critical alarm is generated when the average metric value is greater than 1 for three consecutive periods.
    
    Figure 9 Setting an alarm rule
  3. Click Advanced Settings and set information such as Check Interval and Alarm Clearance.
  4. Set an alarm notification policy. There are two alarm notification modes. As shown in Figure 10, the direct alarm reporting mode is selected.
    Direct alarm reporting: An alarm is directly sent when the alarm condition is met. If you select this mode, set an interval for notification and specify whether to enable an action rule.
    1. Set the frequency for sending alarm notifications.
    2. Specify whether to enable an alarm action rule. After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template.
    Figure 10 Alarm notification
  5. Click Confirm. Then, click Back to Alarm Rule List to view the created alarm rule.
    As shown in Figure 11, click the expand icon next to a rule name to view details.
    
    In the expanded list, if a monitored object meets the configured alarm condition, a metric alarm is generated on the alarm page. To view the alarm, choose Alarm Management > Alarm List in the navigation pane. If a host meets the preset notification policy, the system sends an alarm notification to the specified personnel by email, SMS, or WeCom.
    
    Figure 11 Alarm rule
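The detection rule described in the alarm rule settings (statistical mode, determination criterion, threshold, and consecutive periods) can be illustrated with a small sketch. This is not AOM's implementation. The function name should_alarm and its parameters are hypothetical, Samples is assumed to mean the number of data points in a period, and a period with no data is assumed to break the streak.

```python
import operator
from statistics import mean

# Statistical modes named in the text. "Samples" is assumed to be a count.
STATISTICS = {"Avg": mean, "Min": min, "Max": max, "Sum": sum, "Samples": len}

# Determination criteria named in the text, written in ASCII.
COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def should_alarm(periods, statistic="Avg", comparator=">", threshold=1.0,
                 consecutive_periods=3):
    """Return True if the rule holds for the required consecutive periods.

    periods: list of lists, one inner list of raw samples per statistical
    period, ordered from oldest to newest.
    """
    aggregate = STATISTICS[statistic]
    compare = COMPARATORS[comparator]
    streak = 0
    for samples in periods:
        # Assumption: an empty period never satisfies the rule.
        if samples and compare(aggregate(samples), threshold):
            streak += 1
            if streak >= consecutive_periods:
                return True
        else:
            streak = 0
    return False


if __name__ == "__main__":
    # Avg > 1 in three consecutive 1-minute periods.
    print(should_alarm([[2, 2], [1.5], [3, 1]]))
```

With the example settings from the text (Avg > 1, three consecutive periods), a single period whose average drops to 1 or below resets the count, so short spikes do not raise a critical alarm.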
Click the dashboard icon in the upper right corner of the metric list to add the graph to a dashboard.
1. Select a dashboard from the drop-down list and enter the graph name. If the dashboards in the list cannot meet your requirements, click Add Dashboard to add one. For details, see Creating a Dashboard.
  Figure 12 Adding the graph to a dashboard
2. Click Confirm. The dashboard page is displayed. As shown in Figure 13, the CPU Usage graph is added to the aom dashboard so that its values and trends under the paas_apm and paas_aom accounts can be monitored in real time.
  Figure 13 Viewing the graph