Building a Comprehensive Metric System

This section describes how to build a metric system and a dashboard for all-round, multi-dimensional, and visualized monitoring of resources and applications.

Scenario

In the Internet era, user experience is the top priority. Page response speed, access latency, and access success rate directly affect user experience. If this information cannot be obtained in a timely manner, a large number of users may be lost. The O&M personnel of an online shopping mall used open-source software to collect metrics, but these metrics were scattered and could not be displayed centrally.

Solution

AOM implements one-stop, multi-dimensional O&M for cloud applications. In the access center, you can connect metrics of businesses, applications, middleware, and infrastructure. You can also customize dashboards for monitoring and set alarm rules through a unified entry to implement routine inspection and keep services running normally.

AOM monitors metrics from multiple dimensions in different scenarios. It has a multi-layer (infrastructure, middleware, application, and business) metric system, displaying more than 1,000 types of metrics.

**Table 1** Four-layer metric system
Category	Source	Example	How to Access
Business metrics	Device log SDKs and extracted ELB logs	UV, PV, latency, access failure rate, and access traffic	Connect Business Metrics
Business metrics	Transaction monitoring or reported custom metrics	URL calls, maximum concurrency, and maximum response time	Connect Business Metrics
Application metrics	Component performance graphs or API performance data	URL calls, average latency, error calls, and throughput	Connect Application Metrics
Middleware metrics	Native or cloud middleware data	File system capacity and file system usage	Connect Middleware Metrics
Infrastructure metrics	Container or cloud service data, such as compute, storage, network, and database data	CPU usage, memory usage, and health status	Connect Infrastructure Metrics, Connect Container Metrics, and Connect Cloud Service Metrics

Prerequisites

ELB logs have been ingested to LTS.
An ECS has been bound to the environment.

Step 1: Build a Four-layer Metric System

Connect business metrics.
1. Log in to the AOM 2.0 console.
2. In the navigation pane, choose Access Center.
3. In the Business panel on the right, click a target card.
  - Connecting ELB log metrics
    1. The system can automatically connect the log metrics.
    2. Choose Dashboard in the navigation pane, select the created dashboard, and click the icon in the upper right corner of the page. On the Log Sources tab, enter the corresponding SQL statement to check the log metrics. For example, to check traffic metrics, enter the corresponding SQL statement and click Search.
  - Connecting APM transaction metrics
    1. Install an APM probe for the workload. For details, see Installing an APM Probe.
    2. After the installation is complete, log in to the console of the service where the probe is installed and trigger the collection of APM transaction metrics. In the example of an online shopping mall, you can add a product to the shopping cart to trigger the collection.
    3. Log in to the AOM 2.0 console.
    4. In the navigation pane, choose Metric Browsing. In the right pane, select the connected APM metrics to view.
Connect application metrics.
1. To install an APM probe for a workload, perform the following steps:
  1. Log in to the CCE console and click a target cluster.
  2. Choose Workloads in the navigation pane, and select the type of workload whose metrics are to be reported to AOM.
  3. Click a target workload. On the APM Settings tab page, click Edit in the lower right corner.
  4. Select the APM 2.0 probe, set Probe Version to latest-x86, set APM Environment to phoenixenv1, and select the created application phoenixapp1 from the APM App drop-down list.
  5. Click Save.
2. After the installation is complete, log in to the console of the service where the probe is installed and trigger the collection of application metrics. In the example of an online shopping mall, you can add a product to the shopping cart to trigger the collection.
3. Log in to the AOM 2.0 console.
4. In the navigation pane, choose Metric Browsing. In the right pane, select the connected application metrics to view.
Connect middleware metrics.
1. Upload the Exporter to the ECS and start it.
  1. Download the mysqld_exporter-0.14.0.linux-amd64.tar.gz package from https://prometheus.io/download/.
  2. Log in to the ECS as the root user, upload the Exporter software package to the ECS, and decompress it.
  3. Log in to the RDS console. On the Instances page, click an RDS DB instance name in the instance list. On the basic information page, view the RDS security group.
  4. Check whether port 3306 is enabled in the RDS security group.
    Figure 1 Checking whether the RDS port is enabled
  5. Go to the decompressed folder and configure the mysql.cnf file on the ECS:
```
cd mysqld_exporter-0.14.0.linux-amd64 
vi mysql.cnf
```
    For example, add the following content to the mysql.cnf file:
    
    [client]
    # RDS username
    user=root
    # RDS password
    password=****
    # RDS public IP address
    host=192.168.0.198
    # Port
    port=3306
  6. Run the following command to start the mysqld_exporter tool:
```
# Start the exporter in the background so that it keeps running after you log out.
nohup ./mysqld_exporter --config.my-cnf="mysql.cnf" --collect.global_status --collect.global_variables &
```
  7. Run the following command to check whether the tool is started properly:
```
curl http://127.0.0.1:9104/metrics
```
    If the command output shown in Figure 2 is displayed, the tool is started properly.
    
    Figure 2 Checking metrics
2. Connect middleware metrics using VM access mode.
  1. Log in to the AOM 2.0 console.
  2. On the VM Access page, install the UniAgent for the ECS. For details, see Manual Installation.
  3. In the navigation pane, choose Access Center. In the Prometheus Middleware panel on the right, click a target card.
  4. In the dialog box that is displayed, configure a collection task and install Exporter. For details, see Exporter Access in the VM Scenario.
  5. Click Create.
3. After the connection is complete, choose Metric Browsing in the navigation pane on the left. In the right pane, view the connected middleware metrics.
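
The startup check in step 1 can also be scripted. The following is a minimal sketch, assuming that the exporter reports the `mysql_up` gauge (1 when it can query the database) in Prometheus text format on port 9104, as in the curl command above. The function name `exporter_mysql_up` is hypothetical and is not part of AOM or mysqld_exporter.

```shell
#!/usr/bin/env bash
# Sketch: read Prometheus text-format metrics on stdin and report whether
# the exporter can reach the database, based on the mysql_up gauge.

# exporter_mysql_up (hypothetical helper)
#   Prints "up" and returns 0 if the first mysql_up sample has the value 1.
#   Prints "down" and returns 1 otherwise, including when the metric is absent.
exporter_mysql_up() {
  local value
  # "# HELP" and "# TYPE" lines start with "#", so they never match mysql_up.
  value=$(awk '$1 == "mysql_up" { print $2; exit }')
  if [ "$value" = "1" ]; then
    echo "up"
    return 0
  fi
  echo "down"
  return 1
}

# Example call on the ECS (assumes the exporter listens on port 9104):
#   curl -s http://127.0.0.1:9104/metrics | exporter_mysql_up
```

If the function reports down while the exporter process is running, the RDS address, credentials, or security group rule for port 3306 are the first things to recheck.
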
Connect infrastructure metrics.
1. Log in to the AOM 2.0 console.
2. In the navigation pane, choose Access Center.
3. In the Prometheus Running Environments or Prometheus Cloud Services panel, click a target card.
  - Select a container metric card:
    For example, select the CCE card. The ICAgent is installed by default after you purchase a CCE cluster.
  - Select a cloud service metric card:
    1. In the displayed dialog box, select the cloud service to monitor, for example, RDS or DCS.
    2. Click Confirm.
      After the connection is complete, the Cloud Service Monitoring page is displayed. You can view the information (such as running status) of the selected cloud service.
4. After the connection is complete, choose Metric Browsing in the navigation pane on the left. In the right pane, select the connected infrastructure metrics to view.

Step 2: Add a Dashboard for Unified Monitoring

Create a metric alarm rule.

You can set threshold conditions in metric alarm rules for resource metrics. If a metric value meets the threshold condition, a threshold alarm will be generated. If no metric data is reported, an insufficient data event will be generated.

Metric alarm rules can be created in the following modes: Select from all metrics and PromQL. The following uses Select from all metrics as an example.
1. Log in to the AOM 2.0 console.
2. In the navigation pane, choose Alarm Management > Alarm Rules.
3. On the Metric/Event Alarm Rules tab page, click Create.
4. Set the basic information about the alarm rule, such as the rule name.
5. Set parameters about the alarm rule. Set Rule Type to Metric alarm rule and Configuration Mode to Select from all metrics, and select a Prometheus instance from the drop-down list.
6. Set alarm rule details.
  You need to set information such as the statistical period, condition, detection rule, trigger condition, and alarm severity. The detection rule consists of the statistical mode (Avg, Min, Max, Sum, and Samples), determination criterion (≥, ≤, >, and <), and threshold value. For example, if Statistical Period is 1 minute, Rule is Avg >1, Consecutive Periods is 3, and Alarm Severity is Critical, a critical alarm will be generated when the average metric value is greater than 1 for three consecutive periods.
7. Click Advanced Settings and set information such as Check Interval and Alarm Clearance.
8. Set an alarm notification policy. There are two alarm notification modes. This example uses the direct alarm reporting mode, as shown in Figure 3.
  Direct alarm reporting: An alarm is directly sent when the alarm condition is met. If you select this mode, set an interval for notification and specify whether to enable an action rule.
  1. Set the frequency for sending alarm notifications.
  2. Specify whether to enable an alarm action rule. After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template.
  Figure 3 Alarm notification
9. Click Confirm. Then, click Back to Alarm Rule List to view the created alarm rule.
  As shown in Figure 4, click the icon next to a rule name to view details.
  
  In the expanded list, if a monitored object meets the configured alarm condition, a metric alarm is generated on the alarm page. To view the alarm, choose Alarm Management > Alarm List in the navigation pane. If a host meets the preset notification policy, the system sends an alarm notification to the specified personnel by email, SMS, or WeCom.
  
  Figure 4 Alarm rule
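
To illustrate the detection rule from step 6 (Avg > 1 for three consecutive periods), the following sketch evaluates a list of per-period averages. It is not how AOM is implemented. The function `should_alarm` and its arguments are hypothetical, and the sketch assumes that one average value is available for each statistical period.

```shell
#!/usr/bin/env bash
# Sketch of the rule "average > THRESHOLD for CONSECUTIVE periods in a row".

# should_alarm THRESHOLD CONSECUTIVE [AVG ...]   (hypothetical helper)
#   Each AVG is the average metric value of one statistical period, oldest first.
#   Prints "insufficient data" if no values are given,
#   "alarm" if some run of CONSECUTIVE periods is above THRESHOLD,
#   and "ok" otherwise.
should_alarm() {
  local threshold=$1 consecutive=$2
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "insufficient data"
    return 0
  fi
  printf '%s\n' "$@" | awk -v t="$threshold" -v n="$consecutive" '
    # Extend the streak while the value is above the threshold, else reset it.
    { streak = ($1 + 0 > t + 0) ? streak + 1 : 0 }
    streak >= n + 0 { hit = 1 }
    END { print (hit ? "alarm" : "ok") }
  '
}

# Example: threshold 1, three consecutive periods, four period averages.
#   should_alarm 1 3 0.5 1.2 1.5 2.0
```

A value equal to the threshold does not extend the streak, because the example rule uses > and not ≥.
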

Create a dashboard.

1. Log in to the AOM 2.0 console.
2. In the navigation pane, choose Dashboard.
3. Click Add Dashboard in the upper left corner of the list.
4. In the displayed dialog box, set parameters.
  Bind the dashboard to the created application so that you can monitor key metrics of the application on the Application Monitoring page.
  Figure 5 Creating a dashboard
5. Click OK.

Add a graph to the dashboard.

In the dashboard list, click the created dashboard.

Go to the target dashboard page and click the icon in the upper right corner to add a graph to the dashboard. Select a proper graph as required.

**Table 2** Adding a graph
Graph Type	Data Source	Scenario
Metric graph	Metric data	Monitors the metrics about the business layer, application layer, and Prometheus middleware.
Log graph	Log data	Monitors business metrics or other log metrics, such as key metrics (latency, throughput, and errors) extracted from ELB logs.
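
As a rough illustration of what key metrics derived from access logs look like, the following sketch computes the request count, error count, and average latency from log lines. In AOM, these values come from SQL statements on the Log Sources tab; awk is used here only as a stand-in. The input format (one `STATUS LATENCY_SECONDS` pair per line) and the function name `log_key_metrics` are hypothetical and do not reflect the real ELB log layout.

```shell
#!/usr/bin/env bash
# Sketch: derive throughput, errors, and latency from simplified access logs.
# Assumed (hypothetical) input format, one request per line:
#   <HTTP status code> <latency in seconds>

# log_key_metrics (hypothetical helper)
#   Reads log lines on stdin and prints one summary line.
#   Status codes of 500 and above are counted as errors.
log_key_metrics() {
  awk '
    NF >= 2 { n++; sum += $2; if ($1 + 0 >= 500) err++ }
    END {
      # Guard against division by zero when no lines were read.
      printf "requests=%d errors=%d avg_latency=%.3f\n", n, err, (n ? sum / n : 0)
    }
  '
}

# Example call with a hypothetical file name:
#   log_key_metrics < access.log
```
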

The following describes how to add a metric graph for CPU usage and a log graph for latency.

Add a metric graph for CPU usage.
Select the CPU Usage metric. After the setting is complete, the metric graph shown in Figure 6 is displayed.

Figure 6 Adding a metric graph

Add a log graph for latency.
Click the Log Sources tab and set parameters to add a log graph.
You can directly obtain the SQL query statement from the graph.
1. In the upper right corner of the graph display area, click Show Chart.
2. In the Charts list, select required log metrics to monitor.
3. The query statement corresponding to the metric is automatically filled in the SQL statement setting area.
After setting the parameters, click Add to Dashboard.