Unified Metric Monitoring
This section describes how to centrally monitor metric data of different accounts.
Scenario
O&M personnel of an e-commerce platform need to monitor metric data of different accounts in real time.
Solution
Create a Prometheus instance for multi-account aggregation and connect accounts, cloud services, and cloud service metrics. On the Metric Browsing page, you can monitor metrics of multiple member accounts and set alarm rules for them. When a metric is abnormal, an alarm is triggered immediately and a notification is sent.
Prerequisites
- The monitoring account and the monitored account have been added to an organization. The monitoring account must be an organization administrator. If not, perform step 2 to set a delegated administrator.
- For the monitored account, metrics of the following cloud services can be aggregated: FunctionGraph, Elastic Volume Service (EVS), Cloud Backup and Recovery (CBR), Object Storage Service (OBS), Virtual Private Cloud (VPC), Elastic Load Balance (ELB), Direct Connect, NAT Gateway, Distributed Message Service (DMS), Distributed Cache Service (DCS), Relational Database Service (RDS), Document Database Service (DDS), Data Replication Service (DRS), LakeFormation, MapReduce Service (MRS), GaussDB(DWS), Cloud Search Service (CSS), and Web Application Firewall (WAF). Cloud Container Engine (CCE) and Elastic Cloud Server (ECS) metrics collected by ICAgents can also be aggregated.
Step 1: Connecting Cloud Services for a Monitored Account
The following uses FunctionGraph and ECS as examples. The procedure for connecting CCE is similar to that for connecting ECS. However, ICAgents are automatically installed by default when you purchase CCE clusters. The procedure for connecting FunctionGraph is similar to that for connecting other cloud services.
- Connecting FunctionGraph
- Log in to the AOM 2.0 console.
- In the navigation pane, choose Access Center.
- Under Cloud Services, click FunctionGraph. In the displayed dialog box, click Connect Now.
- Connecting ECS
- Hover over the username in the upper right corner and choose My Credentials from the drop-down list.
Figure 1 My credentials
- On the My Credentials page, click the Access Keys tab.
- Click Create Access Key and enter a verification code or password.
Figure 2 Adding an access key
- Click OK to download the generated AK/SK.
You can obtain the AK from the access key list and SK from the downloaded CSV file.
- Return to the AOM 2.0 console page. In the navigation pane, choose Collection Management.
- In the navigation pane, choose UniAgent > VM Access.
- On the VM Access page, select the hosts where ICAgents are to be installed and choose Plug-in Batch Operation.
Figure 3 Installing ICAgents
- In the displayed dialog box, set Operation to Install, Plug-in to ICAgent, and Version to 5.12.163, and enter the AK/SK obtained in step 4.
- Click OK to install ICAgents.
Step 2: Enable Access for AOM and Set a Delegated Administrator (Skip This Step If You Are an Organization Administrator)
- Log in to the Organizations console as an administrator.
- In the navigation pane, choose Services.
- In the service list, locate Application Operations Management (AOM) and click Enable Access in the Operation column.
- Click Specify Delegated Administrator in the Operation column of AOM, select the desired account, and click OK. As shown in Figure 4, paas_aom is specified as the delegated administrator.
Step 3: Create an Instance for Multi-Account Aggregation
- Log in to the AOM 2.0 console as an administrator or delegated administrator.
- In the navigation pane, choose Prometheus Monitoring > Instances. On the displayed page, click Add Prometheus Instance.
- Enter an instance name and select the Prometheus for Multi-Account Aggregation instance type.
- Click OK. As shown in Figure 5, a multi-account aggregation instance named test-aom is created.
- In the Prometheus instance list, click the name of the created instance. On the displayed page, select the accounts, cloud services, and cloud service metrics to connect.
For example, connect member accounts paas_apm and paas_aom. Connect cloud services such as FunctionGraph, DCS, and ECS. Click Add Metric. In the displayed dialog box, select desired metrics.
Figure 6 Connecting accounts
Wait for 2 to 3 minutes and view the connected metric data on the Metric Browsing page.
Step 4: Configuring Unified Monitoring
- Check whether the metrics of the created instance are connected.
- In the navigation pane, choose Metric Browsing. In the Prometheus Instance drop-down list, select instance test-aom created in step 3.
- Click All metrics, select a metric, and copy the metric name.
- Click Prometheus statement and enter sum(metric name) by (aom_source_account_name) to check whether the metric is connected.
Figure 7 Checking metrics
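The connectivity check above uses a statement of the form sum(metric name) by (aom_source_account_name). The following minimal sketch builds that statement from a copied metric name so that it can be pasted into the Prometheus statement box. The helper name per_account_sum and the name validation are illustrative assumptions and are not part of AOM; only the statement format and the aom_source_account_name label come from this section.

```python
import re

# Label used in this section to group results by member account.
ACCOUNT_LABEL = "aom_source_account_name"

# Standard Prometheus metric-name pattern (letters, digits, "_" and ":").
_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def per_account_sum(metric_name: str) -> str:
    """Return a PromQL statement that sums a metric per source account.

    Hypothetical helper: it only formats the statement shown in this
    section and does not call any AOM API.
    """
    if not _METRIC_NAME.fullmatch(metric_name):
        raise ValueError(f"not a valid Prometheus metric name: {metric_name!r}")
    return f"sum({metric_name}) by ({ACCOUNT_LABEL})"


# Example with the metric used later in this section.
print(per_account_sum("aom_node_cpu_usage"))
# prints: sum(aom_node_cpu_usage) by (aom_source_account_name)
```

If the statement returns one series per connected member account (for example, paas_apm and paas_aom), the metric is connected.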
- Click All metrics and select the metric to be monitored. As shown in Figure 8, select the aom_node_cpu_usage metric so that its values and trends under the paas_apm and paas_aom accounts can be monitored in real time.
- Click the icon in the upper right corner of the metric list to add an alarm rule for the selected metric.
- Set the basic information about the alarm rule, such as the rule name.
- Set the detailed information about the alarm rule.
- By default, the rule type, configuration mode, and Prometheus instance in the alarm rule settings are the same as those on the Metric Browsing page.
- Set alarm rule details. By default, the metric selected on the Metric Browsing page is automatically displayed.
You need to set information such as the statistical period, condition, detection rule, trigger condition, and alarm severity. The detection rule consists of the statistical mode (Avg, Min, Max, Sum, and Samples), determination criterion (≥, ≤, >, and <), and threshold value. For example, if Statistical Period is 1 minute, Rule is Avg >1, Consecutive Periods is 3, and Alarm Severity is Critical, a critical alarm will be generated when the average metric value is greater than 1 for three consecutive periods.
Figure 9 Setting an alarm rule
- Click Advanced Settings and set information such as Check Interval and Alarm Clearance.
- Set an alarm notification policy. There are two alarm notification modes. As shown in Figure 10, the direct alarm reporting mode is selected.
Direct alarm reporting: An alarm is directly sent when the alarm condition is met. If you select this mode, set an interval for notification and specify whether to enable an action rule.
- Set the frequency for sending alarm notifications.
- Specify whether to enable an alarm action rule. After an alarm action rule is enabled, the system sends notifications based on the associated SMN topic and message template.
- Click Confirm. Then, click Back to Alarm Rule List to view the created alarm rule.
As shown in Figure 11, click the icon next to a rule name to view details.
In the expanded list, if a monitored object meets the configured alarm condition, a metric alarm is generated on the alarm page. To view the alarm, choose Alarm Management > Alarm List in the navigation pane. If a host meets the preset notification policy, the system sends an alarm notification to the specified personnel by email, SMS, or WeCom.
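The example detection rule in the alarm rule settings above (Statistical Period 1 minute, Rule Avg >1, Consecutive Periods 3) can be sketched as follows. This is one plausible reading of that rule for illustration only, not AOM's actual evaluation logic. The function name should_alarm, its parameters, and the sample values are assumptions.

```python
from statistics import mean


def should_alarm(periods, threshold=1.0, consecutive=3):
    """Return True if Avg > threshold in each of the last `consecutive` periods.

    periods: list of sample lists, one inner list per statistical period,
    ordered oldest first. A period without samples never satisfies the rule.
    Illustrative only; AOM may evaluate rules differently.
    """
    if len(periods) < consecutive:
        return False
    recent = periods[-consecutive:]
    return all(bool(samples) and mean(samples) > threshold for samples in recent)


# Hypothetical samples: four 1-minute periods, oldest first.
rising = [[0.5, 0.7], [1.2, 1.4], [1.1, 1.3], [2.0, 1.5]]
dipping = [[1.2], [0.9], [1.5]]

print(should_alarm(rising))   # last three averages all exceed 1
print(should_alarm(dipping))  # the middle period breaks the streak
```

In this reading, a single period at or below the threshold resets the count, so a brief spike does not trigger a critical alarm.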
- Click the icon in the upper right corner of the metric list to add the graph to the dashboard.
- Select a dashboard from the drop-down list and enter the graph name. If the dashboards in the list cannot meet your requirements, click Add Dashboard to add one. For details, see Creating a Dashboard.
Figure 12 Adding the graph to a dashboard
- Click Confirm. The dashboard page is displayed. As shown in Figure 13, the CPU Usage graph is added to the aom dashboard so that its values and trends under the paas_apm and paas_aom accounts can be monitored in real time.