Application Operations Management (AOM) is a one-stop, multi-dimensional O&M platform for cloud applications. It monitors applications and related cloud resources in real time, collects and correlates resource metrics, logs, and events to analyze application health, and supports alarm reporting and data visualization. This helps you detect faults promptly and track the running status of applications, resources, and services.
Specifically, AOM monitors and centrally manages servers, storage devices, networks, web containers, and applications hosted in Docker and Kubernetes. This helps prevent problems, speeds up fault locating, and reduces O&M costs. Unlike traditional monitoring systems, AOM monitors services by application. It meets enterprises' requirements for high efficiency and fast iteration, provides effective IT support for their services, and protects and optimizes their IT assets, enabling enterprises to achieve strategic goals.
Console Description
Table 1 AOM console description
Category |
Description |
Overview |
Both the O&M overview and dashboard are provided.
- O&M
The O&M page supports full-link, multi-layer, and one-stop O&M for resources, applications, and user experience.
- Dashboard
A dashboard displays different graphs, such as line graphs and digit graphs, on one screen so that you can view monitoring data comprehensively.
|
Alarm center |
The alarm center displays the alarm list, event list, alarm rules, and notification rules.
- Alarm list
An alarm is information reported when AOM or an external service is abnormal or is likely to become abnormal. Handle alarms promptly. Otherwise, service exceptions may occur.
The alarm list displays the alarms generated within a specified time range.
- Event list
Events carry important information about changes in AOM or an external service. Such changes do not necessarily cause exceptions.
The event list displays the events generated within a specified time range.
- Alarm rules
By setting alarm rules, you can define event conditions for services or threshold conditions for resource metrics. If the resource data of a service meets the event condition, an event alarm is generated. If the metric data of a resource meets the threshold condition, a threshold alarm is generated. If no metric data is reported, an insufficient data event is generated. In this way, you can discover and handle exceptions as early as possible.
- Alarm notification
AOM supports alarm notification. You can create notification rules and alarm action rules, and configure alarm noise reduction. When alarms are reported due to an exception in AOM or an external service, alarm information can be sent to specified personnel by email or Short Message Service (SMS) message. In this way, they can rectify faults in time to avoid service loss.
|
Monitoring |
Functions such as application monitoring, component monitoring, host monitoring, container monitoring, and metric monitoring are provided.
- Application monitoring
An application is a group of identical or similar components divided based on service requirements. AOM supports monitoring by application.
- Component monitoring
Components refer to the services that you deploy, including containers and common processes.
The Component Monitoring page displays information such as type, CPU usage, memory usage, and status of each component. AOM supports drill-down from components to instances, and then to containers, enabling multi-dimensional monitoring.
- Host monitoring
The Host Monitoring page enables you to monitor common system devices such as disks and file systems, as well as the resource usage and health status of hosts and of the service processes or instances running on them.
- Container monitoring
For container monitoring, only workloads deployed using Cloud Container Engine (CCE) and applications created using ServiceStage are monitored.
- Metric monitoring
The Metric Monitoring page displays metric data of each resource. You can monitor metric values and trends in real time, add desired metrics to dashboards, create threshold rules, and export monitoring reports. In this way, you can monitor services and analyze data in real time.
- Cloud service monitoring
The Cloud Service Monitoring page displays historical performance curves of each cloud service instance. You can view cloud service data of the last six months.
|
Log |
Functions such as log search, log file, log dump, and path configuration are provided.
- Log search
AOM enables you to quickly query logs, and locate faults based on log sources and contexts.
- Log files
You can quickly view log files of component instances to locate faults.
- Log dumps
AOM enables you to dump logs to Object Storage Service (OBS) buckets for long-term storage.
- Path configuration
AOM can collect and display container and VM logs. VM refers to an Elastic Cloud Server (ECS) or a Bare Metal Server (BMS) running Linux. Before collecting logs, ensure that you have configured a log collection path.
- Log buckets
A log bucket is a logical group of log files. You can dump log files, create statistical rules, and view logs by log bucket.
- Statistical rules
A statistical rule takes effect by log bucket. You can configure keywords in statistical rules. Then, AOM periodically counts the number of such keywords in log buckets and generates log metrics.
- Log structuring
Log structuring splits raw logs using regular expressions or special characters so that the resulting structured logs can be queried and analyzed with SQL syntax.
- Accessing LTS
By adding access rules, you can map logs of CCE, Cloud Container Instance (CCI), or custom clusters in AOM to Log Tank Service (LTS). You can then view and analyze the logs on LTS. Mapping itself does not incur extra fees, but duplicate mappings do.
|
Configuration management |
Functions such as ICAgent management, application discovery, and log configuration are provided.
- ICAgent management
ICAgent collects metrics, logs, and application performance data in real time. For hosts purchased from the Elastic Cloud Server (ECS) or Bare Metal Server (BMS) console, you need to manually install the ICAgent. For hosts purchased from the CCE console, the ICAgent is automatically installed.
- Data subscription
AOM allows you to subscribe to metrics or alarms. After the subscription, data can be forwarded to custom Kafka or Distributed Message Service (DMS) topics for you to retrieve.
- Application discovery
AOM can discover applications and collect their metrics based on configured rules.
- Log configuration
Log quotas and delimiters can be configured.
- Quota configuration
Earlier metrics will be deleted when the metric quota is exceeded.
You can change the metric quota by switching between the basic edition and pay-per-use edition. In the basic edition, limited functions are provided for free.
- Metric configuration
You can enable the metric collection function to collect metrics (excluding SLA and custom metrics).
|
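The threshold alarm behavior described under "Alarm rules" in Table 1 can be illustrated with a small sketch. It is not AOM's implementation or API: the `ThresholdRule` class, the `evaluate` function, the returned status strings, the `cpuUsage` metric name, and the `consecutive` parameter are all hypothetical and serve only to show how the three outcomes (threshold alarm, insufficient data event, normal) relate to the reported metric data.

```python
import operator
from dataclasses import dataclass
from typing import Sequence

# Comparison operators that a hypothetical threshold rule may use.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical model of a threshold condition on one resource metric."""
    metric: str            # metric name, for example "cpuUsage" (illustrative)
    op: str                # one of the keys in _OPS
    threshold: float       # value the metric is compared against
    consecutive: int = 1   # assumption: samples that must all meet the condition


def evaluate(rule: ThresholdRule, samples: Sequence[float]) -> str:
    """Classify the reported samples of one metric against one rule."""
    if rule.op not in _OPS:
        raise ValueError(f"unsupported operator: {rule.op}")
    if rule.consecutive < 1:
        raise ValueError("consecutive must be at least 1")

    # No metric data reported: corresponds to an insufficient data event.
    if not samples:
        return "insufficient_data"

    # Too few samples to satisfy the rule yet.
    if len(samples) < rule.consecutive:
        return "normal"

    compare = _OPS[rule.op]
    recent = samples[-rule.consecutive:]
    # Threshold condition met by every recent sample: threshold alarm.
    if all(compare(value, rule.threshold) for value in recent):
        return "threshold_alarm"
    return "normal"


if __name__ == "__main__":
    rule = ThresholdRule(metric="cpuUsage", op=">", threshold=80.0, consecutive=3)
    for data in ([], [85.0, 90.0, 95.0], [85.0, 70.0, 95.0]):
        print(data, "->", evaluate(rule, data))
```

In this sketch, requiring several consecutive samples is one possible way to avoid alarming on a single spike. Whether and how AOM applies such a setting should be checked against the alarm rule configuration page.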