OPS06-02 Defining Observable Objects

Risk level
High

Key strategies

The following table shows how to classify observable objects.

Monitoring Type	Function and Key Metrics
IT resource monitoring	IT resource monitoring tracks and reports the performance and capacity of IT resources so that your services run stably and reliably.
Application monitoring	Application monitoring tracks resources at multiple layers (applications, service components, and environments) based on application and resource management, and each layer has its own set of monitored metrics. Monitor alarm information at the business, application, middleware, and infrastructure layers, and bind dashboards to display system charts, metric sources, and log sources. Key metrics include available memory and the numbers of threads in the WAITING and TIMED_WAITING states.
Process monitoring	Process monitoring tracks the active processes on a host. By default, it collects each process's CPU usage, memory usage, and number of open files. If you configure custom process monitoring, it also monitors the number of processes that match the specified keywords. Key metrics include the numbers of running, idle, and zombie processes.
Log monitoring	The log configuration service extracts specified keywords from logs so that the monitoring service can monitor key log metrics and report alarms on them. Key metrics include log size, the number of access logs, and the number of error logs.
Custom monitoring	The custom monitoring page displays all metrics that you define. You can call APIs to report the data collected for these metrics to the monitoring service.
Middleware monitoring	The monitoring platform lets you quickly install and configure middleware plug-ins and provides ready-to-use monitoring dashboards. The following middleware plug-ins are currently supported: MySQL, Redis, MongoDB, Nginx, Node, HAProxy, COMP_EXPORTER, COMP_REDIS_EXPORTER, and COMP_MYSQL_EXPORTER.
Server monitoring	Server monitoring provides basic monitoring and OS monitoring at different granularities. Basic monitoring covers the metrics reported by ECSs. OS monitoring provides proactive, fine-grained monitoring of the ECS operating system and requires an agent (a plug-in) to be installed on every ECS to be monitored. Key metrics include CPU_UTIL, DISK_READ_BYTES_RATE, and the outband incoming rate.
On-premises component monitoring	On-premises components are monitored in a unified way. The monitoring platform integrates with Grafana and Prometheus to monitor services, applications, on-premises IDCs, and middleware.
Network performance management monitoring	The full-link network connecting clients, networks, edges, and clouds is monitored, which helps you locate network faults quickly and track network status. Key metrics include application response time, DNS resolution time, TCP connection setup time, and access traffic.
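The process counts listed under Process monitoring can be illustrated on a Linux host by reading the state field of each /proc/<pid>/stat file. This is a minimal sketch, not the monitoring agent's actual collection logic. It assumes a Linux /proc filesystem and assumes that "running", "idle", and "zombie" correspond to the kernel state codes R, I, and Z. The names STATE_NAMES, parse_state, and count_process_states are hypothetical.

```python
import os
from collections import Counter

# Assumption: the table's process metrics map to these Linux state codes.
# "I" (idle kernel thread) is only reported by newer Linux kernels.
STATE_NAMES = {"R": "running", "I": "idle", "Z": "zombie"}


def parse_state(stat_line: str) -> str:
    """Return the state code from a /proc/<pid>/stat line.

    The line looks like "1234 (comm) S ...". The command name may itself
    contain spaces or ')', so split on the LAST ')' before reading the state.
    """
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def count_process_states(proc_root: str = "/proc") -> Counter:
    """Count processes per mapped state (hypothetical helper, Linux only)."""
    counts = Counter({name: 0 for name in STATE_NAMES.values()})
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, encoding="utf-8", errors="replace") as f:
                state = parse_state(f.read())
        except OSError:
            continue  # the process exited after listdir() or is unreadable
        name = STATE_NAMES.get(state)
        if name is not None:
            counts[name] += 1
    return counts
```

On a Linux host, calling count_process_states() returns a Counter with the three assumed metrics. A production collector would also handle other states (for example S for sleeping or D for uninterruptible wait) according to the platform's own metric definitions.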
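For Log monitoring, the idea of turning keyword matches into metrics such as log size, access log count, and error log count can be sketched as follows. This is a hypothetical illustration, not the log configuration service's implementation: the function name, the metric names, and the regular expressions that identify access and error entries are assumptions to be replaced with patterns that match your own log format.

```python
import re
from typing import Dict, Iterable, Mapping


def summarize_log(lines: Iterable[str], patterns: Mapping[str, str]) -> Dict[str, int]:
    """Compute simple log metrics from raw log lines.

    `patterns` maps a metric name to a regular expression; each metric counts
    the lines in which its expression matches. "log_size_bytes" is the total
    UTF-8 size of the lines, so do not use that name as a pattern key.
    """
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    metrics = {name: 0 for name in compiled}
    metrics["log_size_bytes"] = 0
    for line in lines:
        metrics["log_size_bytes"] += len(line.encode("utf-8"))
        for name, rx in compiled.items():
            if rx.search(line):
                metrics[name] += 1
    return metrics


# Hypothetical patterns for a plain-text log; adjust them to your format.
EXAMPLE_PATTERNS = {
    "access_logs": r"\b(GET|POST|PUT|DELETE)\b",
    "error_logs": r"\bERROR\b",
}
```

In practice, the resulting counts would be reported periodically as metrics so that alarm thresholds (for example, on error_logs per minute) can be set on them.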
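Server monitoring lists CPU_UTIL as a key metric. As an illustration of how an OS-level CPU utilization figure is commonly derived on Linux, the sketch below compares two samples of the aggregate cpu line in /proc/stat: utilization is the share of elapsed CPU time not spent idle. This is a sketch under stated assumptions. The platform's CPU_UTIL may be computed differently, this version treats iowait as idle time, and the helper names parse_cpu_line, cpu_util_percent, and read_cpu_sample are hypothetical.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate "cpu" line of /proc/stat into jiffy counters.

    Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    guest and guest_nice are already included in user and nice, so only the
    first eight counters are kept to avoid double counting.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:9]]


def cpu_util_percent(prev: List[int], curr: List[int]) -> float:
    """CPU utilization between two samples: 100 * (1 - idle_delta / total_delta)."""
    # Assumption: iowait (index 4) is counted as idle time.
    idle_delta = (curr[3] + curr[4]) - (prev[3] + prev[4])
    total_delta = sum(curr) - sum(prev)
    if total_delta <= 0:
        return 0.0  # no CPU time elapsed between the two samples
    return 100.0 * (1.0 - idle_delta / total_delta)


def read_cpu_sample(path: str = "/proc/stat") -> List[int]:
    """Read one sample; the first line of /proc/stat is the aggregate cpu line."""
    with open(path, encoding="ascii") as f:
        return parse_cpu_line(f.readline())
```

To use it, take two samples with read_cpu_sample() about a second apart and pass them to cpu_util_percent(). For example, if the total counters grow by 200 jiffies between samples and idle plus iowait grow by 120, utilization is 100 x (1 - 120/200) = 40%.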
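Two of the network metrics in the Network performance management monitoring row, DNS resolution time and TCP connection setup time, can be measured from a client with the Python standard library. This is a minimal client-side sketch, not the platform's probing method. It times socket.getaddrinfo, which may be answered from a local cache or hosts file rather than a DNS server, and a blocking connect(), which returns once the TCP handshake completes. It does not measure application response time or access traffic, and measure_dns_and_tcp is a hypothetical helper name.

```python
import socket
import time
from typing import Dict, Union


def measure_dns_and_tcp(host: str, port: int, timeout: float = 3.0) -> Dict[str, Union[float, str]]:
    """Return name-resolution and TCP connection-setup times in milliseconds."""
    t0 = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    t1 = time.perf_counter()

    # Use the first resolved address; a fuller probe might try each in turn.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        t2 = time.perf_counter()
        sock.connect(sockaddr)  # returns after the three-way handshake completes
        t3 = time.perf_counter()

    return {
        "address": sockaddr[0],
        "dns_ms": (t1 - t0) * 1000.0,
        "tcp_connect_ms": (t3 - t2) * 1000.0,
    }
```

For example, measure_dns_and_tcp("example.com", 443) times the lookup and handshake for a public HTTPS endpoint. Running such probes from several client locations on a schedule gives the per-segment timings that help narrow down where a network fault lies.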