Updated on 2024-01-22 GMT+08:00

Viewing Monitoring Metrics and Alarms

Introduction

Application Operations Management (AOM) monitors and displays the running status of ServiceStage and the usage of each metric, and lets you create alarm rules for monitored items.

After you deploy components with ServiceStage, AOM associates the components' monitoring metrics so that you can track their performance and running status accurately and in real time.

Setting Monitoring and Alarms

CCE works with AOM to comprehensively monitor clusters. When a node is created, the AOM ICAgent (the DaemonSet named icagent in the kube-system namespace of the cluster) is installed on it by default. The ICAgent collects monitoring data about the underlying resources and the workloads running in the cluster, and uploads the data to AOM. In addition, after you customize component running metrics (see Customizing Component Running Metrics), the ICAgent can also collect user-defined workload metrics and upload them to AOM.

After you configure alarm thresholds (see Configuring Alarm Thresholds for Resource Monitoring), alarms generated while components are running are reported to AOM.
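To illustrate the user-defined metrics path, the following minimal sketch exposes application metrics over HTTP in the Prometheus text exposition format. It assumes that the ICAgent collects custom metrics in that format from a report path and port configured for the component; the path /metrics, port 8080, and the metric names are hypothetical placeholders, not values defined in this guide.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application metrics: name -> (Prometheus type, help text, current value).
METRICS = {
    "orders_processed_total": ("counter", "Orders processed since start", 0),
    "queue_depth": ("gauge", "Jobs waiting in the in-memory queue", 0),
}


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format (version 0.0.4)."""
    lines = []
    for name, (metric_type, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Assumed to match the report path configured for the component.
    METRICS_PATH = "/metrics"

    def do_GET(self):
        if self.path != self.METRICS_PATH:
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve metrics forever on the given port (assumed to be the configured report port)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

In the component process you would update the values in METRICS as work happens and call serve() in a background thread. Keeping rendering in a separate pure function makes the output format easy to check without starting a server.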
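As a rough model of how a threshold alarm decides to fire, the sketch below assumes a rule that compares a Table 1 metric against a threshold and triggers only when the last N statistics periods all breach it. The ThresholdRule class, its field names, and should_alarm are hypothetical illustrations, not the AOM API or its exact evaluation logic.

```python
import operator
from dataclasses import dataclass

# Comparison operators this hypothetical rule model supports.
_OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class ThresholdRule:
    """Hypothetical threshold alarm rule; field names are illustrative only."""
    metric: str                   # a metric name from Table 1, e.g. "cpuUsage"
    comparison: str               # one of ">", ">=", "<", "<="
    threshold: float              # in the metric's unit, e.g. percent for cpuUsage
    consecutive_periods: int = 1  # how many periods in a row must breach

    def __post_init__(self):
        if self.comparison not in _OPERATORS:
            raise ValueError(f"unsupported comparison: {self.comparison}")
        if self.consecutive_periods < 1:
            raise ValueError("consecutive_periods must be at least 1")


def should_alarm(rule, samples):
    """Return True if the latest `consecutive_periods` samples all breach the threshold.

    `samples` holds one aggregated value per statistics period, oldest first.
    """
    if len(samples) < rule.consecutive_periods:
        return False
    compare = _OPERATORS[rule.comparison]
    recent = samples[-rule.consecutive_periods:]
    return all(compare(value, rule.threshold) for value in recent)
```

For example, ThresholdRule("cpuUsage", ">", 80.0, 3) fires for the samples [50.0, 85.0, 90.0, 95.0] but not for [85.0, 90.0, 70.0]. Requiring several consecutive breaches is a common way to suppress alarms caused by short spikes.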

Supported Metrics

Metrics reflect resource performance or status.

Basic resource monitoring covers CPU, memory, disk, network, and file system metrics. For details, see Table 1.

Table 1 Resource metrics

| Metric | Description | Value Range | Unit |
|---|---|---|---|
| Total CPU cores (cpuCoreLimit) | Total number of CPU cores requested for a measured object | ≥1 | Cores |
| Used CPU cores (cpuCoreUsed) | Number of CPU cores used by a measured object | ≥0 | Cores |
| CPU usage (cpuUsage) | CPU usage of a measured object, that is, the ratio of used CPU cores to total CPU cores | 0%–100% | % |
| Total physical memory (memCapacity) | Total physical memory requested for a measured object | ≥0 | MB |
| Physical memory usage (memUsage) | Ratio of used physical memory to total physical memory | 0%–100% | % |
| Used physical memory (memUsed) | Physical memory used by a measured object | ≥0 | MB |
| Disk read rate (diskReadRate) | Volume of data read from a disk per second | ≥0 | KB/s |
| Disk write rate (diskWriteRate) | Volume of data written to a disk per second | ≥0 | KB/s |
| Downlink rate (recvPackRate) | Number of data packets received by the NIC per second | ≥0 | Packets per second (PPS) |
| Total file system (filesystemCapacity) | Total file system capacity of a measured object. Available only for containers that use the Device Mapper storage driver in Kubernetes clusters of version 1.11 or later. | ≥0 | MB |
| Downlink rate (recvBytesRate) | Inbound traffic rate of a measured object | ≥0 | Bytes per second (BPS) |
| Downlink error rate (recvErrPackRate) | Number of error packets received by the NIC per second | ≥0 | PPS |
| Uplink rate (sendPackRate) | Number of data packets sent by the NIC per second | ≥0 | PPS |
| Uplink error rate (sendErrPackRate) | Number of error packets sent by the NIC per second | ≥0 | PPS |
| Uplink rate (sendBytesRate) | Outbound traffic rate of a measured object | ≥0 | BPS |
| Error packets (rxPackErrors) | Number of error packets received by a measured object | ≥0 | Packets |
| Threads (threadsCount) | Number of threads created on a host | ≥0 | N/A |
| Available file system (filesystemAvailable) | Available file system capacity of a measured object. Available only for containers that use the Device Mapper storage driver in Kubernetes clusters of version 1.11 or later. | ≥0 | MB |
| File system usage (filesystemUsage) | File system usage of a measured object, that is, the ratio of used file system capacity to total file system capacity. Available only for containers that use the Device Mapper storage driver in Kubernetes clusters of version 1.11 or later. | 0%–100% | % |
| Handles (handleCount) | Number of handles used by a measured object | ≥0 | N/A |
| Component status (status) | Status of an application group | 0: normal; 1: abnormal | N/A |
| Total virtual memory (virMemCapacity) | Total virtual memory requested for a measured object | ≥0 | MB |
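The percentage metrics in Table 1 are defined as ratios of raw metrics. The sketch below computes them from a hypothetical sample dict keyed by the Table 1 metric names. It assumes that used file system capacity equals filesystemCapacity minus filesystemAvailable, which may not hold exactly if the file system reserves blocks; the function names are illustrative, not part of AOM.

```python
def usage_percent(used, total):
    """Return used/total as a percentage, or None when total is not positive."""
    if total <= 0:
        return None
    return used / total * 100.0


def derived_usage(sample):
    """Compute Table 1 percentage metrics from their raw counterparts.

    `sample` is a hypothetical dict keyed by Table 1 metric names.
    """
    # Assumption: used capacity = total capacity - available capacity.
    used_fs = sample["filesystemCapacity"] - sample["filesystemAvailable"]
    return {
        "cpuUsage": usage_percent(sample["cpuCoreUsed"], sample["cpuCoreLimit"]),
        "memUsage": usage_percent(sample["memUsed"], sample["memCapacity"]),
        "filesystemUsage": usage_percent(used_fs, sample["filesystemCapacity"]),
    }
```

For example, a measured object with cpuCoreLimit 2 and cpuCoreUsed 0.5 has a cpuUsage of 25%.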