Updated on 2025-05-29 GMT+08:00

Operations Dashboards

All applications that support fuzzy search accept the percent sign (%) and the underscore (_) as wildcards. To search for either character literally (an exact search), escape it with a backslash (\).
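The escaping rule above can be sketched as a small helper that prepares a search term before it is typed or pasted into a fuzzy search box. This is a minimal sketch, not part of the product: the function name `escape_fuzzy_term` is hypothetical, and it assumes that only % and _ need escaping, because the text above does not say how a literal backslash is handled.

```python
def escape_fuzzy_term(term: str) -> str:
    """Prefix each wildcard character (% or _) with a backslash.

    Assumption: only % and _ are treated as wildcards. How a literal
    backslash in the term is interpreted is not covered by the source,
    so this sketch leaves backslashes untouched.
    """
    escaped = []
    for ch in term:
        if ch in ("%", "_"):
            escaped.append("\\")  # a single backslash character
        escaped.append(ch)
    return "".join(escaped)


print(escape_fuzzy_term("disk_usage_100%"))  # prints disk\_usage\_100\%
```

Entering the escaped term matches the literal name `disk_usage_100%` instead of treating each underscore and the percent sign as a wildcard.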

Procedure

  1. Choose Visualization from the main menu.

    The Operations Dashboards page is displayed by default. Table 1 describes the operations dashboards.

  2. Select a dashboard to access it.

    You can click Export to export dashboard data to your local system.

    This export function is available only in the professional edition of ESM.

    Table 1 Descriptions of visual operations dashboards

    Dashboard Name

    Description

    Resource Summary

    The Resource Summary dashboard displays the quantities of hardware resources and common cloud resources, cloud resource summary, physical resource usage, cloud resource quota statistics, cloud resource usage trend, virtual resource allocation rates, virtual resource allocation statistics, and virtual resource usage.

    You can select a time range from the drop-down list in the upper right corner of the dashboard to query physical resource usage, cloud resource usage trend, and virtual resource usage.

    You can drill down from the Resource Summary dashboard to the Compute Node Capacity Details and Storage Node Capacity Details dashboards. The three dashboards display similar items, which are described in Table 2.

    Hardware Resources

    The Hardware Resources dashboard displays the quantity, running statuses, and alarm statuses of hardware resources. This dashboard helps you keep abreast of resource statuses and adjust resource allocation in a timely manner to increase resource utilization and avoid potential risks.

    Table 4 describes the items displayed in this dashboard.

    Hardware Alarms

    The Hardware Alarms dashboard displays the total number of alarms in the past month, overall alarm statuses, hardware alarm statistics, alarm growth trend in the past month, and uncleared alarms of hardware resources. This dashboard helps you keep abreast of hardware resource alarms and clear alarms in a timely manner.

    Table 6 describes the items displayed in this dashboard.

    Cloud Service Status

    The Cloud Service Status dashboard displays the overall status and alarms of all cloud services in each region. It helps you quickly identify cloud services that have high or potential risks and improves operational efficiency.

    Table 8 describes the items displayed in this dashboard.

    NOTE:

    The cloud service statuses are described as follows:

    • High-risk: Deployed cloud services have critical alarms that are not cleared.
    • Low-risk: Deployed cloud services have major alarms (instead of critical alarms) that are not cleared.
    • Healthy: Deployed cloud services do not have critical or major alarms that are not cleared.
    • Undeployed: Cloud services have not been deployed.
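
    The four status definitions above amount to a simple precedence rule: deployment is checked first, then critical alarms, then major alarms. The sketch below illustrates that rule only. The function name `classify_service` and its parameters are hypothetical, and the counts are assumed to be uncleared alarms of the given severity.

```python
from collections import Counter


def classify_service(deployed: bool, critical_uncleared: int, major_uncleared: int) -> str:
    """Map a cloud service to one of the four statuses described above."""
    if not deployed:
        return "Undeployed"
    if critical_uncleared > 0:
        return "High-risk"   # any uncleared critical alarm
    if major_uncleared > 0:
        return "Low-risk"    # uncleared major alarms, no critical ones
    return "Healthy"         # no uncleared critical or major alarms


# Hypothetical sample data: (deployed, uncleared critical, uncleared major)
services = {
    "ECS": (True, 2, 5),
    "EVS": (True, 0, 1),
    "OBS": (True, 0, 0),
    "SFS": (False, 0, 0),
}
statuses = {name: classify_service(*args) for name, args in services.items()}
distribution = Counter(statuses.values())
```

    Counting the results per status, as `distribution` does, gives the kind of summary that the Status Distribution item in Table 8 describes.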

    Tenant Resources

    The Tenant Resources dashboard displays the total number of each cloud resource, the cloud resources of each tenant, cloud resource trends, and the resource usage of each tenant. It helps users understand how resources are distributed and used from the tenant perspective.

    Table 5 describes the items displayed in this dashboard.

    Audit Logs

    The Audit Logs dashboard displays statistics on operations by risk level as well as operation details, including operation names, resource types (such as servers, storage, and networks), and operation time. In the upper right corner of the dashboard, you can select a time range to view desired logs.

    Table 7 describes the items displayed in this dashboard.

    Service Capacity Details

    This dashboard displays statistics of cloud services by resource type. It also displays the total, allocated, and available capacities of physical and logical resources. You can query resource pool information, such as allocated and used capacities, based on VM specifications and export the query results.

    Table 9 describes the items displayed in this dashboard.

    Hardware Metrics

    The Hardware Metrics dashboard displays key metrics of servers, including CPU usage, memory usage, disk I/O usage, packet loss rate, average system load, and load statistics of the top 5 servers. Drill-down screens display hardware details, resource allocation, and monitoring information of servers.

    For details about the displayed items, see Table 10.

    NOTE:

    The dashboard is available only to users of regions where the professional edition is enabled.

    AI Resource Operation Dashboard

    This dashboard displays compute resource statistics, including total GPUs, total NPUs, dedicated resource pools, and total AI nodes. It also displays the compute capacity of the current site and public sites, the top associated training jobs and inference tasks by usage, and resource usage trends.

    For details about the displayed items, see Table 11.

    AI Resource Detail Dashboard

    This dashboard is a drill-down dashboard of the AI Resource Operation Dashboard. It displays details about the resources in the resource pool associated with the tenant. For details about the displayed items, see Table 12.

    Table 2 Resource Summary

    Item

    Description

    Hardware Resources

    Displays quantities of all types of hardware resources.

    Common Cloud Resources

    Displays the quantities of provisioned ECSs, provisioned EVS disks, and other common cloud resources provisioned in the management and tenant zones.

    Click Details next to EVS disks to view details about storage node resource capacities in the tenant and management zones.

    Cloud Resource Overview

    Displays the quantities of provisioned ECSs, provisioned EVS disks, and other cloud resources provisioned in the management and tenant zones.

    Physical Resource Usage

    Displays the average CPU and memory usage per day within a selected time range.

    You can click Details next to this item to view details about compute node resource capacities in the tenant and management zones.

    Cloud Resource Statistics

    Displays the allocated and total quantities of cloud resources (only in the tenant zone).

    Cloud Resource Usage

    Displays how cloud resource usage changes (only in the tenant zone) within a selected time range by day.

    Virtual Resource Allocation Rates

    Displays the allocation rates of virtual resources (including vCPUs, memory, and disks) in both tenant and management zones.

    Virtual Resource Statistics

    Displays the allocated and total capacities of virtual resources (including vCPUs, memory, and disks) in both tenant and management zones.

    Virtual Resource Usage

    Displays the usage of virtual resources (including vCPUs, memory, and disks) in both tenant and management zones within the selected time range by day.

    Compute Node Capacity Details

    Displays details about capacities of ECS resources including vCPUs, memory, and vGPUs by AZ, cluster, and resource type in both tenant and management zones.

    Storage Node Capacity Details

    Displays details about capacities of EVS resources including common I/O, high I/O, and ultra-high I/O disks by AZ and resource type in both tenant and management zones. For details, see Table 3.

    Table 3 Resource types

    • Extreme SSD (API name: ESSD): Superfast disks for workloads demanding ultra-high bandwidth and ultra-low latency
    • General-Purpose SSD V2 (API name: GPSSD2): SSD-backed disks that allow tailored IOPS and throughput, targeting transactional workloads that demand high performance and low latency
    • Ultra-high I/O (API name: SSD): High-performance disks well suited to enterprise mission-critical services and to workloads demanding high throughput and low latency
    • General-Purpose SSD (API name: GPSSD): Cost-effective disks designed for enterprise applications with medium performance requirements
    • High I/O (API name: SAS): Disks suitable for commonly accessed workloads
    • Common I/O (API name: SATA): Disks suitable for less commonly accessed workloads

    Table 4 Hardware Resources

    Item

    Description

    Servers

    Displays the number of servers and their information such as server names, alarm statuses, running statuses, and management IP addresses.

    Switches

    Displays the number of switches and their information such as switch names, alarm statuses, running statuses, and management IP addresses.

    Routers

    Displays the number of routers and their information such as router names, alarm statuses, running statuses, and management IP addresses.

    Firewalls

    Displays the number of firewalls and their information such as firewall names, alarm statuses, running statuses, and management IP addresses.

    Security Devices

    Displays the number of security devices and their information such as security device names, alarm statuses, running statuses, and management IP addresses.

    Table 5 Tenant resources

    Item

    Description

    Cloud Resources TOP10

    Displays statistics on the quantity of cloud resources by tenant and the top 10 cloud resources by quantity.

    Resource Usage Trends

    Displays daily changes in the quantity of cloud resources.

    Information

    Displays the quantity of each cloud resource.

    List

    Displays cloud resource details, including basic resource information.

    Table 6 Hardware Alarms

    Item

    Description

    Historical and Active Alarms

    Displays the total number of historical alarms (cleared alarms) and active alarms (uncleared alarms) in the past month, and the number of alarms (cleared alarms and uncleared alarms) at each severity level.

    Cleared and Uncleared Alarms

    Displays the quantities of cleared and uncleared alarms and the percentages they account for in the total number of alarms in the past month.

    Alarms by Device Type

    Displays the quantities of alarms (including cleared and uncleared alarms) of network devices (including switches, routers, and firewalls) and physical hosts in the past month.

    Alarm Change Trend in the Past Month

    Displays how the number of alarms (including cleared alarms and uncleared alarms) at each severity level changes over the past month.

    Current Alarms

    Displays information about alarms uncleared in the past month, including the alarm names, severity levels, device IDs, first occurrence time, and last occurrence time. To check alarm details, click an alarm name.

    History Alarms

    Displays information about cleared and uncleared alarms in the past month, including the alarm names, severity levels, device IDs, first occurrence time, and last occurrence time. To check alarm details, click an alarm name.
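
    The percentages described under Cleared and Uncleared Alarms in Table 6 are each category's share of the total alarm count for the past month. A minimal sketch of that arithmetic follows. The function name `alarm_shares` is hypothetical, and both the rounding to one decimal place and the handling of an empty month are assumptions, not documented behavior.

```python
def alarm_shares(cleared: int, uncleared: int) -> dict[str, float]:
    """Return the percentage of cleared and uncleared alarms in the total.

    Assumptions: values are rounded to one decimal place, and a month
    with no alarms reports 0.0 for both shares.
    """
    total = cleared + uncleared
    if total == 0:
        return {"cleared": 0.0, "uncleared": 0.0}
    return {
        "cleared": round(100 * cleared / total, 1),
        "uncleared": round(100 * uncleared / total, 1),
    }


print(alarm_shares(30, 10))  # prints {'cleared': 75.0, 'uncleared': 25.0}
```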

    Table 7 Audit Logs

    Item

    Description

    Risks by Level

    Displays the quantity of operation risks at each level (critical, major, minor, and warning).

    Logs

    Displays the operation log list, including the operation names, risk levels, and operators. If the list spans multiple pages, the pages scroll automatically. You can search for logs by operation name or region.

    Table 8 Cloud Service Status

    Item

    Description

    Status Distribution

    Displays the quantities of cloud services in the High-risk, Low-risk, Healthy, and Undeployed states.

    Alarm Status Overview

    Displays the statuses and alarm quantities of all cloud services in each region.

    • If there are fewer than five regions, the regions are arranged horizontally. A grid below each region indicates a cloud service in that region. The quantities of uncleared critical and major alarms are displayed in the grid of each deployed cloud service.
    • If there are five or more regions, the alarm status overview is displayed in a two-dimensional table. The horizontal headers are regions, and the vertical headers are cloud services. When you move your pointer over the cell of a deployed cloud service in a region, the quantities of uncleared critical and major alarms of that cloud service are displayed.

    Table 9 Service Capacity Details

    Item

    Description

    Statistics by Resource Type

    Displays a capacity view of frequently used basic cloud services, including OBS, BMS, SFS, RDS, DeH, EIP, and VPN.

    Resource Usage

    Displays how the quantities of used resources from different cloud services change over time. You can select Last week, Last month, or Last 3 months in the upper right corner of the page to query resource usage statistics.

    Table 10 Hardware metrics

    Item

    Description

    Server usage metrics

    Displays high CPU usage, high memory usage, high disk I/O usage, high packet loss rate during packet sending, and high average system load.

    Top 5 resources by load

    Displays the top 5 resources with high usage metrics.

    Hardware Details

    Displays configuration details of the server, including CPUs, disks, memory, MAC address, and IP address.

    Resource Allocation

    Displays basic information of the server, including name, region, AZ, SN, and type.

    Monitoring Information

    Displays server metrics using data charts, such as disk I/O metrics and NIC metrics.

    Table 11 AI Resource Operation Dashboard

    Item

    Description

    Resource or Metric Statistics

    Displays the AI nodes, GPUs, NPUs, dedicated resource pools, tasks, query requests per second (QPS), average response latency, and total requests (last 1 hour).

    Computing Power

    Displays the compute capacity of the current site and public sites, including CPU allocation, memory allocation, NPU usage, NPU (video memory) allocation, and GPU allocation.

    Usage ranking (Top 10)

    Displays top 10 tenants by usage of training jobs and inference tasks. Tenants can be ranked by compute usage or task quantity.

    Resource usage trend statistics

    Displays the resource usage trend chart. You can view the chart by last day, last week, last month, or last three months.

    Table 12 AI Resource Detail Dashboard

    Item

    Description

    Resource or Metric Statistics

    Displays resource pools, AI nodes, tasks, training jobs, inference tasks, Notebooks, NPU compute capacity, NPUs, GPU compute capacity, and GPUs.

    Resource Pool Statistics

    Displays tasks, AI nodes, users, NPUs, allocated NPUs, NPU allocation, GPU allocation, NPU video memory allocation, total CPU, average CPU usage, total memory, and average memory usage.

    Trend Chart

    Displays the trends of the NPU allocation, NPU video memory allocation, GPU allocation, CPU allocation, and memory allocation.

    Node Information

    Displays node details, such as the node name, node IP address, and total NPUs.