Operations Dashboards

Every application that supports fuzzy search accepts the wildcard characters percent sign (%) and underscore (_). To search for either character literally (an exact search), escape it with a backslash (\).
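As an illustration of the escaping rule, the following minimal sketch prefixes each wildcard character with a backslash so that a search term is matched literally. The helper name `escape_for_exact_search` is hypothetical and not part of the product. The sketch also assumes that a literal backslash in the term needs no special handling, which this page does not specify.

```python
WILDCARDS = ("%", "_")  # wildcard characters named in the text above


def escape_for_exact_search(term: str) -> str:
    """Return term with each wildcard prefixed by a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in WILDCARDS else ch for ch in term)


print(escape_for_exact_search("node_01"))   # node\_01
print(escape_for_exact_search("cpu 100%"))  # cpu 100\%
```

Terms without wildcards are returned unchanged, so the helper can be applied to any input before an exact search.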
Procedure
- Choose Visualization from the main menu.
The Operations Dashboards page is displayed by default. Table 1 describes the operations dashboards.
- Select a dashboard to access it.
You can click Export to export dashboard data to your local system.
This export function is available only in the professional edition of ESM.
Table 1 Descriptions of visual operations dashboards
Dashboard Name
Description
Resource Summary
The Resource Summary dashboard displays the quantities of hardware resources and common cloud resources, cloud resource summary, physical resource usage, cloud resource quota statistics, cloud resource usage trend, virtual resource allocation rates, virtual resource allocation statistics, and virtual resource usage.
You can select a time range from the drop-down list in the upper right corner of the dashboard to query physical resource usage, cloud resource usage trend, and virtual resource usage.
You can drill down from the Resource Summary dashboard to the Compute Node Capacity Details and Storage Node Capacity Details dashboards. The items displayed in the three dashboards are similar and are described in Table 2.
Hardware Resources
The Hardware Resources dashboard displays the quantity, running statuses, and alarm statuses of hardware resources. This dashboard helps you keep abreast of resource statuses and adjust resource allocation in a timely manner to increase resource utilization and avoid potential risks.
Table 4 describes the items displayed in this dashboard.
Hardware Alarms
The Hardware Alarms dashboard displays the total number of alarms in the past month, overall alarm statuses, hardware alarm statistics, alarm growth trend in the past month, and uncleared alarms of hardware resources. This dashboard helps you keep abreast of hardware resource alarms and clear alarms in a timely manner.
Table 6 describes the items displayed in this dashboard.
Cloud Service Status
The Cloud Service Status dashboard displays the overall status and alarms of all cloud services in each region. It helps you quickly identify cloud services with high or potential risks and improve operational efficiency.
Table 8 describes the items displayed in this dashboard.
NOTE: The cloud service statuses are described as follows:
- High-risk: Deployed cloud services have critical alarms that are not cleared.
- Low-risk: Deployed cloud services have major alarms (instead of critical alarms) that are not cleared.
- Healthy: Deployed cloud services do not have critical or major alarms that are not cleared.
- Undeployed: Cloud services have not been deployed.
Tenant Resources
The Tenant Resources dashboard displays the total quantity of each cloud resource, the cloud resources and resource trends of each tenant, and the resource usage of each tenant. It helps you understand the distribution and usage of resources from the tenant perspective.
Table 5 describes the items displayed in this dashboard.
Audit Logs
The Audit Logs dashboard displays statistics on operations by risk level as well as operation details, including operation names, resource types (such as servers, storage, and networks), and operation time. In the upper right corner of the dashboard, you can select a time range to view desired logs.
Table 7 describes the items displayed in this dashboard.
Service Capacity Details
This dashboard displays statistics of cloud services by resource type. It also displays the total, allocated, and available capacities of physical and logical resources. You can query resource pool information, such as allocated and used capacities, based on VM specifications and export the query results.
Table 9 describes the items displayed in this dashboard.
Hardware Metrics
The Hardware Metrics dashboard displays key metrics of servers, including CPU usage, memory usage, disk I/O usage, packet loss rate, average system load, and load statistics of the top 5 servers. Drill-down screens display hardware details, resource allocation, and monitoring information of servers.
For details about the displayed items, see Table 10.
NOTE: The dashboard is available only to users of regions where the professional edition is enabled.
AI Resource Operation Dashboard
This dashboard displays compute resource statistics, including total GPUs, total NPUs, dedicated resource pools, and total AI nodes. It also displays the compute capacity of the current site and public sites, top associated training jobs and inference tasks by usage, and resource usage trends.
For details about the displayed items, see Table 11.
AI Resource Detail Dashboard
This dashboard is a drill-down dashboard of the AI Resource Operation Dashboard. It displays resource details in the resource pool associated with the tenant. For details about the displayed items, see Table 12.
Table 2 Resource Summary
Item
Description
Hardware Resources
Displays quantities of all types of hardware resources.
Common Cloud Resources
Displays the quantities of provisioned ECSs, provisioned EVS disks, and other common cloud resources provisioned in the management and tenant zones.
Click Details next to EVS disks to view details about storage node resource capacities in the tenant and management zones.
Cloud Resource Overview
Displays the quantities of provisioned ECSs, provisioned EVS disks, and other cloud resources provisioned in the management and tenant zones.
Physical Resource Usage
Displays the average CPU and memory usage per day within a selected time range.
You can click Details next to this item to view details about compute node resource capacities in the tenant and management zones.
Cloud Resource Statistics
Displays the allocated and total quantities of cloud resources (only in the tenant zone).
Cloud Resource Usage
Displays how cloud resource usage changes (only in the tenant zone) within a selected time range by day.
Virtual Resource Allocate Rates
Displays the allocation rates of virtual resources (including vCPUs, memory, and disks) in both tenant and management zones.
Virtual Resource Statistics
Displays the allocated and total capacities of virtual resources (including vCPUs, memory, and disks) in both tenant and management zones.
Virtual Resource Usage
Displays the usage of virtual resources (including vCPUs, memory, and disks) in both tenant and management zones within the selected time range by day.
Compute Node Capacity Details
Displays details about capacities of ECS resources including vCPUs, memory, and vGPUs by AZ, cluster, and resource type in both tenant and management zones.
Storage Node Capacity Details
Displays details about capacities of EVS resources including common I/O, high I/O, and ultra-high I/O disks by AZ and resource type in both tenant and management zones. For details, see Table 3.
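The relationship between the allocation rates and the allocated and total capacities listed in Table 2 can be sketched as follows. This assumes that an allocation rate is the allocated capacity divided by the total capacity, which the table does not state explicitly. The function name and the sample numbers are illustrative only.

```python
def allocation_rate(allocated: float, total: float) -> float:
    """Return allocated capacity as a percentage of total capacity (assumed formula)."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * allocated / total


# Example: 96 of 128 vCPUs allocated.
print(allocation_rate(96, 128))  # 75.0
```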
Table 3 Resource types
- Extreme SSD (API name: ESSD): Superfast disks for workloads demanding ultra-high bandwidth and ultra-low latency.
- General-Purpose SSD V2 (API name: GPSSD2): SSD-backed disks that allow tailored IOPS and throughput, intended for transactional workloads that demand high performance and low latency.
- Ultra-high I/O (API name: SSD): High-performance disks suited to enterprise mission-critical services and workloads demanding high throughput and low latency.
- General-Purpose SSD (API name: GPSSD): Cost-effective disks designed for enterprise applications with medium performance requirements.
- High I/O (API name: SAS): Disks suitable for commonly accessed workloads.
- Common I/O (API name: SATA): Disks suitable for less commonly accessed workloads.
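When working with exported data or scripts, it can help to keep the Table 3 mapping between displayed disk types and their API names in one place. The following is a minimal sketch of such a lookup. The variable names are illustrative and do not correspond to any product API.

```python
# Displayed disk type -> API name, as listed in Table 3.
DISK_TYPE_API_NAMES = {
    "Extreme SSD": "ESSD",
    "General-Purpose SSD V2": "GPSSD2",
    "Ultra-high I/O": "SSD",
    "General-Purpose SSD": "GPSSD",
    "High I/O": "SAS",
    "Common I/O": "SATA",
}

# Reverse lookup (API name -> displayed disk type), built from the mapping above.
API_NAME_DISK_TYPES = {api: name for name, api in DISK_TYPE_API_NAMES.items()}

print(DISK_TYPE_API_NAMES["High I/O"])  # SAS
print(API_NAME_DISK_TYPES["GPSSD2"])    # General-Purpose SSD V2
```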
Table 4 Hardware Resources
Item
Description
Servers
Displays the number of servers and their information such as server names, alarm statuses, running statuses, and management IP addresses.
Switches
Displays the number of switches and their information such as switch names, alarm statuses, running statuses, and management IP addresses.
Routers
Displays the number of routers and their information such as router names, alarm statuses, running statuses, and management IP addresses.
Firewalls
Displays the number of firewalls and their information such as firewall names, alarm statuses, running statuses, and management IP addresses.
Security Devices
Displays the number of security devices and their information such as security device names, alarm statuses, running statuses, and management IP addresses.
Table 5 Tenant Resources
Item
Description
Cloud Resources TOP10
Displays statistics on the quantity of cloud resources by tenant and the top 10 cloud resources by quantity.
Resource Usage Trends
Displays daily changes in the quantity of cloud resources.
Information
Displays the quantity of each cloud resource.
List
Displays cloud resource details, including basic resource information.
Table 6 Hardware Alarms
Item
Description
Historical and Active Alarms
Displays the total number of historical alarms (cleared alarms) and active alarms (uncleared alarms) in the past month, and the number of alarms (cleared alarms and uncleared alarms) at each severity level.
Cleared and Uncleared Alarms
Displays the quantities of cleared and uncleared alarms and the percentages they account for in the total number of alarms in the past month.
Alarms by Device Type
Displays the quantities of alarms (including cleared and uncleared alarms) of network devices (including switches, routers, and firewalls) and physical hosts in the past month.
Alarm Change Trend in the Past Month
Displays how the number of alarms (including cleared alarms and uncleared alarms) at each severity level changes over the past month.
Current Alarms
Displays information about alarms uncleared in the past month, including the alarm names, severity levels, device IDs, first occurrence time, and last occurrence time. To check alarm details, click an alarm name.
History Alarms
Displays information about cleared and uncleared alarms in the past month, including the alarm names, severity levels, device IDs, first occurrence time, and last occurrence time. To check alarm details, click an alarm name.
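The percentages shown for Cleared and Uncleared Alarms in Table 6 can be reproduced from the two counts. The sketch below assumes that each percentage is the share of the combined cleared and uncleared total. The function name and the sample counts are illustrative only.

```python
def alarm_percentages(cleared: int, uncleared: int) -> dict:
    """Return the share of cleared and uncleared alarms as percentages of the total."""
    total = cleared + uncleared
    if total == 0:
        # No alarms in the period: report 0% for both to avoid dividing by zero.
        return {"cleared": 0.0, "uncleared": 0.0}
    return {
        "cleared": 100.0 * cleared / total,
        "uncleared": 100.0 * uncleared / total,
    }


print(alarm_percentages(30, 10))  # {'cleared': 75.0, 'uncleared': 25.0}
```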
Table 7 Audit Logs
Item
Description
Risks by Level
Displays respective quantities of operation risks at all levels (critical, major, minor, and warning).
Logs
Displays the operation log list, including the operation names, risk levels, and operators. If the list spans multiple pages, the pages are displayed by scrolling. You can search for logs by operation name or region.
Table 8 Cloud Service Status
Item
Description
Status Distribution
Displays the quantities of cloud services in the High-risk, Low-risk, Healthy, and Undeployed states.
Alarm Status Overview
Displays the statuses and alarm quantities of all cloud services in each region.
- If there are fewer than five regions, the regions are arranged horizontally. A grid below each region indicates a cloud service in that region. The quantities of uncleared critical and major alarms are displayed in the grid of each deployed cloud service.
- If there are five or more regions, the alarm status overview is displayed in a two-dimensional table. Horizontal headers are regions, while the vertical headers are cloud services. When you move your pointer to the cell of a deployed cloud service in a region, the quantities of uncleared critical and major alarms of the cloud service are displayed.
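The status rules from the Cloud Service Status note and the region threshold described in Table 8 can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the function names, the argument layout, and the sample services are all hypothetical.

```python
from collections import Counter


def classify_service(deployed: bool, uncleared_critical: int, uncleared_major: int) -> str:
    """Map a service's uncleared alarm counts to one of the four documented statuses."""
    if not deployed:
        return "Undeployed"
    if uncleared_critical > 0:
        return "High-risk"   # any uncleared critical alarm
    if uncleared_major > 0:
        return "Low-risk"    # uncleared major alarms, but no critical ones
    return "Healthy"         # no uncleared critical or major alarms


def status_distribution(services) -> Counter:
    """Count services per status, as in the Status Distribution item."""
    return Counter(classify_service(*s) for s in services)


def overview_layout(region_count: int) -> str:
    """Choose the layout: horizontal for fewer than five regions, otherwise a table."""
    return "horizontal" if region_count < 5 else "table"


# Sample input: (deployed, uncleared critical alarms, uncleared major alarms)
services = [(True, 2, 1), (True, 0, 3), (True, 0, 0), (False, 0, 0)]
print(status_distribution(services))
print(overview_layout(4), overview_layout(5))  # horizontal table
```

Checking critical alarms before major alarms mirrors the note: a service with both kinds of uncleared alarms counts as High-risk.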
Table 9 Service Capacity Details
Item
Description
Statistics by Resource Type
Displays a capacity view of frequently used basic cloud services, including OBS, BMS, SFS, RDS, DeH, EIP, and VPN.
Resource Usage
Displays how the quantities of used resources from different cloud services change over time. You can select Last week, Last month, or Last 3 months in the upper right corner of the page to query resource usage statistics.
Table 10 Hardware Metrics
Item
Description
Server usage metrics
Displays high CPU usage, high memory usage, high disk I/O usage, high packet loss rate during packet sending, and high average system load.
Top 5 resources by load
Displays top 5 resources with high usage metrics.
Hardware Details
Displays configuration details of the server, including CPUs, disks, memory, MAC address, and IP address.
Resource Allocation
Displays basic information of the server, including name, region, AZ, SN, and type.
Monitoring Information
Displays server metrics using data charts, such as disk I/O metrics and NIC metrics.
Table 11 AI Resource Operation Dashboard
Item
Description
Resource or Metric Statistics
Displays the AI nodes, GPUs, NPUs, dedicated resource pools, tasks, query requests per second (QPS), average response latency, and total requests (last 1 hour).
Computing Power
Displays the compute capacity of the current site and public sites, including CPU allocation, memory allocation, NPU usage, NPU (video memory) allocation, and GPU allocation.
Usage ranking (Top 10)
Displays top 10 tenants by usage of training jobs and inference tasks. Tenants can be ranked by compute usage or task quantity.
Resource usage trend statistics
Displays the resource usage trend chart. You can view the chart by last day, last week, last month, or last three months.
Table 12 AI Resource Detail Dashboard
Item
Description
Resource or Metric Statistics
Displays resource pools, AI nodes, tasks, training jobs, inference tasks, Notebooks, NPU compute capacity, NPUs, GPU compute capacity, and GPUs.
Resource Pool Statistics
Displays tasks, AI nodes, users, NPUs, allocated NPUs, NPU allocation, GPU allocation, NPU video memory allocation, total CPU, average CPU usage, total memory, and average memory usage.
Trend Chart
Displays the trends of the NPU allocation, NPU video memory allocation, GPU allocation, CPU allocation, and memory allocation.
Node Information
Displays node details, such as the node name, node IP address, and total NPUs.