Application Scenarios

O&M BI Dashboard

The dedicated O&M BI dashboard serves a range of O&M roles and supports optimization, insight generation, and decision-making.

Rich metrics: COC provides more than 30 preset O&M metrics that give insight into your cloud resources through BI dashboards covering seven perspectives and a comprehensive, enterprise-grade O&M sandbox. Together, the sandbox and the dashboards show your service O&M status in real time, from both a bird's-eye view and a ground-level view.

Figure 1 O&M BI dashboard

Full-Lifecycle Resource Management

COC manages resources across their full lifecycle, covering definition, request, provisioning, O&M, change, configuration, renewal, and recycling, and provides a unified resource management center.

Full-lifecycle management: removes gaps across the entire resource management journey, so that resources are managed smoothly and O&M stays efficient.
Resource management center: enables visualized management of your resources from a global perspective and supports centralized O&M across multiple clouds and accounts.

Figure 2 Full-lifecycle resource management

Change Risk Control and Operations Trustworthiness

Management and control models built on Huawei SRE best practices for production safety provide trustworthy, stable, and reliable O&M capabilities.

All-round operations trustworthiness: ensures operational security before, during, and after changes through personnel risk assessment, high-risk command alerts, and automated inspection.
AI-powered risk assessment: an intelligent interception algorithm blocks high-risk commands to mitigate operational risks.

Figure 3 Change risk control and operations trustworthiness

Standardized Fault Management

A standardized fault management process and a war room improve collaboration during faults and speed up fault recovery.

Standard process: provides a standardized troubleshooting process on Huawei Cloud. Backed by response plans and war-room collaboration among O&M engineers, R&D teams, and other personnel, this process helps you handle faults with ease.
O&M knowledge base: speeds up fault handling with a rich repository of O&M knowledge built from historical faults and accumulated experience in handling unknown faults.

Figure 4 Standardized fault management

Intelligent Chaos Drills

Full-stack chaos engineering solutions help you quickly evaluate potential resilience risks in your applications and continuously monitor application architectures.

E2E chaos engineering solutions: provide end-to-end chaos drill capabilities tailored to your service scenarios, covering four dimensions: risk analysis, contingency plans, drill execution, and drill review.
Failure mode library: applies a methodology for analyzing disaster recovery (DR) fault scenarios and draws on Huawei Cloud SREs' years of accumulated fault-handling experience.

Figure 5 Intelligent chaos drills