Updated on 2025-05-22 GMT+08:00

Improving System O&M Capabilities to Reduce O&M Costs and Difficulties with AOM

A platform service has 10 million certified drivers and 5 million cargo users. The group's services cover 339 major cities in China and more than 110,000 routes, and run on a multi-center operations architecture across the country.

Customer pain points:

  • Difficult O&M in multi-cloud active-active DR scenarios: Many clusters need more than single-vendor DR to keep services running smoothly. A parallel active-active system is required so that services switch over without interruption when faults occur. The customer's self-built O&M platform cannot meet the O&M requirements of such a system.
  • Cloud service metrics cannot be collected: The customer's self-built O&M system cannot collect cloud service metrics, so it cannot meet the requirements for large-screen display.
  • Insufficient alarm notification capabilities: The self-built O&M platform lacks sufficient alarm notification capabilities for multiple scenarios and does not support alarm noise reduction.
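
To illustrate what alarm noise reduction means in the last pain point, the following minimal sketch suppresses repeated alarms for the same resource and rule within a time window. It is an illustrative assumption only and is not the AOM implementation: the `Alarm` class, the `suppress_duplicates` function, the field names, and the 300-second window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    # All fields are hypothetical and used for illustration only.
    resource: str     # resource that raised the alarm, for example "db-1"
    rule: str         # name of the alarm rule that fired
    timestamp: float  # time the alarm was raised, in seconds


def suppress_duplicates(alarms, window_seconds=300.0):
    """Return alarms in time order, dropping any alarm that repeats the same
    (resource, rule) pair within window_seconds of the last kept alarm."""
    last_kept = {}  # (resource, rule) -> timestamp of the last kept alarm
    kept = []
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        key = (alarm.resource, alarm.rule)
        previous = last_kept.get(key)
        if previous is None or alarm.timestamp - previous >= window_seconds:
            kept.append(alarm)
            last_kept[key] = alarm.timestamp
    return kept


if __name__ == "__main__":
    sample = [
        Alarm("db-1", "cpu_high", 0.0),
        Alarm("db-1", "cpu_high", 60.0),   # repeat inside the window
        Alarm("web-1", "cpu_high", 30.0),  # different resource
        Alarm("db-1", "cpu_high", 400.0),  # outside the window
    ]
    for alarm in suppress_duplicates(sample):
        print(alarm)
```

In this sketch, a notification is sent only for the alarms that are kept, so a resource that repeatedly triggers the same rule does not flood the recipients.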

Solutions:

Benefits:

  • Reduced O&M costs and difficulties: Managing multiple systems is simpler, the customer's initial investment in O&M resources is smaller, and O&M costs are lower.
  • Improved operations analysis capabilities: Visual charts and out-of-the-box dashboards enable quick service operations analysis.
  • Improved troubleshooting capabilities: Cloud-based multi-dimensional monitoring covers service operations and maintenance comprehensively, and automated alarm rules enable faults to be detected, located, and handled rapidly.