What Service Items Are Included in the Deterministic O&M Management Service?
- Application Agent Maintenance
Customers that use resources from multiple cloud vendors may find coordination and management difficult and routine O&M inefficient. To address these issues, Huawei Cloud centrally provides agent maintenance, assisted O&M, and hosting services to reduce costs and improve efficiency, ensuring customer service stability and enabling timely fault detection, demarcation, and rectification. Deterministic O&M Management Service is based on the deterministic operations solution and aims to build a competitive and professional cloud management service. It solves customers' O&M problems and difficulties in a one-stop manner, enhances the stickiness between Huawei Cloud and customers, improves customer satisfaction, and encourages customers to continue purchasing Huawei Cloud services or to expand their businesses with enhanced service features.
Service
Description
Scenario
O&M platform application O&M hosting (7*24)
The service provides remote application hosting capabilities for customer applications within the service period. The capabilities include 24/7 monitoring of customer applications based on customer logs, metrics, and alarms; responding to alarms in the customer production environment based on preset plans; tracking production faults; and managing the entire lifecycle.
This service is suitable for scenarios where high service availability is required.
O&M platform application hosting incremental service (7*24)
The service provides remote application hosting capabilities for the customer's new applications or those with capacity expansion. The capabilities include 24/7 monitoring of customer applications based on customer logs, metrics, and alarms; responding to alarms in the customer production environment based on preset plans; tracking production faults; and managing the entire lifecycle.
This service is suitable for scenarios where 24/7 incremental service packages are needed.
O&M platform application hosting (5*8)
The service provides remote application hosting capabilities for customer applications within the service period. The capabilities include 8/5 monitoring of customer applications based on customer logs, metrics, and alarms; responding to alarms in the customer production environment based on preset plans; tracking production faults; and managing the entire lifecycle.
This service is suitable for scenarios where high service availability is required.
O&M platform application hosting incremental service (5*8)
The service provides remote application hosting capabilities for the customer's new applications or those with capacity expansion. The capabilities include 8/5 monitoring of customer applications based on customer logs, metrics, and alarms; responding to alarms in the customer production environment based on preset plans; tracking production faults; and managing the entire lifecycle.
This service is suitable for scenarios where 8/5 incremental service packages are needed.
Chaos Engineering Drill Service Basic Edition
The chaos engineering drill service helps users verify potential risks of the system online. The drill service covers the entire process, including fault mode identification and construction, drill risk analysis and control, emergency plan formulation, fault injection, fault recovery, and review. It also helps users build chaos engineering capabilities: building the fault mode library and weapon library, verifying the effectiveness of emergency plans, improving the fast fault recovery capability of the O&M team, continuously practicing and optimizing the emergency system, organizing emergency capabilities, and improving system resilience and reliability.
The customer needs to enhance their chaos engineering drill capabilities and needs assistance in completing drills. The number of drill scenarios does not exceed five.
Chaos Engineering Drill Service Incremental Package
The service content is the same as that of Chaos Engineering Drill Service Basic Edition. The service is purchased together with the basic package to expand the service scope. It applies when the number of drill scenarios exceeds five. One incremental package is purchased, as required, for each additional drill scenario.
The number of customer drill scenarios exceeds the limit for the basic edition.
O&M Platform Chaos Engineering Drill Service
Empowers you to identify potential risks in your systems online. This full-process fault drill service covers every critical phase, including failure mode identification and construction, drill risk analysis and control, contingency plan formulation, fault injection, fault rectification, and drill review. By using this service, you can enhance your chaos engineering system capabilities, develop failure mode libraries, and verify the effectiveness of your custom contingency plans.
You need to improve your capabilities in chaos engineering drills.
O&M Platform Fault Management Managed Service
Is backed by the extensive experience that Huawei Cloud experts have gained from years of cloud service O&M best practices on Huawei Cloud. This service analyzes pain points of your core applications, manages the corresponding faults in fault trees, develops contingency plans tailored for your services, and verifies the faults through chaos drills, helping you improve fault rectification efficiency.
Your systems often experience service incidents due to non-standard fault management processes.
O&M Platform Release Management Optimization Implementation Service
Analyzes process risks, sorts out standard operating procedures (SOPs), and provides optimization recommendations based on single change scenarios, such as a software or configuration change.
1. Your change management system is immature.
2. You have mastered certain standardized operation capabilities and want to elevate your capability in deterministic control of change-related risks.
O&M Platform Release Management Onsite Support Service (Standard Package)
Helps you oversee change review management, organize change backtracking, distill actionable insights from changes, deliver support for major changes, and manage change projects. Each standard package contains a maximum of 100 applications.
O&M Platform Release Management Onsite Support Service (Add-on Package)
The service content is the same as that of the standard package. It is intended for deployments of more than 100 applications, which exceed the capacity of the standard package. Each add-on package contains a maximum of 10 applications.
O&M Platform Application Hosting Implementation Service
Manages resources and applications within customers' application O&M scope on the O&M platform. A maximum of 100 instances can be managed. The scope of management covers resource management, account hosting, log collection, and monitoring configuration.
Customers' services are connected to the O&M platform for hosting for the first time.
O&M Platform Application Hosting Implementation Incremental Service
Manages and connects to incremental services and resources of customers to meet O&M requirements.
Customers have incremental services to be connected to the O&M platform.
O&M Platform PRR Governance Service
Helps customers carry out production readiness review (PRR) activities using O&M tools, specify the corresponding review process, and develop PRR sub-items, content descriptions, and evaluation criteria for the corresponding services. Also helps customers implement automation development related to online review, conducts PRRs on actual businesses, and provides review results.
Customers require Huawei experts to provide PRR services and perform related operations on actual businesses.
O&M Platform Runtime Risk Assessment and Governance Service
Customizes the standard process of risk assessment for customers' runtime businesses based on O&M tools, and customizes the information and detection standards for risk assessment sub-items. Automates some assessment-related operations, assesses risks of actual businesses, and provides assessment results.
Customers require Huawei experts to perform risk assessment and related operations on actual businesses.
O&M Platform Service Availability Measurement and Governance Service
Formulates service level objectives (SLOs) for customers' products, develops service level indicator (SLI) items and baseline data, and monitors the SLOs or SLIs based on the customers' actual business and O&M tools.
Customers require Huawei experts to perform availability measurement and related operations on actual businesses.
O&M Platform Development Support Service - Junior O&M Engineer
Provides basic development support, including SDK/API usage support and demos, a development environment setup guide, and an application development guide. This service helps you quickly develop intelligent applications on the platform to address the diverse issues you encounter by assisting in data preparation, model selection and optimization, inference acceleration, knowledge engineering, and application orchestration, deployment, and integration.
Customers need Huawei Cloud professional services for operations optimization.
O&M Platform Development Support Service - Intermediate O&M Engineer
Provides development support for migrating, adapting, and reconstructing applications or data on the O&M platform. The content includes migration evaluation and solution design of AI applications and matching models, reconstruction and commissioning of AI applications and model inference scripts, performance optimization of single-node and distributed systems, and fine-tuning, training script reconstruction, and performance debugging of foundation models.
O&M Platform Development Support Service - Senior O&M Engineer
Provides support for common component development on the O&M platform in the following scenarios:
(1) Interconnection with third-party closed-source models
(2) Introduction of open-source models
(3) Incremental pre-training
(4) Scenario-based optimization of open-source models
(5) Model assessment: includes the assessment policies, assessment datasets, and execution processes based on the accelerator evaluation function. (Objective content is completed automatically by the accelerator; subjective content is assessed manually.)
(6) Data preparation: data access, governance (cleansing, deduplication, and normalization), and dataset generation
(7) Knowledge base building
(8) Prompt engineering
(9) Dynamic knowledge injection through retrieval-augmented generation (RAG)
(10) Knowledge base access
O&M Platform Development Support Service - Senior Technical Account Manager
Provides professional support during application development.
(1) Helps you conduct a requirements survey and design the overall solution based on your application scenarios, including but not limited to planning for compute, storage, and network, data preparation design, large model selection and evaluation, and solutions for model orchestration and software platform integration.
(2) Provides development support based on the existing application design solution. This includes data preparation, model selection and optimization, inference acceleration, knowledge engineering, application orchestration, and deployment.
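The package limits quoted in the table above (five drill scenarios in the Chaos Engineering Drill Service Basic Edition, 100 applications per release management standard package, and 10 applications per add-on package) can be turned into a rough sizing estimate. The following is a minimal sketch only. The function names are hypothetical, and it assumes that exactly one basic or standard package is purchased and that any excess is covered by incremental or add-on packages. Actual package counts should be confirmed with Huawei Cloud.

```python
import math

# Limits quoted in the service table; everything else here is illustrative.
BASIC_DRILL_SCENARIOS = 5     # Basic Edition: up to five drill scenarios
STANDARD_PACKAGE_APPS = 100   # standard package: up to 100 applications
ADDON_PACKAGE_APPS = 10       # add-on package: up to 10 applications each


def drill_incremental_packages(scenarios: int) -> int:
    """Incremental packages needed beyond one Basic Edition (assumed: one per extra scenario)."""
    if scenarios < 0:
        raise ValueError("scenarios must be non-negative")
    return max(0, scenarios - BASIC_DRILL_SCENARIOS)


def release_addon_packages(applications: int) -> int:
    """Add-on packages needed beyond one standard package (assumed: one standard package)."""
    if applications < 0:
        raise ValueError("applications must be non-negative")
    extra = max(0, applications - STANDARD_PACKAGE_APPS)
    # Round up: a partly used add-on package still counts as one package.
    return math.ceil(extra / ADDON_PACKAGE_APPS)
```

For example, under these assumptions, eight drill scenarios would need three incremental packages, and 125 applications would need three add-on packages on top of one standard package.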