Standardized Fault Management

Updated on 2024-04-19 GMT+08:00

Scenario

In this scenario, the O&M engineers of an intelligent customer service application handle incidents inefficiently because there are no standardized incident handling procedures, no clearly defined teams for collaborative fault recovery, and no contingency plans. Similar faults recur, O&M experience is not accumulated, and faults with known, deterministic fixes cannot be recovered automatically. Alarms come in multiple severities, but without a standardized procedure they are processed slowly. A standardized incident process is needed to make handling consistent.

Solution

End-to-end incident handling process: Define standardized incident handling procedures, enable collaboration among multiple O&M roles through WarRoom requests, and improve incident handling efficiency through response plans.

COC helps users manage alarms in a unified manner. Incident forwarding rules convert raw alarms into incident or alarm tickets: when a raw alarm matches a forwarding rule, an incident or alarm is created, and the corresponding owner is notified according to the scheduling management settings. The owner can handle the alarm or convert it into an incident. After the issue is located and the service is restored, the alarm is cleared. If the alarm cannot be cleared, it can be escalated to an incident or handled through a WarRoom request. This establishes a standardized alarm handling process and prevents alarms from being handled inconsistently.

The standardized incident handling process includes the following steps:

  1. Integrate and manage access to raw alarm data.
  2. Configure incident forwarding rules to clean and process alarms.
  3. Configure notification templates and, in notification management, select notification recipients and methods according to the notification scenario.
  4. Handle or convert alarms in the integrated alarm system.
  5. In the incident center, handle alarms that have been converted into incidents. Incidents can be forwarded, escalated, de-escalated, or handled through WarRoom requests.

Prerequisites

An application group has been created on the application management page.

Personnel information has been added on the personnel management page.

A shift has been created on the scheduling management page.

Step 1: Integrate and Manage Access to Raw Alarm Data

  1. Log in to COC.
  2. In the navigation tree on the left, choose Incident Management > Data Source Integration.
  3. On the displayed page, select the data source to be accessed based on service requirements and click Access integration.
    Figure 1 Clicking Access integration
  4. On the displayed page, copy the endpoint URL.
    Figure 2 Endpoint URL
  5. Switch to the SMN console. In the navigation pane on the left, choose Topic Management > Subscriptions. On the displayed page, locate a desired description and click Add Subscription in the Operation column. In the displayed dialog box, click Select Topic to select a topic for Topic Name, set Protocol to HTTPS, paste the copied endpoint URL to Endpoint, and click OK.
    Figure 3 Add Subscription
  6. Log in to the Cloud Eye console. In the navigation pane on the left, choose Alarm Management > Alarm Rules. On the displayed page, click Create Alarm Rule, enable Alarm Notification, and select Topic subscription for Notification Recipient.
    Figure 4 Creating an alarm
  7. Return to COC, confirm the integration, and click Integrate.
    Figure 5 Clicking Integrate

Step 2: Create a Forwarding Rule to Clean Raw Alarm Data

  1. Log in to COC.
  2. In the navigation pane on the left, choose Incident Management > Incident Forwarding Rules.
  3. In the upper part of the list, click Create Incident Forwarding Rule.
    Figure 6 Creating a forwarding rule
  4. Enter basic information such as the rule name and application name as prompted.
  5. In the Trigger Rules area, select a trigger type, select a monitoring source for Data Source, and set triggering conditions and trigger criteria.
    Figure 7 Trigger criteria
  6. Configure a response plan for the corresponding incident or alarm in the forwarding rule by selecting scripts or jobs.
    Figure 8 Response plan
  7. In the Assignment Details area, select an owner and click Submit.
    Figure 9 Assignment Details area
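
The matching behavior that a forwarding rule configures in this step can be pictured with a short Python sketch. This is only an illustration of the idea, not the COC data model or API: the `ForwardingRule` class, its field names, the raw alarm keys (`source`, `severity`, `summary`), and the first-match-wins order are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForwardingRule:
    """Hypothetical model of an incident forwarding rule (not the COC schema)."""
    name: str
    trigger_type: str                    # "incident" or "alarm"
    data_source: str                     # monitoring source the rule listens to
    conditions: dict                     # raw alarm field -> required value
    owner: str                           # owner chosen in Assignment Details
    response_plan: Optional[str] = None  # script or job name, if configured


def matches(rule: ForwardingRule, raw_alarm: dict) -> bool:
    """A raw alarm matches when its source and every condition field agree."""
    if raw_alarm.get("source") != rule.data_source:
        return False
    return all(raw_alarm.get(key) == value for key, value in rule.conditions.items())


def forward(raw_alarm: dict, rules: list) -> Optional[dict]:
    """Return a ticket for the first matching rule, or None if nothing matches.

    First-match-wins is an assumption of this sketch.
    """
    for rule in rules:
        if matches(rule, raw_alarm):
            return {
                "ticket_type": rule.trigger_type,
                "rule": rule.name,
                "owner": rule.owner,
                "response_plan": rule.response_plan,
                "summary": raw_alarm.get("summary", ""),
            }
    return None


# Example: one rule that turns critical alarms from one source into incidents.
rules = [
    ForwardingRule(
        name="cpu-critical",
        trigger_type="incident",
        data_source="cloud_eye",
        conditions={"severity": "critical"},
        owner="alice",
        response_plan="restart-service",
    )
]
ticket = forward(
    {"source": "cloud_eye", "severity": "critical", "summary": "CPU usage high"},
    rules,
)
print(ticket)
```

A raw alarm that matches no rule yields `None` here, which corresponds to an alarm that the forwarding rules do not turn into a ticket.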

Step 3: Configure the Notification Scenario, Recipient, and Method

  1. Log in to COC.
  2. In the navigation pane on the left, choose Basic Configurations > Notification Management. On the displayed page, click Create Notification.
    Figure 10 Clicking Create Notification
  3. In the displayed dialog box, set the parameters based on Table 1 and click OK.
    Figure 11 Clicking OK
    Table 1 Notification parameters

    | Parameter | Mandatory | Radio Button/Checkbox | Description |
    |---|---|---|---|
    | Name | Yes | / | Name of a notification instance. Fuzzy search can be performed based on the notification name. |
    | Type | Yes | Radio button | Level-1 category of incident notifications, which is classified by application type. |
    | Template | Yes | Checkbox | Notification content template, which is built in the system. The template list varies depending on the notification type. After a template is selected, the template is displayed when you hover the cursor over it. |
    | Notification Scope | Yes | Checkbox | When you select a service, such as Service A, and the incident ticket also indicates Service A without considering other matching rules, the subscription instance will take effect and notifications will be sent based on that subscription instance. |
    | Recipient | Yes | If you select Shift, you can select a single scenario and multiple roles. If you select Individual, you can select multiple users. | Recipient who receives notifications. When set to Shift, the notification module will automatically retrieve a list of personnel under the current schedule and send notifications to the corresponding individuals. When set to Individual, notifications will be sent directly to the corresponding individuals. |
    | Notification Rule | / | / | For example, if the value of rule A is set to a, in an incident ticket, the value of rule A is a, not considering other matching rules, the subscription instance will take effect and a notification is sent based on the subscription instance. However, if the value of rule A in the incident ticket is b, the subscription instance will not take effect, and no notification is sent. |
    | Notification Rule - Level | No | Checkbox | Level of an incident ticket. There are five levels: P1 to P5. For details about the incident ticket levels, see Incident Levels. |
    | Notification Rule - Incident Category | No | Checkbox | Category of an incident ticket. Multiple options are available. |
    | Notification Rule - Source | No | Checkbox | Source of an incident ticket. Manual creation indicates that the incident ticket is created in the incident ticket center. Transfer creation indicates that the incident ticket is generated during the transfer. |
    | Notification Rule - Region | No | Checkbox | Region of an incident ticket. Multiple regions can be selected. |
    | Method | Yes | Checkbox | Notification channel |
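
The matching logic described in Table 1 (a subscription instance takes effect only when the ticket's service is in scope and every configured notification rule agrees with the ticket) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not COC's implementation: the `Notification` class, the ticket keys, and the `on_duty` lookup table standing in for the scheduling data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Notification:
    """Hypothetical notification (subscription) instance; names are illustrative."""
    name: str
    scope: set                                      # services in Notification Scope
    rules: dict = field(default_factory=dict)       # e.g. {"level": {"P1", "P2"}}
    recipient_type: str = "individual"              # "shift" or "individual"
    recipients: list = field(default_factory=list)  # users, or roles for a shift
    methods: list = field(default_factory=list)     # notification channels


def takes_effect(notification: Notification, ticket: dict) -> bool:
    """True when the service is in scope and every configured rule matches."""
    if ticket.get("service") not in notification.scope:
        return False
    # Rules that are not configured are skipped, mirroring their optional status.
    return all(
        ticket.get(key) in allowed for key, allowed in notification.rules.items()
    )


def resolve_recipients(notification: Notification, on_duty: dict) -> list:
    """Shift: look up who is on duty per role. Individual: use the listed users."""
    if notification.recipient_type == "shift":
        people = []
        for role in notification.recipients:
            people.extend(on_duty.get(role, []))
        return people
    return list(notification.recipients)


# Example: notify the primary on-duty role about P1 tickets for Service A.
p1_alerts = Notification(
    name="p1-alerts",
    scope={"Service A"},
    rules={"level": {"P1"}},
    recipient_type="shift",
    recipients=["primary"],
    methods=["sms"],
)
incident_ticket = {"service": "Service A", "level": "P1", "region": "region-1"}
if takes_effect(p1_alerts, incident_ticket):
    print(resolve_recipients(p1_alerts, {"primary": ["alice"]}), p1_alerts.methods)
```

With the same instance, a ticket whose level is P3 would not trigger a notification, which corresponds to the "value of rule A is b" case in the Notification Rule description.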

Step 4: Handle Alarms

  1. Log in to COC.
  2. In the navigation pane on the left, choose Incident Management > Alarms.
  3. In the alarm list, clear alarms, convert alarms to incidents, handle alarms, and view historical alarms.
    Figure 12 Alarm list
  4. On the automatic alarm handling page, you can select scripts or jobs and select target instances for automatic alarm handling.
    Figure 13 Automatic alarm handling
  5. Click Convert Alarms to Incidents. In the displayed dialog box, set fields such as Application, Incident Level, and Owner, and click OK. The system will send notifications to the owner according to the notification rule.
    Figure 14 Converting an alarm to an incident
  6. Click Clear to clear the current alarm. The cleared alarm is then displayed on the Historical Alarms tab.
    Figure 15 Clearing alarms
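
The two main alarm actions in this step, converting an alarm to an incident and clearing it, can be modeled with a minimal Python sketch. The `Alarm` class, the list-based active/historical views, and the dictionary shape of the incident are assumptions made for illustration; only the P1 to P5 levels and the Application, Incident Level, and Owner fields come from this page.

```python
from dataclasses import dataclass

INCIDENT_LEVELS = ("P1", "P2", "P3", "P4", "P5")  # levels listed in Table 1


@dataclass
class Alarm:
    """Hypothetical alarm record; field names are illustrative only."""
    alarm_id: str
    summary: str
    cleared: bool = False


def clear_alarm(alarm: Alarm, active: list, history: list) -> None:
    """Move an alarm from the active list to the historical list."""
    if alarm.cleared:
        raise ValueError(f"alarm {alarm.alarm_id} is already cleared")
    alarm.cleared = True
    active.remove(alarm)
    history.append(alarm)


def convert_to_incident(alarm: Alarm, application: str, level: str, owner: str) -> dict:
    """Build an incident from an alarm using the fields set in the dialog box."""
    if level not in INCIDENT_LEVELS:
        raise ValueError(f"unknown incident level: {level}")
    return {
        "source_alarm": alarm.alarm_id,
        "application": application,
        "level": level,
        "owner": owner,  # the person notified according to the notification rule
        "summary": alarm.summary,
    }


# Example: escalate an alarm, then clear it once the issue is resolved.
disk_alarm = Alarm("ALM-1", "Disk usage above threshold")
active_alarms = [disk_alarm]
historical_alarms = []

incident = convert_to_incident(disk_alarm, "customer-service", "P2", "alice")
clear_alarm(disk_alarm, active_alarms, historical_alarms)
print(incident["level"], len(active_alarms), len(historical_alarms))
```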

Step 5: Handle Incidents Converted from Alarms

  1. Log in to COC.
  2. In the navigation pane on the left, choose Incident Management > Incident Center. On the displayed page, click the Pending tab and click the incident ticket number to access the incident details page.
    Figure 16 Clicking an incident ticket number
  3. Click Acknowledge.
    Figure 17 Clicking Acknowledge
  4. Click Change Owner.
    Figure 18 Clicking Change Owner
  5. Enter the forwarding information and click Submit.
    Figure 19 Entering forwarding information
  6. Click Upgrade/Downgrade.
    Figure 20 Clicking Upgrade/Downgrade
  7. Enter the upgrade or downgrade information and click Submit.
    Figure 21 Entering upgrade and downgrade information
  8. Click Start WarRoom.
    Figure 22 Clicking Start WarRoom
  9. Enter war room information and click Submit.
    Figure 23 Entering war room information
  10. Click Handle Incident.
    Figure 24 Clicking Handle Incident
  11. Enter incident handling information and click Submit.
    Figure 25 Entering incident handling information
  12. Click Verify Incident Closure.
    Figure 26 Clicking Verify Incident Closure
  13. Enter verification information and click OK.
    Figure 27 Entering verification information
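
The sequence of actions above can be viewed as a small state machine. The Python sketch below is one possible reading, not COC's actual workflow engine: the state names, the `Incident` class, and the rule that a ticket must be acknowledged before other actions are assumptions; the action names mirror the buttons in this step.

```python
# Which actions are allowed in each state (assumed ordering, for illustration).
ALLOWED = {
    "pending": {"acknowledge"},
    "acknowledged": {"change_owner", "change_level", "start_war_room", "handle"},
    "handled": {"verify_closure"},
    "closed": set(),
}

# Actions that move the ticket to a new state; the others keep the current state.
NEXT_STATE = {
    "acknowledge": "acknowledged",
    "handle": "handled",
    "verify_closure": "closed",
}


class Incident:
    """Hypothetical incident ticket that records every action applied to it."""

    def __init__(self, ticket_id: str, owner: str, level: str):
        self.ticket_id = ticket_id
        self.owner = owner
        self.level = level
        self.state = "pending"
        self.log = []

    def apply(self, action: str, **details) -> None:
        """Apply an action if the current state allows it, else raise ValueError."""
        if action not in ALLOWED[self.state]:
            raise ValueError(f"{action!r} is not allowed in state {self.state!r}")
        if action == "change_owner":
            self.owner = details["owner"]
        elif action == "change_level":
            self.level = details["level"]
        self.state = NEXT_STATE.get(action, self.state)
        self.log.append((action, details))


# Example: walk one ticket through the steps listed above.
ticket = Incident("INC-1", owner="alice", level="P3")
ticket.apply("acknowledge")
ticket.apply("change_owner", owner="bob")
ticket.apply("change_level", level="P2")
ticket.apply("start_war_room", participants=["bob", "carol"])
ticket.apply("handle", note="restarted the service")
ticket.apply("verify_closure", verified=True)
print(ticket.state, ticket.owner, ticket.level)
```

Keeping an action log on the ticket is a design choice of this sketch; it gives a simple record of who changed what, which supports the goal of accumulating O&M experience described in the scenario.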
