Updated on 2026-03-12 GMT+08:00

Handling Alarms Based on Fault Management on COC

Scenarios

An O&M engineer for an intelligent customer service application handles incidents inefficiently because there are no standardized fault handling procedures, no clearly defined teams that collaborate on fault rectification, and no contingency plans. Similar faults recur, O&M experience is not accumulated, and deterministic faults cannot be restored automatically. Alarms come in multiple severities, but without a standardized procedure they are handled slowly. A standardized incident process is needed so that alarms and incidents are handled consistently.

Solutions

End-to-end incident handling process: Define standardized incident handling procedures, coordinate the teams involved through war rooms, and speed up incident handling with contingency plans.

COC manages alarms in a unified manner. You set up alarm conversion rules that convert raw alarms into incident tickets or alarm tickets. When a raw alarm matches an alarm conversion rule, an incident or alarm is created, and the corresponding owner is notified according to the shift schedule. The owner can handle the alarm or convert it into an incident. After the issue is located and services are restored, the alarm is cleared. If the alarm cannot be cleared, it can be escalated to an incident or handled through a war room. The result is a standardized alarm handling process that prevents alarms from being handled inconsistently.

The standardized incident handling process includes the following steps:

  1. Integrate data sources to collect raw alarm data.
  2. Create an alarm conversion rule to clean raw alarm data.
  3. In Notification Management, configure the notification template, recipients, and methods for the notification scenario.
  4. Handle alarms or convert them into incidents in the aggregated alarm list.
  5. In the incident center, handle the incidents converted from alarms. Incidents can be forwarded, escalated, de-escalated, or handled through war rooms.
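
The flow described above can be pictured as a small state machine. The following Python sketch is illustrative only: the state names are hypothetical labels chosen for this example, not COC ticket statuses or API values, and treating a war room as a state is a simplification.

```python
# Hypothetical states and transitions that mirror the process described above.
# These names are assumptions for illustration, not COC status values.
TRANSITIONS = {
    "raw": {"alarm", "incident"},          # a conversion rule matched the raw alarm
    "alarm": {"cleared", "incident"},      # the owner clears it or converts it
    "incident": {"war_room", "resolved"},  # handled directly or through a war room
    "war_room": {"resolved"},
    "resolved": {"closed"},                # closure is verified after handling
}


def is_valid_path(path: list[str]) -> bool:
    """Return True if every consecutive pair of states is an allowed transition."""
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(path, path[1:])
    )


# An alarm that cannot be cleared, is converted, and is resolved in a war room.
print(is_valid_path(["raw", "alarm", "incident", "war_room", "resolved", "closed"]))
# A raw alarm cannot be cleared before a rule has turned it into an alarm.
print(is_valid_path(["raw", "cleared"]))
```

Writing the transitions down as data makes it easy to review which hand-offs a team actually allows before encoding them in conversion rules and shifts.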

Prerequisites

Step 1: Collect Raw Alarm Data by Integrating Data Sources

  1. Log in to COC.
  2. In the navigation pane, choose Fault Management > Data Sources.
  3. On the displayed page, select the source you want to connect. In this example, select Cloud Eye and click Integrate.
  4. Click Integrate to confirm the connection to Cloud Eye.
    After the confirmation, Cloud Eye is moved from the To Be Integrated area to the Integrated area.
    Figure 1 Confirming the integration
  5. Configure parameters for accessing alarms.
    After the configuration is complete, the system receives the source data.
    Figure 2 Integrating data sources

Step 2: Create an Alarm Conversion Rule to Clean Raw Alarm Data

  1. In the navigation pane, choose Fault Management > Incident Forwarding Rules.
  2. On the displayed page, click Create.
  3. Enter basic information such as the rule name and application name as prompted.
  4. Set a trigger rule.
    This example describes only the mandatory parameters. Retain the preset values for other parameters.
    • Trigger Type: Alarm
    • Data Source: Cloud Eye
    • Triggering Conditions: Set it based on service requirements, for example, application.
    • Alarm Level: Minor
    Figure 3 Trigger criteria
  5. On the Assignment Details page, select the owner. In this example, select Shift.
    Select a shift scenario and corresponding roles from the drop-down lists. For details, see Shift Schedule Management.
    Figure 4 Assignment details
  6. Click OK.
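
To make the matching behavior of the rule configured above more concrete, here is a minimal Python sketch. It is not the COC API: the `ConversionRule` class, its field names, and the raw alarm keys (`source`, `level`, `application`) are hypothetical and simply mirror the console labels used in this step. What COC does with a raw alarm that matches no rule is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model: fields mirror the console labels in this step,
# not the actual COC rule schema.
@dataclass(frozen=True)
class ConversionRule:
    name: str
    trigger_type: str   # "Alarm" in this example
    data_source: str    # "Cloud Eye" in this example
    alarm_level: str    # "Minor" in this example
    application: str    # triggering condition used in this example
    owner_shift: str    # shift scenario that supplies the owner


def first_matching_rule(
    raw_alarm: dict, rules: list[ConversionRule]
) -> Optional[ConversionRule]:
    """Return the first rule whose source, level, and application all match."""
    for rule in rules:
        if (
            raw_alarm.get("source") == rule.data_source
            and raw_alarm.get("level") == rule.alarm_level
            and raw_alarm.get("application") == rule.application
        ):
            return rule
    return None


rule = ConversionRule(
    name="demo-rule",
    trigger_type="Alarm",
    data_source="Cloud Eye",
    alarm_level="Minor",
    application="demo-app",
    owner_shift="demo-shift",
)
raw = {"source": "Cloud Eye", "level": "Minor", "application": "demo-app"}
matched = first_matching_rule(raw, [rule])
print(matched.trigger_type if matched else "no rule matched")
```

In this sketch the rule's trigger type decides which ticket is created and the shift decides who owns it, which corresponds to the Trigger Type and Assignment Details settings above.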

Step 3: Configure the Notification Scenario, Recipient, and Method

  1. In the navigation pane, choose Basic Configuration > Notification Management.
  2. On the displayed page, click Create Notification.
    Figure 5 Creating a notification
  3. Configure the notification information and click OK. The following table describes the required parameters.

    This example describes only the mandatory parameters. Retain the preset values for other parameters. For details about the parameters, see Notification Management.

    Table 1 Parameters for creating a notification

    | Parameter | Example Value |
    |-----------|---------------|
    | Name      | Customized notification name, for example, notification for alarm-into-incident. |
    | Type      | Select Incident Notification. |
    | Template  | Select Incident Creation. |
    | Recipient | Select Ticket Owner. If this parameter is selected, the current owner of the service ticket will be notified. |
    | Method    | Select Email. |
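
The effect of these settings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `NotificationRule`, `plan_notifications`, and the event keys `template` and `owner_email` are hypothetical names that mirror Table 1, and the sketch does not send anything or call any COC interface.

```python
from dataclasses import dataclass


# Hypothetical model of one row set from Table 1; not the COC schema.
@dataclass(frozen=True)
class NotificationRule:
    name: str
    type: str        # "Incident Notification"
    template: str    # "Incident Creation"
    recipient: str   # "Ticket Owner"
    method: str      # "Email"


def plan_notifications(
    event: dict, rules: list[NotificationRule]
) -> list[tuple[str, str]]:
    """Return (method, address) pairs for rules whose template matches the event."""
    planned = []
    for rule in rules:
        if rule.template != event["template"]:
            continue
        # Only the Ticket Owner recipient from this example is modeled.
        if rule.recipient == "Ticket Owner":
            planned.append((rule.method, event["owner_email"]))
    return planned


rule = NotificationRule(
    name="notification for alarm-into-incident",
    type="Incident Notification",
    template="Incident Creation",
    recipient="Ticket Owner",
    method="Email",
)
event = {"template": "Incident Creation", "owner_email": "owner@example.com"}
print(plan_notifications(event, [rule]))
```

With the example values, creating an incident results in one email to the current ticket owner, and events that use other templates produce no notification from this rule.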

Step 4: Handle Aggregated Alarms

  1. Log in to COC.
  2. In the navigation pane, choose Fault Management > Alarms.
  3. Click the Aggregated Alarms tab. In the current alarm list, select the alarm to be handled.
    You can clear alarms, convert alarms to incidents, handle alarms, and view historical records.
    Figure 6 Aggregated alarms
  4. Click More > Handle in the Operation column. On the displayed page, select an existing script and job, and select the target instance for automatic handling.
    Figure 7 Automatic alarm handling
  5. Click Convert to Incident.
  6. Set Incident Level to P3, retain the preset values for other fields, and click OK.

    The system notifies the owner according to the notification rules.

Step 5: Handle Incidents Converted from Alarms

  1. In the navigation pane, choose Fault Management > Alarms.
  2. Choose Aggregated Alarms > Unhandled Alarms. On the displayed page, click the incident ticket number to go to the incident details page.
    Figure 8 Clicking an incident ticket number
  3. Click Accept to accept the incident.
  4. If the service impact is severe, click Escalate/De-escalate to escalate the incident.
  5. Set the incident level to P2, enter the escalation or de-escalation information, and click OK.
    Figure 9 Entering escalation or de-escalation information
  6. To quickly restore services, start a war room.
  7. Enter the war room information. This example describes only the mandatory parameters. Retain the preset values for other parameters.
    Table 2 Parameters for starting a war room

    | Parameter | Description |
    |-----------|-------------|
    | War Room Name | Use the default incident ticket name. |
    | War Room Description | Description of the war room. |
    | War Room Administrator | Select a user from the drop-down list as the war room administrator. |
    | Shift | Select a shift scenario and corresponding roles from the drop-down lists. For details about how to configure a shift, see Shift Schedule Management. |
    | Participant | Select participants from the drop-down list. Multiple users can be selected. |

  8. Click OK.

    In the war room, you can add fault recovery members, send timely progress updates to the personnel who are following the fault, and use application diagnosis and response plans to recover applications quickly. For more operations, see War Room.

  9. After the fault is rectified, click Handle Incident in the upper right corner of the incident details page.
  10. Enter the incident handling details.
    Table 3 Parameters for handling incidents

    | Parameter | Description |
    |-----------|-------------|
    | Incident Category | (Mandatory) Select an incident category from the drop-down list, for example, configuration issue. |
    | Service Interrupted | (Mandatory) Select Yes. |
    | Fault Occurred | Enter the time when the fault occurred. |
    | Delimited Completion Time | Enter the time when locating of the issue or fault was completed. |
    | Fault Rectification Time | Enter the time when the fault was rectified. |
    | Reason | (Mandatory) Enter the cause of the incident. For example: the message "The service is not configured." was displayed, and customer services were interrupted for 2 minutes. |
    | Solution | (Mandatory) Enter the solution to the incident. For example: the configuration guide was provided and the fault was rectified. |
    | Add File | Click Add File to upload incident-related attachments, such as the incident handling report and development verification description. A maximum of 10 files can be uploaded. The supported file types are JPG, PNG, DOCX, TXT, and PDF. The file size cannot exceed 10 MB. |

  11. Click OK.

    The incident ticket status changes to Resolved and to be verified.

  12. Click Verify Incident Closure.
  13. In the dialog box that is displayed, enter the verification conclusion and description, and click OK.
    Figure 10 Entering verification information
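
Before uploading attachments in the incident handling step, it can help to check them against the Add File limits in Table 3. The Python sketch below is a local pre-check only and is not part of COC. It assumes that the 10 MB limit applies to each file, that 1 MB is 1024 × 1024 bytes, and that file types are recognized by extension; `attachment_problems` is a hypothetical helper name.

```python
from pathlib import Path

MAX_FILES = 10
# Assumptions: the limit is per file, and 1 MB = 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024
ALLOWED_SUFFIXES = {".jpg", ".png", ".docx", ".txt", ".pdf"}


def attachment_problems(files: list[tuple[str, int]]) -> list[str]:
    """Check (file name, size in bytes) pairs and return a list of problems."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(
            f"{len(files)} files selected; at most {MAX_FILES} are allowed"
        )
    for name, size in files:
        suffix = Path(name).suffix.lower()
        if suffix not in ALLOWED_SUFFIXES:
            problems.append(f"{name}: file type {suffix or '(none)'} is not supported")
        if size > MAX_BYTES:
            problems.append(f"{name}: exceeds 10 MB")
    return problems


candidates = [
    ("incident-report.pdf", 2 * 1024 * 1024),
    ("trace.log", 4096),
    ("screenshot.png", 11 * 1024 * 1024),
]
for problem in attachment_problems(candidates):
    print(problem)
```

An empty result means the selection fits the documented limits as interpreted by these assumptions. The console remains the authority on what is actually accepted.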