Updated on 2024-11-20 GMT+08:00

Failure Modes

A failure mode is a specific type of problem or failure state that may occur while an application is running. Building a rich failure mode library, with prevention and recovery measures for each entry, helps you design a more highly available application system. After identifying potential faults, you can run routine drills to verify that the recovery measures and fault impacts meet expectations, so that you are better prepared to respond when a real failure occurs.

Scenarios

You can analyze the possible fault points of an application, create a failure mode by describing the fault occurrence conditions, fault symptoms, and customer impacts, and apply the failure mode to routine chaos drills.

Precautions

Before you create a failure mode, check that its enterprise project, application, incident level, and scenario category are correct.

Procedure

  1. Log in to COC.
  2. In the navigation pane on the left, choose Resilience Center > Chaos Drill. On the displayed page, click Risk Management Tasks, and click Failure Modes. On the displayed page, click Create Failure Mode.

    Figure 1 Failure Modes

  3. Enter the failure mode information by referring to Table 1.

    Figure 2 Creating a failure mode
    Table 1 Failure mode parameters

    | Parameter | Description |
    | --- | --- |
    | Failure Mode | Custom name of the failure mode. |
    | Enterprise Project | Enterprise project to which the failure mode resource belongs. The preset value is default. |
    | Application | Application to which the drill target belongs. |
    | Incident Level | Level of the incident. For details, see Creating an Incident. |
    | Source | Source of the failure mode. The options are Failure modes detected proactively and Existing failure modes. |
    | Contingency Plan Available | Whether a contingency plan is available. The options are Yes and No. The default value is Yes. |
    | Contingency Plan | Select a contingency plan from the drop-down list box. If no plan is available, create one. For details, see Emergency Plan. |
    | Scenario Category | Category of the failure scenario. The options are redundancy, disaster recovery, overload, configuration, and dependency. |
    | Occurrence Conditions | Possible conditions that cause the failure. |
    | Fault Symptom | Service symptom when the failure occurs. |
    | Impact on Customer | Impact of the failure on customers. |

  4. Specify whether a contingency plan is available. If you select Yes, select a contingency plan from the drop-down list. If no contingency plan exists yet, create one first. Then click OK.
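
Before entering a failure mode on the console, it can help to draft it as a structured record and check it against the precautions above. The following Python sketch is one way to do that offline. It is an illustration only and does not call any COC API: the `FailureMode` class, its field names, the `validate` helper, and all sample values (including the incident level `P2` and the plan name) are hypothetical. Only the scenario categories, the source options, and the default values are taken from Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional

# Scenario categories and source options as listed in Table 1.
SCENARIO_CATEGORIES = {
    "redundancy", "disaster recovery", "overload", "configuration", "dependency",
}
SOURCES = {"Failure modes detected proactively", "Existing failure modes"}


@dataclass
class FailureMode:
    """Hypothetical local draft of a failure mode; not a COC API object."""
    name: str
    application: str
    incident_level: str          # valid levels are defined in "Creating an Incident"
    source: str
    scenario_category: str
    occurrence_conditions: str
    fault_symptom: str
    customer_impact: str
    enterprise_project: str = "default"       # preset value per Table 1
    contingency_plan_available: bool = True   # default is Yes per Table 1
    contingency_plan: Optional[str] = None


def validate(mode: FailureMode) -> List[str]:
    """Return a list of problems found in the draft; an empty list means none."""
    problems = []
    # Fields named in the Precautions section must not be left blank.
    for field_name in ("name", "enterprise_project", "application", "incident_level"):
        if not getattr(mode, field_name).strip():
            problems.append(f"{field_name} is empty")
    if mode.scenario_category.lower() not in SCENARIO_CATEGORIES:
        problems.append(f"unknown scenario category: {mode.scenario_category}")
    if mode.source not in SOURCES:
        problems.append(f"unknown source: {mode.source}")
    # Step 4: if a plan is marked as available, one must be selected.
    if mode.contingency_plan_available and not mode.contingency_plan:
        problems.append("contingency plan is marked available but none is selected")
    return problems


# Sample draft with placeholder values.
draft = FailureMode(
    name="Primary database node unavailable",
    application="order-service",
    incident_level="P2",
    source="Existing failure modes",
    scenario_category="redundancy",
    occurrence_conditions="The primary node of the database cluster stops responding.",
    fault_symptom="Order submissions time out.",
    customer_impact="Customers cannot place orders until failover completes.",
    contingency_plan="Database failover plan",
)
print(validate(draft))
```

For the sample draft, `validate` returns an empty list. Removing `contingency_plan` while leaving `contingency_plan_available` at its default would report the missing plan, which mirrors the choice you make in step 4.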