Updated on 2025-08-08 GMT+08:00

Managing Affected Applications

Scenarios

If a fault affects an application, you can add the affected application in the war room details. You can then use the application diagnosis function to check the details of the affected application and execute contingency plans to restore it quickly.

Adding an Affected Application

You can add affected applications when starting war rooms, locating faults, and rectifying faults.

  1. Log in to COC.
  2. In the navigation pane on the left, choose Fault Management > WarRoom.
  3. Click the title of the war room you want to modify.
  4. Click Add Affected Application.
  5. Set parameters for adding an affected application.

    Table 1 Parameters for adding an affected application

    | Parameter | Description |
    |---|---|
    | Affected Application | Select an affected application from the drop-down list. |
    | Start Time | Enter the time when the application starts to be affected. The default value is the time when the war room is created. The start time cannot be later than the war room creation time. |
    | Recovery Time | (Optional) Enter the application recovery time. The recovery time cannot be earlier than the war room creation time. |
    | Description | Enter the impact description of the application. The value can contain a maximum of 500 characters. |

  6. Click OK.

    The affected application is added. You can click Affected Application to view its alarms, incidents, and changes.
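The constraints in Table 1 can be expressed as a small validation routine. The following is a minimal sketch in Python, not part of COC: the `AffectedApplication` class, the `validate` function, and the sample application name are hypothetical and only illustrate the rules on Start Time, Recovery Time, and Description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

MAX_DESCRIPTION_LENGTH = 500  # limit stated in Table 1


@dataclass
class AffectedApplication:
    """Hypothetical record mirroring the form fields in Table 1."""
    application: str
    start_time: datetime
    description: str
    recovery_time: Optional[datetime] = None  # optional in the form


def validate(app: AffectedApplication, war_room_created: datetime) -> list[str]:
    """Return the violated rules; an empty list means the input is valid."""
    errors = []
    if app.start_time > war_room_created:
        errors.append("Start Time cannot be later than the war room creation time.")
    if app.recovery_time is not None and app.recovery_time < war_room_created:
        errors.append("Recovery Time cannot be earlier than the war room creation time.")
    if len(app.description) > MAX_DESCRIPTION_LENGTH:
        errors.append("Description can contain a maximum of 500 characters.")
    return errors


# Example: the start time defaults to the war room creation time.
created = datetime(2025, 8, 8, 10, 0)
app = AffectedApplication("order-service", created, "Checkout requests time out.")
print(validate(app, created))  # []
```

Returning a list of violations instead of raising on the first one lets a caller report every problem with the form at once.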

Executing a Contingency Plan

  1. Log in to COC.
  2. In the navigation pane on the left, choose Fault Management > WarRoom.
  3. Click the title of the war room you want to modify.
  4. Select the application to be handled and click Execute Plan.
  5. If you select Contingency Plans, select the corresponding contingency plan from the drop-down list and click Execute.

    If no appropriate contingency plans are available, create one. For details, see Creating a Custom Contingency Plan.

  6. Check the task type associated with the contingency plan.

    • If the task type is Scripts, go to step 7.
    • If the task type is Jobs, go to step 8.

  7. Set the parameters in Execute Scripts.

    • The parameter names and default values were preset when the custom script was created.
    • Executed By: The user who executes the script on a target instance node. The default value is root.
    • Timeout Interval: The timeout interval for executing the script on a single target instance. The default value is 300.
    • Target Instance: Click Add and set Select Instance.
      Table 2 Instance parameters

      | Parameter | Description | Example Value |
      |---|---|---|
      | Selection Method | Select an instance selection method. Manual Selection: Manually select an instance based on Enterprise Project, View Type, Resource Type, Region, and Target Instance. | Manual Selection |
      | Enterprise Project | Select an enterprise project from the drop-down list. You can choose All. | All |
      | View Type | Select a view type. CloudCMDB resources: Select an instance from the resource list. CloudCMDB application groups: Select an instance from the application group list. | CloudCMDB resources |
      | Resource Type | The value can be ECS or BMS. | ECS |
      | Region | Select a region from the drop-down list. | CN-Hong Kong |
      | Target Instance | Set filter criteria in the filter box and select the filtered instances. | - |

    • Batch Policy: Select Automatic, Manual, or No Batch.
      • Automatic: The selected instances are automatically divided into multiple batches based on the preset rule.
      • Manual: You manually create multiple batches and add instances to each batch as required.
      • No Batch: The task is executed on all instances in a single batch.
    • Suspension Policy:
      • You can set the execution success rate. When the number of failed hosts reaches the threshold calculated from the execution success rate, the service ticket status becomes abnormal and the service ticket stops being executed.
      • The success rate ranges from 0 to 100 and supports accuracy up to one decimal place.

      Skip step 8 and perform step 9.

  8. Set the parameters in Execute Jobs.

    • Region: Select the region where the target instance is located.
    • Target Instance Mode: Select the execution mode for job steps and target instances.
      • Consistent for all steps: All tasks are executed on the selected instance using the same batch policy.
      • Unique for each step: Tasks in one step are executed on the selected instance. Each step uses a batch policy.
      • Unique for each task: Set the target instance and batch policy for each task.
    • Job Execution Procedure: Customize job details.
      • Click the job name. The Modifying Parameters drawer is displayed on the right.
      • Set Input, Output, and Troubleshooting.
    • Target Instance: Click Add and set Select Instance.
      Table 3 Instance parameters

      | Parameter | Description | Example Value |
      |---|---|---|
      | Selection Method | Select an instance selection method. Manual Selection: Manually select an instance based on Enterprise Project, View Type, Resource Type, Region, and Target Instance. | Manual Selection |
      | Enterprise Project | Select an enterprise project from the drop-down list. You can choose All. | All |
      | View Type | Select a view type. CloudCMDB resources: Select an instance from the resource list. CloudCMDB application groups: Select an instance from the application group list. | CloudCMDB resources |
      | Resource Type | The value can be ECS or BMS. | ECS |
      | Region | The default value cannot be modified. It is determined by Region in Execution Content. | CN-Hong Kong |
      | Target Instance | Set filter criteria in the filter box and select the filtered instances. | - |

    • Batch Policy: Select Automatic, Manual, or No Batch.
      • Automatic: The selected instances are automatically divided into multiple batches based on the preset rule.
      • Manual: You manually create multiple batches and add instances to each batch as required.
      • No Batch: The task is executed on all instances in a single batch.

  9. Click OK.
  10. Perform the following operations to check whether a service ticket execution is complete.

    • For service tickets that are being executed:
      • To pause the next batch after the current batch is executed, click Pause in the upper right corner.
      • To resume a paused batch, click Continue in the upper right corner.
      • To stop a service ticket that is about to be executed or is abnormal, click Forcibly End.
    • For service tickets that have been executed:
      • If some or all instance tasks in the service tickets are executed abnormally, do either of the following:
        • To retry an abnormal batch, click the Abnormal tab in the Execution Information area, locate the batch, and click Retry in the Operation column.
        • To cancel an abnormal batch, click the Abnormal tab in the Execution Information area, locate the batch, and click Cancel in the Operation column.
      • If all instance tasks in the service tickets are executed successfully, no more operation is needed.
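The batch policy and suspension policy described in step 7 can be illustrated with a short Python sketch. This is not COC code, and several details are assumptions: the preset rule used by Automatic batching is not documented here, so a fixed batch size stands in for it, and the exact rounding used to derive the failed-host threshold from the success rate is also assumed. All function names are hypothetical.

```python
import math
from decimal import Decimal


def split_into_batches(instances: list[str], batch_size: int) -> list[list[str]]:
    """Split instances into consecutive batches of at most batch_size items.

    Assumption: a fixed batch size stands in for the undocumented preset rule.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [instances[i:i + batch_size] for i in range(0, len(instances), batch_size)]


def validate_success_rate(value: str) -> Decimal:
    """Accept a rate from 0 to 100 with at most one decimal place."""
    rate = Decimal(value)
    if not Decimal("0") <= rate <= Decimal("100"):
        raise ValueError("success rate must be between 0 and 100")
    if rate != rate.quantize(Decimal("0.1")):
        raise ValueError("success rate supports at most one decimal place")
    return rate


def failure_threshold(total_hosts: int, success_rate: Decimal) -> int:
    """Assumed rule: failed-host count at which execution is suspended."""
    allowed = Decimal(total_hosts) * (Decimal("100") - success_rate) / Decimal("100")
    # Assumption: round up, and require at least one failure before suspending.
    return max(1, math.ceil(allowed))


def should_suspend(failed_hosts: int, total_hosts: int, success_rate: Decimal) -> bool:
    """True once the number of failed hosts reaches the assumed threshold."""
    return failed_hosts >= failure_threshold(total_hosts, success_rate)


print(split_into_batches(["ecs-1", "ecs-2", "ecs-3"], 2))  # [['ecs-1', 'ecs-2'], ['ecs-3']]
```

`Decimal` is used for the success rate so that the one-decimal-place check and the threshold arithmetic are not affected by binary floating-point rounding.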

Diagnosing Applications

  1. Log in to COC.
  2. In the navigation pane on the left, choose Fault Management > WarRoom.
  3. Click the title of the war room you want to diagnose.
  4. Select the application to be handled and click Application Diagnosis.
  5. Click the time box and set the fault occurrence time.

    The time entered in the time box is the end time. The start time is one hour earlier than the end time. After the time is selected, the number of alarms for the application and its sub-applications in the selected time period is displayed on the application topology dashboard, and the application fault details are displayed on the details page on the right.

  6. (Optional) Select Auto Refresh and select a refresh frequency from the drop-down list.

    After Auto Refresh is selected, the end time is updated to the current system time based on the refresh frequency.

  7. (Optional) If the application has sub-applications, click the target sub-application.

    The application topology dashboard displays all components of the sub-application. The sub-application fault details are displayed on the details page on the right. You can switch to other sub-applications on the topology dashboard.

  8. Click a component under the application or its sub-application.

    The application topology dashboard displays all resources of the component. The component fault details are displayed on the right details page. You can switch to other components on the topology dashboard. Metrics of core cloud services can be displayed. If APM is associated in application management, you can also view link-related metrics.

  9. Click Alarm on the right of the application topology.

    View application alarms. Alarms generated within the time range on the right axis are displayed in the list. When you select a topology object on the left, its alarm information is automatically filtered.

  10. Click Change on the right of the application topology.

    View application changes. Changes within the change time range on the right axis are displayed in the list.

  11. Click Fault Diagnosis on the right of the application topology.

    View the fault diagnosis data for your resources. You can check DCS, RDS, DMS, ECS, and ELB resources. After a topology object is selected on the left, its diagnosis information is automatically filtered.

    If no diagnosis task exists or you want to create a new one, do the following:

    1. Click Create Diagnosis Task.
    2. Select a resource type and resource.
    3. Click OK.
    4. Read and agree to the Frontend Data Authorization Agreement on Guest OS Diagnosis Service, and click Agree.

      You need to sign the agreement only if you select ECSs for fault diagnosis.

    After the diagnosis is complete, click View Details on the right of the diagnosis result list to view the diagnosis report.
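The time window used in step 5 and the alarm filtering in step 9 can be sketched in Python as follows. This is an illustration only, not COC code: the function names and the alarm record shape (a dict with `time` and `object` keys) are hypothetical, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(hours=1)  # the start time is one hour earlier than the end time


def diagnosis_window(end_time: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) for the end time entered in the time box."""
    return end_time - WINDOW, end_time


def filter_alarms(alarms: list[dict], end_time: datetime,
                  selected_object: Optional[str] = None) -> list[dict]:
    """Keep alarms inside the window, optionally for one topology object."""
    start, end = diagnosis_window(end_time)
    return [
        alarm for alarm in alarms
        if start <= alarm["time"] <= end  # inclusive bounds are an assumption
        and (selected_object is None or alarm["object"] == selected_object)
    ]


# With Auto Refresh, the end time would be reset to the current system time
# at each refresh, and the window recomputed from it.
start, end = diagnosis_window(datetime.now())
print(end - start)  # 1:00:00
```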