Updated on 2024-11-20 GMT+08:00

Full-Link Fault Diagnosis

Scenarios

After an incident is created, you can use the full-link fault diagnosis function to quickly locate the root cause of the fault. The function displays the relationship topology of a customer application at the application, component, and resource layers. It colors abnormal objects based on resource and application alarms, shows core resource metrics, and lets you run diagnostics on instances.
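The idea of alarm-based coloring on a layered topology can be sketched as follows. This is a minimal illustration under stated assumptions, not the COC implementation. The `Node` class, the `color` function, the layer strings, and all object names are hypothetical. The rule that a parent object is colored whenever any object beneath it is abnormal is also an assumption made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One topology object: an application, sub-application, component, or resource."""
    name: str
    layer: str
    children: list["Node"] = field(default_factory=list)


def color(root: Node, alarmed: set[str]) -> dict[str, bool]:
    """Map each object name to True (abnormal) or False (normal).

    An object is abnormal if an alarm was raised on it directly or, by the
    assumption made for this sketch, if any object beneath it is abnormal.
    """
    result: dict[str, bool] = {}

    def visit(node: Node) -> bool:
        abnormal = node.name in alarmed
        for child in node.children:
            # Visit every child (no short-circuit) so each object gets a color.
            if visit(child):
                abnormal = True
        result[node.name] = abnormal
        return abnormal

    visit(root)
    return result


# Hypothetical topology: application > sub-application > components > resources.
app = Node("shop", "application", [
    Node("orders", "sub-application", [
        Node("order-service", "component", [Node("order-db", "resource")]),
        Node("web-frontend", "component", [Node("web-ecs", "resource")]),
    ]),
])
colors = color(app, alarmed={"order-db"})
print(colors["shop"], colors["order-service"], colors["web-frontend"])  # True True False
```

With an alarm only on `order-db`, the path from that resource up to the application is marked abnormal. The unrelated `web-frontend` branch stays normal, which mirrors how coloring helps narrow a fault down layer by layer.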

Prerequisites

  • CES has been integrated. To configure CES monitoring, see Integration Management.
  • An incident ticket has been created.
  • To display workload and pod information of a CCE cluster, add labels to the workloads in CCE. Each group can contain only one CCE cluster resource; if a group contains more than one, workload information is not displayed.
    Figure 1 Configuring CCE workload label

Procedure

  1. Log in to COC.
  2. In the navigation pane, choose Fault Management > Incidents, click the All Incident Tickets tab, click an incident name to go to the Incident Details page, and click the Application Diagnostics tab.
  3. Select a fault time range. The topology is colored based on the alarms generated within this range. Enter the end time in the time box; the start time is automatically set to one hour before the end time. The time axis can refresh automatically: if you select Auto Refresh, the end time is updated to the latest time at the selected refresh frequency.

    Figure 2 Selecting fault time range

  4. By default, all sub-applications of the current application are displayed on the application topology screen.

    Figure 3 Application topology (application layer)

  5. Click a sub-application in the topology to view the component layer, which displays all components of the sub-application. To view the components of another sub-application, switch to it at the top of the page.

    Figure 4 Application topology (component layer)

  6. Click a component to view the resource layer, which displays all resources under the component together with the metrics of core cloud services. If APM is associated with the application in Application Management, link-related metrics are also displayed.

    Figure 5 Application topology (resource layer)

  7. Click the Alarm tab to view application alarms. The list displays the alarms generated within the selected time range. When you select a topology object on the left, the list is automatically filtered to show only the alarms of that object.

    Figure 6 Alarm list

  8. Click the Change tab to view application changes. The list displays the changes made within the selected time range.

    Figure 7 Changes

  9. Click the Diag tab and click Create Diag to diagnose the DCS, RDS, and DMS resources of the application. When you select a topology object on the left, the diagnosis list is automatically filtered to show only the diagnoses of that object.

    Figure 8 Creating a diagnosis task

  10. After the diagnosis is complete, click View Details in the diagnosis result list to view the diagnosis report.

    Figure 9 Diagnosis report
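The time range in step 3 and the alarm filtering in step 7 can be sketched as below. This is a minimal sketch, not a COC API. It assumes the range is inclusive at both ends and that each alarm records the name of the topology object it was raised on. The `Alarm` class, `fault_window`, `alarms_in_view`, and the sample object names are hypothetical. With Auto Refresh selected, the equivalent would be calling `fault_window` again with the current time at each refresh.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Per step 3, the start time is one hour earlier than the end time.
WINDOW = timedelta(hours=1)


@dataclass
class Alarm:
    object_name: str      # hypothetical: topology object the alarm was raised on
    raised_at: datetime   # when the alarm was generated


def fault_window(end: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the fault time range that ends at `end`."""
    return end - WINDOW, end


def alarms_in_view(alarms: list[Alarm], end: datetime,
                   selected: Optional[str] = None) -> list[Alarm]:
    """Alarms inside the fault time range, optionally limited to one selected object."""
    start, end = fault_window(end)
    return [
        a for a in alarms
        # Assumption: both ends of the range are inclusive.
        if start <= a.raised_at <= end
        and (selected is None or a.object_name == selected)
    ]


end = datetime(2024, 11, 20, 12, 0)
alarms = [
    Alarm("order-db", datetime(2024, 11, 20, 11, 30)),  # inside the range
    Alarm("order-db", datetime(2024, 11, 20, 10, 30)),  # before the start time
    Alarm("web-ecs", datetime(2024, 11, 20, 11, 59)),   # inside the range
]
print(len(alarms_in_view(alarms, end)))              # 2
print(len(alarms_in_view(alarms, end, "order-db")))  # 1
```

Passing `selected=None` corresponds to the unfiltered alarm list. Passing an object name corresponds to selecting that object in the topology on the left.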