Updated on 2025-05-22 GMT+08:00

OPS08-02 Reviewing and Improving Events

Event analysis standardizes and guides the inputs and outputs of the review that follows a major event. It guides event backtracking, rectification of the issues found in the backtracking report, and promotion of the lessons learned.

  • Risk level

    High

  • Key strategies

    To standardize recovery practices and improve availability and technical capabilities, after a critical live network fault is fixed, the recovery process is reviewed, the root causes are analyzed, and improvement measures are proposed. The postmortem process consists of four stages: review, analysis, summary, and action (RASA).

    • Review: All key steps and roles in the recovery process are reviewed, including root cause locating, decision-making, handling, contingency plan execution, rollback, and clearance.
    • Analysis: The root causes of the fault and the room for improvement in fault handling are analyzed.
    • Summary: The fault and the recovery process are summarized, including the fault nature, responsibility evaluation, and experience and lessons learned.
    • Action: Based on the summary, improvement measures are proposed and taken.
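
    The four RASA stages can be tracked as a simple record so that a postmortem is not closed while a stage is still empty. The following is a minimal sketch only: the `Postmortem` class, its field names, and the sample fault ID and entries are hypothetical and are not part of any product or API described in this document.

```python
from dataclasses import dataclass, field

# The four RASA stages named in the text, in order.
RASA_STAGES = ("review", "analysis", "summary", "action")


@dataclass
class Postmortem:
    """Hypothetical postmortem record; all field names are illustrative."""
    fault_id: str
    review: list[str] = field(default_factory=list)    # key steps and roles examined
    analysis: list[str] = field(default_factory=list)  # root causes, handling improvements
    summary: list[str] = field(default_factory=list)   # fault nature, responsibility, lessons
    action: list[str] = field(default_factory=list)    # improvement measures

    def missing_stages(self) -> list[str]:
        """Return the RASA stages that have no entries yet, in RASA order."""
        return [stage for stage in RASA_STAGES if not getattr(self, stage)]

    def is_complete(self) -> bool:
        """A postmortem is complete once every RASA stage has at least one entry."""
        return not self.missing_stages()


# Example with made-up data: only the first two stages are filled in.
pm = Postmortem(fault_id="FAULT-001")
pm.review.append("Rollback decision made by the on-call lead")
pm.analysis.append("Root cause: expired certificate")
print(pm.missing_stages())  # ['summary', 'action']
```

    Keeping the stages as separate lists makes it easy to see which part of the process is still outstanding. A real tracking system would likely also record owners and due dates for each action.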