Updated on 2025-05-22 GMT+08:00

RES12-04 Recovering from Faults Immediately

If an application system becomes faulty, locate and rectify the fault as soon as possible.

  • Risk level

    High

  • Key strategies

    Use the following methods to quickly detect faults:

    • Monitoring: The application system must expose service monitoring data so that the maintenance team can track its status. Dedicated maintenance personnel should monitor the system and rectify faults quickly.
    • Alarms: When the application system detects a fault, it must generate an alarm and send it to all related personnel by SMS or email so that they can respond quickly.
    • Prediction: The maintenance team should predict system risks from the system status, using methods such as data analytics and machine learning, and prevent or handle those risks before they cause faults.
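
The detection methods above can be illustrated with a minimal sketch. It assumes a single numeric metric sampled at equal intervals. The names `Alarm`, `check_threshold`, `predict_breach`, and `notify_all` are hypothetical and do not belong to any cloud service API. A least-squares trend line stands in for the data analytics or machine learning methods mentioned above, and the notification channels are plain callables rather than real SMS or email clients.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Alarm:
    metric: str
    value: float
    message: str


def check_threshold(metric: str, value: float, limit: float) -> Optional[Alarm]:
    """Alarms: return an Alarm when the latest sample exceeds the limit."""
    if value > limit:
        return Alarm(metric, value, f"{metric}={value} exceeds limit {limit}")
    return None


def predict_breach(samples: Sequence[float], limit: float, horizon: int) -> bool:
    """Prediction: fit a least-squares line to equally spaced samples and
    report whether it exceeds the limit `horizon` steps after the last sample."""
    n = len(samples)
    if n < 2:
        return False  # not enough history to estimate a trend
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    projected = intercept + slope * (n - 1 + horizon)
    return projected > limit


def notify_all(alarm: Alarm, channels: List[Callable[[str], None]]) -> None:
    """Send the alarm message through every channel (for example SMS and email)."""
    for send in channels:
        send(alarm.message)


# Example: disk usage is still below the limit but trending toward it.
history = [60.0, 65.0, 70.0, 75.0]
sent: List[str] = []
if predict_breach(history, limit=90.0, horizon=4):
    notify_all(Alarm("disk_used_percent", history[-1], "disk usage trending toward limit"),
               [sent.append])
```

Keeping the threshold check and the trend check separate lets the team react to an active fault and to a developing risk through the same notification path.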

    During emergency recovery, the maintenance team should first mitigate risks or restore services so that the impact of the interruption on customers ends as soon as possible, and only then locate and rectify the fault. Restoring services first shortens the interruption.

    • Organization coordination: When a fault occurs, the emergency recovery chairperson needs to quickly organize related personnel to recover services.
    • Emergency recovery: When a fault occurs, the assigned O&M owner needs to quickly analyze the fault and recover the system based on the emergency response plan.
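
The recover-first ordering described above can be sketched as a small runbook executor. This is an illustrative sketch only. The `Incident` class, the `handle_incident` function, the fault type `db_primary_down`, and the recovery actions are hypothetical names, and a real emergency response plan would call actual operational tooling. Each action returns `True` when the service is healthy again. If no action succeeds, the incident is escalated, which corresponds to the coordination role of the emergency recovery chairperson.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Incident:
    fault_type: str
    timeline: List[str] = field(default_factory=list)
    service_restored: bool = False


def handle_incident(
    incident: Incident,
    plans: Dict[str, List[Callable[[], bool]]],
    diagnose: Callable[[str], str],
) -> Incident:
    """Try the planned recovery actions in order, then start diagnosis."""
    for action in plans.get(incident.fault_type, []):
        incident.timeline.append(f"recover: {action.__name__}")
        if action():
            incident.service_restored = True
            break
    if not incident.service_restored:
        # No plan, or no action worked: hand over for coordinated recovery.
        incident.timeline.append("escalate: no action restored service")
    # Fault location starts only after the recovery attempt.
    incident.timeline.append("diagnose: " + diagnose(incident.fault_type))
    return incident


# Hypothetical recovery actions for illustration.
def restart_service() -> bool:
    return False  # pretend the restart did not help


def switch_to_standby() -> bool:
    return True  # pretend the failover restored service


plans = {"db_primary_down": [restart_service, switch_to_standby]}
result = handle_incident(
    Incident("db_primary_down"), plans, lambda fault: f"analysis opened for {fault}"
)
```

Recording each step in `timeline` keeps an ordered record of what was tried, which can support the later fault analysis.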