RES12-04 Recovering from Faults Immediately
If an application system fails, locate and rectify the fault as soon as possible.
- Risk level
High
- Key strategies
Use the following methods to quickly detect faults:
- Monitoring: An application system must expose service monitoring data so that the maintenance team can track its status. Dedicated maintenance personnel should monitor the system and rectify faults quickly.
- Alarms: After detecting a fault, an application system must generate an alarm and send it to all related personnel by SMS message or email, so that the personnel can quickly respond to the fault.
- Prediction: The maintenance team needs to predict system risks based on the system status using methods such as data analytics and machine learning, and prevent and handle the risks in advance.
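The three detection methods above can be illustrated with a minimal Python sketch. It assumes a single numeric metric sampled at regular intervals and a fixed threshold. The names `Alarm`, `predict_next`, `check_metric`, and `notify` are hypothetical, and the least-squares trend line is only a simple stand-in for the data analytics or machine learning a real team would use. The `send` callback is a placeholder for an SMS or email gateway.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Alarm:
    metric: str
    value: float
    threshold: float
    kind: str  # "breach" (already over threshold) or "predicted" (forecast over threshold)


def predict_next(samples: Sequence[float]) -> float:
    """Extrapolate one step ahead by fitting a least-squares line to the samples."""
    n = len(samples)
    if n < 2:
        return samples[-1]
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    # Value of the fitted line at the next sample index, x = n.
    return mean_y + slope * (n - mean_x)


def check_metric(metric: str, samples: Sequence[float], threshold: float) -> Optional[Alarm]:
    """Monitoring and prediction: alarm on a current breach, else on a forecast breach.

    `samples` must contain at least one value.
    """
    latest = samples[-1]
    if latest > threshold:
        return Alarm(metric, latest, threshold, "breach")
    forecast = predict_next(samples)
    if forecast > threshold:
        return Alarm(metric, forecast, threshold, "predicted")
    return None


def notify(alarm: Alarm, recipients: Sequence[str], send: Callable[[str, str], None]) -> None:
    """Alarms: send the same message to every related person.

    `send(recipient, message)` is an assumed callback wrapping SMS or email delivery.
    """
    message = f"[{alarm.kind}] {alarm.metric}={alarm.value:.1f} (threshold {alarm.threshold:.1f})"
    for recipient in recipients:
        send(recipient, message)
```

In this sketch a "predicted" alarm gives the team time to act before the threshold is actually crossed, which is the point of the prediction method. A production system would normally rely on its monitoring platform's alarm rules rather than hand-written checks.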
During emergency recovery, the maintenance team should first mitigate the risk or restore services as quickly as possible to limit the impact of the interruption on customers, and only then start locating and rectifying the underlying fault. Restoring services before completing root cause analysis shortens the interruption. Recovery relies on the following:
- Organization coordination: If there is a fault, the emergency recovery chairperson needs to quickly organize related personnel to recover services.
- Emergency recovery: If there is a fault, the assigned O&M owner needs to quickly analyze the fault and recover the system based on the emergency response plan.
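The recover-first ordering described above can be sketched as follows. This is a minimal illustration and not a prescribed implementation: it assumes the emergency response plan can be represented as a mapping from a fault type to a recovery action, and the names `handle_fault`, `plan`, and `timeline` are hypothetical.

```python
from typing import Callable, Dict, List


def handle_fault(
    fault_type: str,
    plan: Dict[str, Callable[[], bool]],
    timeline: List[str],
) -> bool:
    """Run the planned recovery action first, then queue the fault for diagnosis.

    `plan` maps a fault type to a recovery action that returns True on success.
    `timeline` collects the steps taken, in order, for later review.
    """
    action = plan.get(fault_type)
    if action is None:
        # No planned response: hand over to the emergency recovery chairperson.
        timeline.append(f"escalate:{fault_type}")
        return False

    recovered = action()
    timeline.append(f"recover:{fault_type}:{'ok' if recovered else 'failed'}")
    if recovered:
        # Root cause analysis starts only after service is restored.
        timeline.append(f"diagnose:{fault_type}")
    else:
        timeline.append(f"escalate:{fault_type}")
    return recovered
```

For example, a plan entry such as `{"db_primary_down": switch_to_standby}` (where `switch_to_standby` is an assumed function) would restore service through a failover before anyone investigates why the primary failed.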