RES06-03 Subhealth Detection
A component in a system can fail outright or enter a subhealth state. In a subhealth state, the overall service of the system remains within its threshold, but the service of some instances exceeds the threshold. Subhealth is a relative concept: it is determined by comparing current performance with historical data or with the performance of the system as a whole. As a result, the methods for detecting and determining subhealth vary. Once subhealth is detected, the affected component needs to be isolated or recovered promptly to prevent service disruptions.
- Risk level
High
- Key strategies
Subhealth detection predicts system faults based on subhealth symptoms. A typical example is a memory leak. A memory leak does not cause an immediate system failure. Instead, memory usage keeps increasing, and the system slows down as memory and swap space run short. Monitoring the memory usage of each instance is therefore necessary. If the memory usage exceeds the threshold, an alarm is generated, and manual intervention is required to rectify the fault quickly, before services are interrupted.
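The memory check described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name check_memory, the Alarm type, the 85% threshold, and the five-sample growth window are all assumptions chosen for the example. The sustained-growth rule is an added heuristic that flags a possible leak before the threshold is reached.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alarm:
    instance: str
    reason: str  # "threshold_exceeded" or "sustained_growth"


def check_memory(instance: str,
                 usage_pct: List[float],
                 threshold_pct: float = 85.0,  # assumed threshold
                 growth_window: int = 5        # assumed window size
                 ) -> Optional[Alarm]:
    """Evaluate memory usage samples (oldest first) for one instance.

    Raises an alarm if the latest sample exceeds the threshold, or if
    usage rose strictly across the last `growth_window` samples.
    """
    if not usage_pct:
        return None

    # Rule 1: the latest sample is above the configured threshold.
    if usage_pct[-1] > threshold_pct:
        return Alarm(instance, "threshold_exceeded")

    # Rule 2 (heuristic): usage increased at every step of the window,
    # which may indicate a leak that has not yet reached the threshold.
    recent = usage_pct[-growth_window:]
    if len(recent) == growth_window and all(
        later > earlier for earlier, later in zip(recent, recent[1:])
    ):
        return Alarm(instance, "sustained_growth")

    return None
```

In practice, the samples would come from the monitoring system in use, and an alarm would be routed to an operator, because the text above calls for manual intervention.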
Typical subhealth scenarios include packet loss or packet errors, hard disk performance deterioration, and CPU or memory overload. If a component of an application system is in a subhealth state, the service success rate of the application system may decrease.
Subhealth is not a fault, so fault detection alone does not reveal it. Thresholds are therefore set for service monitoring metrics. When a metric exceeds its threshold, an alarm is generated and recovery is required.
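The definition of subhealth (overall service within the threshold, some instances outside it) can be sketched as a per-instance comparison against a shared threshold. The example below is an illustration under stated assumptions: the function name classify, the state labels, the use of error rate as the metric, and the 5% threshold are all hypothetical and not prescribed by this document.

```python
from typing import Dict, List, Tuple


def classify(requests: Dict[str, Tuple[int, int]],
             max_error_rate: float = 0.05  # assumed threshold
             ) -> Tuple[str, List[str]]:
    """Classify a system from per-instance (total, failed) request counts.

    Returns (state, offenders), where offenders are the instances whose
    error rate exceeds the threshold and state is one of:
      "healthy"        - no instance exceeds the threshold
      "subhealthy"     - overall rate is within the threshold,
                         but at least one instance exceeds it
      "overall_breach" - the overall rate itself exceeds the threshold
    """
    total = sum(t for t, _ in requests.values())
    failed = sum(f for _, f in requests.values())
    if total == 0:
        return "healthy", []  # no traffic, nothing to judge

    # Instances with no traffic are skipped to avoid division by zero.
    offenders = sorted(
        name for name, (t, f) in requests.items()
        if t > 0 and f / t > max_error_rate
    )

    if failed / total > max_error_rate:
        return "overall_breach", offenders
    if offenders:
        return "subhealthy", offenders
    return "healthy", []


# Overall error rate is 112/3000 (about 3.7%), but instance "c" is at 9%.
state, bad = classify({"a": (1000, 10), "b": (1000, 12), "c": (1000, 90)})
print(state, bad)
```

The offenders list identifies the instances to isolate or recover. Comparing against historical data instead of a fixed threshold would follow the same shape, with the baseline taken from past samples.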