ALM-12207 Slow Disk Processing Timeout
Alarm Description
When slow disk detection is enabled, the system checks the slow disk processing status every 10 minutes by default. This alarm is generated when a disk or node remains in any of the following states for 10 hours without change:
Disk: Automatic isolation aborted, Isolated, Isolation failed, or De-isolation failed.
Node: Isolated, Isolation failed, Isolation cancellation failed, Node startup failed, or De-isolated.
This alarm is automatically cleared when the status of the node or disk that is in the processing timeout state changes.
This alarm applies only to MRS 3.3.1 or later.
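The timeout condition described above can be pictured with a small sketch. This is not the product's implementation: the function name is hypothetical, the 36000-second default is simply the 10-hour period stated above, and treating exactly 10 hours as a timeout is an assumption.

```shell
# Illustrative sketch only, not the MRS implementation.
# is_processing_timed_out LAST_CHANGE_EPOCH NOW_EPOCH [TIMEOUT_SECONDS]
# Succeeds (exit status 0) when the state has been unchanged for at least
# TIMEOUT_SECONDS. The default of 36000 seconds is the 10-hour period from
# the description; the ">=" comparison is an assumption.
is_processing_timed_out() {
    last_change=$1
    now=$2
    timeout=${3:-36000}
    elapsed=$((now - last_change))
    [ "$elapsed" -ge "$timeout" ]
}
```

With a 10-minute check interval, a state that never changes would be seen unchanged on roughly 60 consecutive checks before the alarm condition is met.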
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared |
---|---|---|
12207 | Major | Yes |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster or system for which the alarm was generated. |
Location Information | ServiceName | Specifies the service for which the alarm was generated. |
Location Information | RoleName | Specifies the role for which the alarm was generated. |
Location Information | HostName | Specifies the host for which the alarm was generated. |
Location Information | DiskName | Specifies the disk for which the alarm was generated. |
Additional Information | HostName | Specifies the host for which the alarm was generated. |
Additional Information | DiskName | Specifies the disk for which the alarm was generated. |
Additional Information | Details | Provides a description of the slow disk isolation. |
Impact on the System
If an isolated disk or node cannot be restored in a timely manner, components may not run properly, which in turn can affect user services.
Possible Causes
The disk or node has remained in the same isolation state for longer than the configured slow disk processing timeout period.
Handling Procedure
Check the cause of the slow disk processing timeout.
- Log in to FusionInsight Manager and choose O&M > Alarm > Alarms. In the alarm list, expand the alarm details, and view and record the host or disk for which the alarm is generated.
- Log in to the active OMS node as user root and run the following command to open the controller log. Look for the cause of the slow disk processing timeout and for any obvious error information:
vi /var/log/Bigdata/controller/controller.log
- Log in to the node for which the alarm is generated as user root and run the following command to open the agent log. Look for the cause of the slow disk processing timeout and for any error information:
vi /var/log/Bigdata/nodeagent/agentlog/agent.log
- Contact O&M engineers to rectify the fault and manually run the command for the slow disk or node. After the command is executed, observe for 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, collect fault information as described below.
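As an alternative to paging through the logs in an editor, the log checks above can be narrowed with a filter. The following is a minimal sketch: the function name is hypothetical, and the keywords `error`, `exception`, and `fail` are an assumption about what relevant log lines contain, so adjust them to what the logs actually show.

```shell
# Illustrative sketch only.
# scan_log_errors LOG_FILE [MAX_LINES]
# Prints the most recent matching lines, each prefixed with its line number.
# The keyword list is an assumption about the log contents.
scan_log_errors() {
    log_file=$1
    max_lines=${2:-50}
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -i: ignore case, -n: show line numbers, -E: extended regular expression
    grep -inE 'error|exception|fail' "$log_file" | tail -n "$max_lines"
}
```

For example, run `scan_log_errors /var/log/Bigdata/controller/controller.log` on the active OMS node and `scan_log_errors /var/log/Bigdata/nodeagent/agentlog/agent.log` on the node for which the alarm is generated. The printed line numbers can then be used to jump to the surrounding context in the full log.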
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Select Controller and NodeAgent for Service, select the active/standby OMS node and the node for which the alarm is generated in the Host area, and click OK.
- Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
- Contact O&M engineers and provide the collected logs.
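The Start Date and End Date for the log collection step above can be worked out with a short helper. This is a sketch under assumptions: the function name is hypothetical, it relies on GNU `date` (the `-d` option), and the alarm generation time is passed in a format that `date -d` can parse, interpreted in the local time zone of the machine running it.

```shell
# Illustrative sketch only; requires GNU date.
# log_window ALARM_TIME [MARGIN_SECONDS]
# Prints the start time on the first line and the end time on the second.
# The default margin of 600 seconds is the 10 minutes from the procedure.
log_window() {
    alarm_time=$1
    margin=${2:-600}
    # Convert the alarm time to seconds since the epoch.
    alarm_epoch=$(date -d "$alarm_time" +%s) || return 1
    date -d "@$((alarm_epoch - margin))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((alarm_epoch + margin))" '+%Y-%m-%d %H:%M:%S'
}
```

For example, `log_window '2024-01-01 12:00:00'` prints the two values to enter as Start Date and End Date.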
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None.