ALM-29016 Impalad Instance in the Sub-healthy State
Alarm Description
In MRS 3.1.5, the system checks every 60 seconds whether the HiveServer2 HTTP port (28000) of Impalad responds to cURL requests. This alarm is generated when two consecutive checks fail to return a correct result within 20 seconds. This alarm is cleared when the port responds correctly within 20 seconds.
In other MRS versions, the system checks every 60 seconds whether Impalad can execute select 1. This alarm is generated when two consecutive checks fail to return a correct result within 20 seconds. This alarm is cleared when the SQL statement is executed correctly within 20 seconds.
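The raise and clear rule described above can be sketched as a small state tracker. This is only an illustration, assuming that "two consecutive times" means two consecutive failed checks and that one successful check clears the alarm. The names `AlarmState`, `record`, and `replay` are hypothetical and are not part of MRS or Impala.

```python
from dataclasses import dataclass

# Values taken from the alarm description. The first two are listed for
# reference only; the logic below consumes pass/fail results directly.
CHECK_INTERVAL_S = 60   # one check every 60 seconds
PROBE_TIMEOUT_S = 20    # a check passes only if it succeeds within 20 seconds
FAILURES_TO_RAISE = 2   # assumed reading of "two consecutive times"


@dataclass
class AlarmState:
    """Hypothetical tracker for ALM-29016; not the actual MRS implementation."""
    consecutive_failures: int = 0
    raised: bool = False

    def record(self, probe_ok: bool) -> bool:
        """Feed one check result; return whether the alarm is raised afterwards."""
        if probe_ok:
            # A single correct response within the timeout clears the alarm.
            self.consecutive_failures = 0
            self.raised = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURES_TO_RAISE:
                self.raised = True
        return self.raised


def replay(results):
    """Replay a sequence of check results; return the alarm state after each."""
    state = AlarmState()
    return [state.record(ok) for ok in results]


print(replay([True, False, False, True]))  # → [False, False, True, False]
```

In this sketch, a single failed check between two successful ones never raises the alarm, which matches the requirement for consecutive failures.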
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared |
---|---|---|
29016 | Minor | Yes |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster for which the alarm was generated. |
Location Information | ServiceName | Specifies the service for which the alarm was generated. |
Location Information | RoleName | Specifies the role for which the alarm was generated. |
Location Information | HostName | Specifies the host for which the alarm was generated. |
Impact on the System
Impalad cannot execute SQL statements, or SQL statement execution times out, which affects data reads and writes.
Possible Causes
The Impalad instance is handling too many queries.
Handling Procedure
1. Log in to FusionInsight Manager and choose Cluster > Services > Impala > Impalad Web UI. On the displayed page, click any node to go to the web UI.
2. On the web UI, click /backends to view the Impala instance list. Locate the instance for which the alarm is generated and click Web UI. After the web UI of the sub-healthy node is displayed, click /queries to check the task execution status and check whether any task is running slowly.
3. After the task is complete, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 4.
4. On FusionInsight Manager, choose Cluster > Services > Impala > Instances, select the Impala instance for which the alarm is generated, click More, and select Restart Instance. Then, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 5.

   Restarting all instances makes the service unavailable. If only a single instance is restarted, the tasks being executed on that instance fail, but the service remains available.

Collect fault information.

5. On FusionInsight Manager of the active or standby cluster, choose O&M. In the navigation pane on the left, choose Log > Download.
6. Expand the Service drop-down list, and select Impala for the target cluster.
7. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
8. Contact O&M personnel and provide the collected logs.
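The log collection window used above is simple date arithmetic. The following sketch only shows how the Start Date and End Date values could be derived from the alarm generation time. The function name `log_window` and the sample timestamp are hypothetical, and FusionInsight Manager itself is not called.

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end): margin_minutes before and after the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
start, end = log_window(datetime(2024, 1, 15, 9, 5))
print(start, end)  # → 2024-01-15 08:55:00 2024-01-15 09:15:00
```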
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None