ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold

Alarm Description

The system checks the percentage of HQL statements executed successfully every 30 seconds. The formula is:

Percentage of HQL statements executed successfully = Number of HQL statements executed successfully by Hive in a specified period / Total number of HQL statements executed by Hive in the same period

This indicator can be viewed by choosing Cluster > Services > Hive > Instance > HiveServer instance. A default threshold is provided for the percentage of successful HQL executions, and this alarm is generated when the percentage falls below the threshold. The location information of the alarm shows the name of the host where the alarm is generated. The IP address of this host is the IP address of the HiveServer node.

Users can modify the threshold by choosing O&M > Alarm > Thresholds > Name of the desired cluster > Hive > Percentage of HQL Statements That Are Executed Successfully by Hive.

This alarm is cleared when the execution success rate is higher than 110% of the threshold.
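The following minimal Python sketch illustrates the success-rate formula and the raise and clear conditions described above. It assumes the per-period counts of successful and total HQL statements are already available; the names `success_rate`, `AlarmState`, and `evaluate` are hypothetical and not part of FusionInsight Manager, and the 110% clear factor is taken from the description above.

```python
from dataclasses import dataclass


def success_rate(succeeded: int, total: int) -> float:
    """Percentage of HQL statements executed successfully in one period."""
    if total <= 0:
        raise ValueError("total must be a positive number of statements")
    return succeeded * 100.0 / total


@dataclass
class AlarmState:
    threshold: float      # alarm threshold in percent, e.g. 90.0
    active: bool = False  # whether the alarm is currently raised

    def evaluate(self, succeeded: int, total: int) -> bool:
        """Update and return the alarm state after one 30-second check."""
        if total == 0:
            # Assumption: a period with no HQL statements leaves the state unchanged.
            return self.active
        rate = success_rate(succeeded, total)
        if not self.active and rate < self.threshold:
            self.active = True   # rate below the threshold: raise the alarm
        elif self.active and rate > self.threshold * 1.1:
            self.active = False  # rate above 110% of the threshold: clear the alarm
        return self.active


alarm = AlarmState(threshold=90.0)
print(alarm.evaluate(85, 100))   # True: 85% < 90%, alarm raised
print(alarm.evaluate(95, 100))   # True: 95% is not above 99%, alarm stays
print(alarm.evaluate(100, 100))  # False: 100% > 99%, alarm cleared
```

With a 90% threshold, the alarm is raised below 90% but cleared only above 99%. A gap like this between the raise and clear conditions generally keeps an alarm from toggling repeatedly when the rate hovers near the threshold.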

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
16002	Critical (default threshold: 90%); Major (default threshold: 80%)	Quality of service	Hive	Yes

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster for which the alarm is generated.
	ServiceName	Specifies the service for which the alarm is generated.
	RoleName	Specifies the role for which the alarm is generated.
	HostName	Specifies the host for which the alarm is generated.
Additional Information	Trigger Condition	Specifies the threshold for triggering the alarm.

Impact on the System

The service execution capability of the system is too low to respond properly to customer requests. The Hive service itself is not affected. Check the HiveServer logs to locate the cause of the SQL failures.

Possible Causes

- HQL statements contain syntax errors.
- The HBase service is abnormal when a Hive on HBase task is run.
- The Spark service is abnormal when a Hive on Spark task is run.
- Dependent basic services, such as HDFS, Yarn, and ZooKeeper, are abnormal.

Handling Procedure

Check whether the HQL statements are syntactically correct.

1. On the FusionInsight Manager page, choose O&M > Alarm to view the alarm details and obtain the node where the alarm is generated.
2. Use the Hive client to log in to the HiveServer node where the alarm is reported, and check whether the HQL commands are correct against the HQL syntax provided by Apache.
- If yes, go to 4.
- If no, go to 3.
To find the user who ran an incorrect statement, download the HiveServer audit log file of the HiveServer node where this alarm is generated, setting Start Date and End Date to 10 minutes before and after the alarm generation time, respectively. Open the log file, search for the keyword Result=FAIL to filter the entries for incorrect statements, and check the UserName field of those entries to identify the user.
3. Enter the correct HQL statements and check whether the command can be executed properly.
- If yes, go to 12.
- If no, go to 4.
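When the audit log contains many failed entries, tallying them per user can show whose statements need correcting. The sketch below assumes that each audit log line holds whitespace-separated `key=value` fields, including `UserName=...` and `Result=FAIL` as mentioned in the step above; the actual audit log layout may differ between versions, so adjust the regular expressions accordingly. The function name `failed_users` and the sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: fields look like "UserName=alice" and "Result=FAIL",
# separated by spaces or tabs.
USER_RE = re.compile(r"\bUserName=(\S+)")
FAIL_RE = re.compile(r"\bResult=FAIL\b")


def failed_users(lines: Iterable[str]) -> Counter:
    """Count audit log entries with Result=FAIL per UserName."""
    counts: Counter = Counter()
    for line in lines:
        if not FAIL_RE.search(line):
            continue  # skip successful or unrelated entries
        match = USER_RE.search(line)
        counts[match.group(1) if match else "<unknown>"] += 1
    return counts


# Hypothetical sample lines; in practice, pass the opened log file instead.
sample = [
    "UserName=alice\tResult=FAIL",
    "UserName=bob\tResult=SUCCESS",
    "UserName=alice\tResult=FAIL",
]
print(failed_users(sample).most_common())  # [('alice', 2)]
```

Because `failed_users` accepts any iterable of strings, an open file object such as `open(path, encoding="utf-8")` can be passed directly without loading the whole log into memory.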

Check whether the HBase service is abnormal.

4. Check whether a Hive on HBase task is run by the user who runs the HQL command.
- If yes, go to 5.
- If no, go to 8.
5. On the FusionInsight Manager page, choose Cluster > Name of the desired cluster > Services, and check whether the HBase service is normal in the service list.
- If yes, go to 8.
- If no, go to 6.
6. Choose O&M > Alarm, check the related alarms displayed on the alarm page, and clear them by following the help for each alarm.
7. Enter the correct HQL statements and check whether the command can be executed properly.
- If yes, go to 12.
- If no, go to 8.

Check whether HDFS, Yarn, and ZooKeeper are normal.

8. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services.
9. In the service list, check whether the dependent services, such as HDFS, Yarn, and ZooKeeper, are normal.
- If yes, go to 12.
- If no, go to 10.
10. Check the related alarms displayed on the alarm page and clear them by following the help for each alarm.
11. Enter the correct HQL statements and check whether the command can be executed properly.
- If yes, go to 12.
- If no, go to 13.
12. After 1 minute, check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 13.
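The "If yes" and "If no" jumps in the procedure can be summarized as a transition table. The minimal sketch below encodes those jumps exactly as written above, where step 13 starts fault information collection and `None` means no further action is required; the names `BRANCHES`, `LINEAR`, and `route` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

DONE = None  # no further action is required

# (step, answer) -> next step, copied from the "If yes/If no" lines.
BRANCHES: Dict[Tuple[int, bool], Optional[int]] = {
    (2, True): 4, (2, False): 3,
    (3, True): 12, (3, False): 4,
    (4, True): 5, (4, False): 8,
    (5, True): 8, (5, False): 6,
    (7, True): 12, (7, False): 8,
    (9, True): 12, (9, False): 10,
    (11, True): 12, (11, False): 13,
    (12, True): DONE, (12, False): 13,
}
# Steps without a yes/no question simply continue to the next step.
LINEAR: Dict[int, int] = {1: 2, 6: 7, 8: 9, 10: 11}


def route(answers: Iterable[bool], start: int = 1) -> List[int]:
    """Return the steps visited, consuming one answer per yes/no step.

    Stops at step 13 (collect fault information) or when no action remains.
    """
    it = iter(answers)
    step: Optional[int] = start
    visited: List[int] = []
    while step is not None:
        visited.append(step)
        if step == 13:
            break
        if step in LINEAR:
            step = LINEAR[step]
        else:
            step = BRANCHES[(step, next(it))]
    return visited


# Syntax correct, no Hive on HBase task, dependent services normal, alarm cleared.
print(route([True, False, True, True]))  # [1, 2, 4, 8, 9, 12]
```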

Collect fault information.

13. On the FusionInsight Manager home page, choose O&M > Log > Download.
14. Select the following services in the required cluster from Service:
- MapReduce
- Hive
15. Click the edit icon in the upper right corner and set Start Date and End Date for log collection to 10 minutes before and after the alarm generation time, respectively. Then, click Download.
16. Contact the O&M engineers and send them the collected logs.
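Both the log collection above and the audit log download in the syntax check use a window from 10 minutes before to 10 minutes after the alarm generation time. The small Python sketch below computes that window; the function name `collection_window` and the sample alarm time are hypothetical, and the example shows that the window can cross midnight.

```python
from datetime import datetime, timedelta
from typing import Tuple

WINDOW = timedelta(minutes=10)  # offset before and after the alarm time


def collection_window(alarm_time: datetime) -> Tuple[datetime, datetime]:
    """Return (Start Date, End Date) around the alarm generation time."""
    return alarm_time - WINDOW, alarm_time + WINDOW


start, end = collection_window(datetime(2024, 5, 6, 0, 5))
print(start, end)  # 2024-05-05 23:55:00 2024-05-06 00:15:00
```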