Updated on 2024-04-11 GMT+08:00

ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold (For MRS 2.x or Earlier)

Description

Every 30 seconds, the system checks the percentage of HiveQL statements that are executed successfully. Percentage of HiveQL statements that are executed successfully = Number of HiveQL statements that Hive executes successfully in a specified period/Total number of HiveQL statements that Hive executes in the same period. This indicator can be viewed on the Hive service monitoring page. This alarm is generated when the percentage of HiveQL statements that are executed successfully falls below the specified threshold (90% by default). The name of the host for which the alarm is generated can be obtained from the location information of the alarm. The host IP address is the IP address of the HiveServer node.

This alarm is cleared when the percentage of HiveQL statements that are executed successfully in a test period is greater than or equal to the threshold.
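
The check described above can be sketched in a few lines of Python. This is only an illustration of the formula and of the "lower than the threshold" condition in the alarm title, not the monitoring code that MRS actually runs. The function names and the handling of an empty period (no statements executed) are assumptions made for this example.

```python
def success_rate(succeeded: int, total: int) -> float:
    """Percentage of HiveQL statements executed successfully in one period."""
    if total == 0:
        # Assumption for this sketch: an idle period counts as 100% success,
        # so it never triggers the alarm.
        return 100.0
    return 100.0 * succeeded / total


def alarm_should_fire(succeeded: int, total: int, threshold: float = 90.0) -> bool:
    """True when the success rate is lower than the threshold (90% by default)."""
    return success_rate(succeeded, total) < threshold


if __name__ == "__main__":
    # 44 of 50 statements succeeded: 88%, which is below the default threshold.
    print(success_rate(44, 50), alarm_should_fire(44, 50))
```

With the default threshold, a period in which exactly 90% of the statements succeed does not trigger the alarm, because the rate is not lower than the threshold.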

Attribute

  • Alarm ID: 16002
  • Alarm Severity: Major
  • Auto Clear: Yes

Parameters

  • ServiceName: Specifies the service for which the alarm is generated.
  • RoleName: Specifies the role for which the alarm is generated.
  • HostName: Specifies the host for which the alarm is generated.
  • Trigger condition: Specifies the threshold for triggering the alarm.

Impact on the System

The system configuration and performance cannot meet service processing requirements.

Possible Causes

  • A syntax error occurs in HiveQL commands.
  • The HBase service is abnormal when a Hive on HBase task is being performed.
  • A basic service that Hive depends on, such as HDFS, Yarn, or ZooKeeper, is abnormal.

Procedure

  1. Check whether the HiveQL commands comply with the HiveQL syntax.

    a. Use the Hive client to log in to the HiveServer node for which the alarm is generated. Refer to the HiveQL syntax standard provided by Apache, and check whether the HiveQL commands are correct.
      • If yes, go to 2.a.
      • If no, go to 1.b.

      To identify the user who ran an incorrect statement, download the HiveServerAudit logs of the HiveServer node for which this alarm is generated. Set Start time to 10 minutes before the alarm generation time and End time to 10 minutes after it. Open the log file and search for the Result=FAIL keyword to find the log entries for the incorrect statement, and then check the UserName field in those entries.

    b. Enter correct HiveQL statements, and check whether they can be executed properly.
      • If yes, go to 4.e.
      • If no, go to 2.a.

  2. Check whether the HBase service is abnormal.

    a. Check whether a Hive on HBase task is performed.
      • If yes, go to 2.b.
      • If no, go to 3.a.
    b. Check whether the HBase service is normal in the service list.
      • If yes, go to 3.a.
      • If no, go to 2.c.
    c. Check the alarms displayed on the alarm page and clear them according to Alarm Help.
    d. Enter correct HiveQL statements, and check whether they can be executed properly.
      • If yes, go to 4.e.
      • If no, go to 3.a.

  3. Check whether the Spark service is abnormal.

    a. Check whether the Spark service is normal in the service list.
      • If yes, go to 4.a.
      • If no, go to 3.b.
    b. Check the alarms displayed on the alarm page and clear them according to Alarm Help.
    c. Enter correct HiveQL statements, and check whether they can be executed properly.
      • If yes, go to 4.e.
      • If no, go to 4.a.

  4. Check whether HDFS, Yarn, and ZooKeeper are normal.

    a. Go to the MRS cluster details page and click Components.
    b. In the service list, check whether services such as HDFS, Yarn, and ZooKeeper are normal.
      • If yes, go to 4.e.
      • If no, go to 4.c.
    c. Check the alarms displayed on the alarm page and clear them according to Alarm Help.
    d. Enter correct HiveQL statements, and check whether they can be executed properly.
      • If yes, go to 4.e.
      • If no, go to 5.
    e. Wait one minute and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 5.

  5. Collect fault information.

    a. On MRS Manager, choose System > Export Log.
    b. Contact the O&M engineers and send the collected logs.
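
The audit-log check in step 1 (setting a 10-minute window around the alarm time, searching for Result=FAIL, and reading UserName) can be scripted once the HiveServerAudit log has been downloaded. The Python sketch below is a rough illustration under stated assumptions: it assumes each log entry is a single line that carries the user as a `UserName=<value>` pair, with the value ending at whitespace or one of `|`, `,`, `;`. The actual HiveServerAudit layout may differ, and the helper names and sample lines are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed field layout: "UserName=<value>", value ends at whitespace or | , ;
USER_RE = re.compile(r"UserName=([^\s|,;]+)")


def alarm_window(alarm_time: datetime, minutes: int = 10):
    """Start and end times: `minutes` before and after the alarm generation time."""
    delta = timedelta(minutes=minutes)
    return alarm_time - delta, alarm_time + delta


def failed_users(lines):
    """Count failed statements per user from audit-log lines."""
    counts = Counter()
    for line in lines:
        if "Result=FAIL" not in line:
            continue  # keep only entries for statements that failed
        match = USER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "UserName=alice | Operation=query | Result=FAIL",
        "UserName=bob | Operation=query | Result=SUCCESS",
        "UserName=alice | Operation=query | Result=FAIL",
    ]
    print(alarm_window(datetime(2024, 4, 11, 10, 0)))
    print(failed_users(sample).most_common())
```

Counting failures per user, rather than printing the first match, makes it easier to see whether one user or one job accounts for most of the failed statements in the window.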

Reference

None