Updated on 2024-04-11 GMT+08:00

Configuring Hive SQL Inspection

Scenario

You can add Hive SQL inspection rules on FusionInsight Manager and set the rule parameters as needed.

Prerequisites

  • The cluster client that contains the Hive service has been installed in the /opt/hadoopclient directory.
  • The Hive service of the cluster is running properly.
  • For a cluster with Kerberos authentication enabled, a user with Hive operation permissions has been created.
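
The first prerequisite can be checked from the shell before you start. The following is a minimal sketch, not an official MRS tool: the function name check_hive_client is hypothetical, and it assumes that the bigdata_env file sourced in the procedure sits directly in the client directory.

```shell
#!/usr/bin/env bash
# Sketch only: check_hive_client is a hypothetical helper, not part of MRS.

# Verify that the client directory exists and contains bigdata_env.
# The default path is the one named in the prerequisites.
check_hive_client() {
  local dir="${1:-/opt/hadoopclient}"
  if [ ! -d "$dir" ]; then
    echo "client directory not found: $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/bigdata_env" ]; then
    echo "bigdata_env not found in $dir" >&2
    return 1
  fi
  echo "client found in $dir"
}
```

Call check_hive_client without arguments to test the default path, or pass another directory if the client was installed elsewhere.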

Constraints

  • By default, SQL inspection rules take 5 seconds to take effect dynamically. After a queue is modified, Hive takes 10 minutes to reload its inspection rules.
  • Interception and blocking rules interrupt SQL tasks, so set the parameters of these rules based on site requirements.
  • For the rule dynamic_0001 (the number of files scanned by SQL statements exceeds the threshold), when a statement running on the Spark or Tez engine reaches the threshold, the interception log is printed in the Yarn task logs and is not displayed on the Beeline client.
  • Blocking rules have execution latency. For example, if the running_0004 rule is used and the threshold of the scanned data volume is 10 GB, the statement may not be blocked until the data volume reaches 15 GB or more, because of the determination period and task concurrency.

Procedure

  1. Log in to FusionInsight Manager, click Cluster, and choose SQL Inspector. The SQL Inspector page is displayed.
  2. Add rules for Hive by referring to Adding an SQL Inspection.

    For details about the rules supported by the Hive SQL engine, see MRS SQL Inspection Rules.

    For example, add a rule whose ID is static_0001 to check whether count distinct appears more than two times in the SQL statement. If so, the system displays a hint.

    Figure 1 Adding a Hive SQL inspection rule

  3. Log in to the node where the Hive client is installed and run the following command to switch to the client installation directory:

    cd /opt/hadoopclient

    Run the following command to set environment variables:

    source bigdata_env

    Run the following command to authenticate the current user. Skip this step if Kerberos authentication is disabled for the cluster (the cluster is in normal mode).

    kinit <component service user who has the Hive operation permission>

  4. Run the following command to log in to the Hive client:

    beeline

  5. Run the following commands to create a table and import data into it:

    drop table if exists hivetb;

    create table hivetb(a int,b int);

    insert into hivetb select 1,11;

    insert into hivetb select 2,22;

  6. Run the following SQL statement to check whether the current rule takes effect:

    select count(distinct a),count(distinct b) from hivetb;

    If the number of times count distinct appears in the statement exceeds the threshold configured in step 2, the following information is displayed:

    ...
    WARN  : STATIC_0001 The count(distinct X) times exceeds the limit : 2, current count distinct times : 2
    ...

    If the operation set in the rule is Block, the statement fails to be executed and the following information is displayed:

    ...
    Error: Error while compiling statement: FAILED: RuleException STATIC_0001 The count(distinct X) times exceeds the limit : 2, current count distinct times : 2 (state=42000,code=40000)
    ...

    • For more Hive SQL inspection rules, see MRS SQL Inspection Rules.
    • You can also obtain SQL inspection rule information from the log file /var/log/Bigdata/audit/hive/hiveserver/queryinfo.log.
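
Steps 3 to 6 can also be scripted for repeated checks. The following is a minimal sketch under stated assumptions, not an official MRS script. It assumes that the beeline launcher on the client accepts the standard Beeline -e option and that rule IDs appear verbatim in queryinfo.log. The function names and the CLIENT_DIR, QUERYINFO_LOG, and HIVE_USER variables are hypothetical. The count_distinct_calls helper is only a local text-based illustration of what static_0001 looks for; the check that Hive itself performs may differ.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are hypothetical, not part of MRS.

CLIENT_DIR="${CLIENT_DIR:-/opt/hadoopclient}"
QUERYINFO_LOG="${QUERYINFO_LOG:-/var/log/Bigdata/audit/hive/hiveserver/queryinfo.log}"

# Count occurrences of "count(distinct" in a SQL string (case-insensitive).
# Text matching only; this is an illustration, not Hive's own check.
count_distinct_calls() {
  printf '%s\n' "$1" \
    | { grep -oiE 'count[[:space:]]*\([[:space:]]*distinct' || true; } \
    | wc -l | tr -d '[:space:]'
}

# Run the test statement from step 6 and print only STATIC_0001 messages.
# The body runs in a subshell so that cd and source do not affect the caller.
run_inspection_check() (
  cd "$CLIENT_DIR" || return 1
  source bigdata_env
  # Security mode only: authenticate when HIVE_USER (assumed variable) is set.
  if [ -n "${HIVE_USER:-}" ]; then
    kinit "$HIVE_USER" || return 1
  fi
  # Assumption: the launcher passes -e through to Beeline.
  beeline -e "select count(distinct a),count(distinct b) from hivetb;" 2>&1 \
    | grep -F 'STATIC_0001'
)

# Print log lines that mention a rule ID.
# Usage: find_rule_hits RULE_ID [LOG_FILE]
find_rule_hits() {
  grep -iF -- "$1" "${2:-$QUERYINFO_LOG}"
}
```

For example, count_distinct_calls "select count(distinct a),count(distinct b) from hivetb;" prints 2, which matches the count reported in the sample message. Run find_rule_hits STATIC_0001 on the node that holds the log file.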