Updated on 2024-10-25 GMT+08:00

About Hive User Permissions

Hive is a data warehouse framework built on Hadoop. It provides basic data analysis services through the Hive Query Language (HQL), an SQL-like language.

MRS supports users, user groups, and roles. Permissions are assigned to roles, and roles are then bound to users or user groups. A user obtains permissions only by being bound to a role or by joining a user group that is bound to a role. For details about Hive authorization, visit https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization.

  • Hive permissions need to be managed in security mode but not in normal mode.
  • If the current component uses Ranger for permission control, you need to configure permission management policies based on Ranger. For details, see Adding a Ranger Access Permission Policy for Hive.

Hive Permission Model

To use the Hive component, users must have permissions on Hive databases and tables (including external tables and views). In MRS, the complete Hive permission model consists of Hive metadata permissions and HDFS file permissions. Every permission required to use a database or table belongs to this model.

  • Hive metadata permission

    Similar to traditional relational databases, Hive databases in MRS support the CREATE and SELECT permissions, and Hive tables and columns support the SELECT, INSERT, and DELETE permissions. Hive also supports the OWNERSHIP permission and Hive Admin Privilege.

  • Hive data file permission, also known as HDFS file permission

    Hive database and table files are stored in HDFS. By default, created databases and tables are saved in the /user/hive/warehouse directory of HDFS. The system automatically creates subdirectories named after the databases and tables. To access a database or table, you need the corresponding file permissions (read, write, and execute) on HDFS.
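The default directory layout described above can be sketched in a few lines of Python. This is an illustration only, not MRS code: the function name default_table_dir is hypothetical, and the "<database>.db" subdirectory suffix, the placement of tables from the default database directly under the warehouse root, and the lowercasing of names follow the usual Apache Hive convention and are assumptions here. Check the actual location of a table with DESCRIBE FORMATTED before relying on it.

```python
from posixpath import join

# Default warehouse root named in the text.
WAREHOUSE_ROOT = "/user/hive/warehouse"


def default_table_dir(database: str, table: str, root: str = WAREHOUSE_ROOT) -> str:
    """Return the conventional HDFS directory of a managed Hive table.

    Hypothetical helper for illustration; it does not query Hive.
    """
    # Assumption: Hive stores database and table names in lowercase.
    db = database.lower()
    tbl = table.lower()
    if db == "default":
        # Assumption: tables of the default database sit directly under the root.
        return join(root, tbl)
    # Assumption: other databases get a "<name>.db" subdirectory.
    return join(root, f"{db}.db", tbl)


if __name__ == "__main__":
    print(default_table_dir("sales", "orders"))
    # /user/hive/warehouse/sales.db/orders
```

The returned path is the directory on which the HDFS read, write, and execute permissions apply for that table.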

To perform operations on Hive databases or tables, you need both the metadata permission and the associated HDFS file permission. For example, to query a Hive table, you need the metadata permission SELECT together with the HDFS file permissions Read and Write.

When you use the role management function on Manager to manage permissions on Hive databases and tables, you only need to configure the metadata permissions. The system automatically associates and configures the HDFS file permissions, which simplifies operations on the page and improves efficiency.

Hive Users

MRS provides users and roles for using Hive, for example, to create tables, insert data into tables, and query tables. Hive defines the USER class, which corresponds to user instances, and the GROUP class, which corresponds to role instances.

You can use Manager to set permissions for Hive users. Manager supports setting permissions only on roles. A user or user group obtains the permissions only after the role is bound to that user or user group. Hive users can be granted Hive administrator permissions and permissions to access databases, tables, and columns.
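The role-based model above can be illustrated with a small in-memory sketch. It is not the Manager API: the Directory class, its field names, and the sample role, group, and permission strings are all hypothetical and only show how a user's effective permissions follow from roles bound directly or through user groups.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Directory:
    """Hypothetical in-memory model of users, user groups, and roles."""
    role_permissions: Dict[str, Set[str]] = field(default_factory=dict)
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)
    group_roles: Dict[str, Set[str]] = field(default_factory=dict)
    user_groups: Dict[str, Set[str]] = field(default_factory=dict)

    def effective_permissions(self, user: str) -> Set[str]:
        # Roles bound directly to the user.
        roles = set(self.user_roles.get(user, set()))
        # Roles inherited from every group the user has joined.
        for group in self.user_groups.get(user, set()):
            roles |= self.group_roles.get(group, set())
        # Permissions exist only on roles, never on users or groups directly.
        permissions: Set[str] = set()
        for role in roles:
            permissions |= self.role_permissions.get(role, set())
        return permissions


if __name__ == "__main__":
    directory = Directory(
        role_permissions={"hive_reader": {"SELECT on sales.orders"}},
        group_roles={"analysts": {"hive_reader"}},
        user_groups={"alice": {"analysts"}},
    )
    print(directory.effective_permissions("alice"))  # {'SELECT on sales.orders'}
    print(directory.effective_permissions("bob"))    # set()
```

In this sketch a user with no bound role and no role-bearing group ends up with an empty permission set, which mirrors the rule that permissions are obtained only through roles.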

Support for Cascading Authorization (Available in MRS 3.3.0 or Later)

Hive tables in a cluster with Ranger authentication enabled support cascading authorization, which makes authorization significantly easier to use. You only need to grant permissions on service tables once on the Ranger page. The system then automatically associates fine-grained permissions on the underlying data storage, so you do not need to know the storage path of the tables or grant permissions a second time. This also addresses the authorization drawbacks of decoupled storage and compute. For details, see Hive Tables Supporting Cascading Authorization.

Hive Usage Scenarios and Related Permissions

To create a database in Hive, a user only needs to join the hive group. No role needs to be granted. Users have all permissions on the databases and tables that they create in Hive or HDFS. They can create tables, select, delete, insert, or update data, and grant permissions to other users to allow them to access the tables and the corresponding HDFS directories and files.

A user can access a table or database only with the required permissions. The permissions required vary with the Hive usage scenario.

Table 1 Hive usage scenarios

Typical Scenario

Permission

Using Hive tables, columns, or databases

Permissions required in different scenarios are as follows:

  • To create tables, the CREATE permission is required.
  • To query data, the SELECT permission is required.
  • To insert data, the INSERT permission is required.
  • To delete data, the DELETE permission is required.

Associating and using other components

In addition to Hive permissions, permissions of other components are required in some scenarios, for example:

  • Yarn permissions are required when some HQL statements, such as insert, count, distinct, group by, order by, sort by, and join, are run. You are advised to grant Yarn permissions to the role of each Hive user.
  • HBase permission is required when Hive over HBase is used, for example, querying HBase table data in Hive.
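The scenarios in Table 1 can be expressed as a simple lookup. The sketch below is illustrative only: the operation keys, function names, and the keyword-based Yarn check are hypothetical. The keyword list repeats the examples given in the table, which are not exhaustive, and plain text matching can be fooled by string literals or comments in a statement.

```python
import re
from typing import Optional, Set

# Hive permission required per operation, as listed in Table 1.
REQUIRED_PERMISSION = {
    "create_table": "CREATE",
    "query": "SELECT",
    "insert": "INSERT",
    "delete": "DELETE",
}

# Statement keywords that Table 1 names as examples needing Yarn permissions.
YARN_KEYWORDS = ("insert", "count", "distinct", "group by", "order by", "sort by", "join")


def missing_permission(operation: str, granted: Set[str]) -> Optional[str]:
    """Return the Hive permission the user still lacks, or None if none is missing."""
    needed = REQUIRED_PERMISSION[operation]
    return None if needed in granted else needed


def likely_needs_yarn(hql: str) -> bool:
    """Heuristic: True if the statement contains one of the example keywords."""
    text = " ".join(hql.lower().split())  # normalize case and whitespace
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in YARN_KEYWORDS)


if __name__ == "__main__":
    print(missing_permission("insert", {"SELECT"}))  # INSERT
    print(likely_needs_yarn("SELECT dept, count(*) FROM emp GROUP BY dept"))  # True
```

A check like this can help decide in advance whether a role also needs Yarn permissions, but the cluster itself remains the authority on what a statement requires.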

In some special Hive usage scenarios, you need to configure other types of permission.

Table 2 Hive authorization precautions

Scenario

Permission

Creating Hive databases, tables, and external tables, or adding partitions to existing Hive tables or external tables, when the data files specified by the Hive user are stored in an HDFS directory other than /user/hive/warehouse

The directory must already exist, the Hive user must be the owner of the directory, and the Hive user must have the read, write, and execute permissions on the directory. The user must also have the read and write permissions on all the upper-layer directories of the directory. After an administrator grants the Hive permission to the role, the HDFS permission is automatically granted.

Using load to load data from all the files or specified files in a specified directory to Hive tables as a Hive user

  • If the data source is a Linux local disk: the specified directory must exist, and the system user omm must have the read and execute permissions on the directory and all its upper-layer directories. The specified file must exist, and user omm must have the read permission on the file and the read and execute permissions on all the upper-layer directories of the file.
  • If the data source is HDFS: the specified directory must exist, and the Hive user must be the owner of the directory, have the read, write, and execute permissions on the directory and its subdirectories, and have the read and write permissions on all its upper-layer directories. The specified file must exist, and the Hive user must be the owner of the file, have the read, write, and execute permissions on the file, and have the read and execute permissions on all its upper-layer directories.
NOTE:

When load is used to import data from a Linux local disk, the files must be uploaded to the HiveServer node on which the command is run, and their permissions must be modified. You are advised to run the command on a client, where you can identify the HiveServer that the client is connected to. For example, if the Hive client displays 0: jdbc:hive2://10.172.0.43:21066/>, the IP address of the connected HiveServer is 10.172.0.43.

Creating or deleting functions or modifying any database

The Hive Admin Privilege is required.

Performing operations on all databases and tables in Hive

The user must be added to the supergroup user group and granted Hive Admin Privilege.

Enabling Ranger authentication when Kerberos authentication is disabled for the cluster (the cluster is in normal mode)

By default, Ranger authentication is disabled if Kerberos authentication is disabled for the cluster (the cluster is in normal mode). If Ranger authentication is disabled, there are the following restrictions:

  • Whitelist: If the whitelist function is enabled, you cannot set parameters on the client. You can disable the whitelist function by setting the hive.security.whitelist.switch parameter to OFF on the Hive configuration page. This poses security risks. Exercise caution when performing this operation.
  • The reflect, reflect2, java_method, and in_file functions cannot be executed. If they are required, add hive.server2.builtin.udf.blacklist to the custom parameters of HiveServer and set it to empty_blacklist to allow Hive to execute these functions. This poses security risks. Exercise caution when performing this operation.
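For the load scenario in Table 2, two preparatory steps lend themselves to a short script: reading the HiveServer address from the client prompt mentioned in the NOTE, and listing the upper-layer directories whose permissions need checking. The following is a minimal sketch with hypothetical function names. It assumes the prompt contains a single host:port pair as in the example from the NOTE, and it only lists directories without inspecting their permissions.

```python
import re
from pathlib import PurePosixPath
from typing import List, Optional

# Matches the host and port in a prompt such as "0: jdbc:hive2://10.172.0.43:21066/>".
PROMPT_RE = re.compile(r"jdbc:hive2://(?P<host>[^:/\s]+):(?P<port>\d+)")


def hiveserver_from_prompt(prompt: str) -> Optional[str]:
    """Return the HiveServer host shown in the client prompt, or None if absent."""
    match = PROMPT_RE.search(prompt)
    return match.group("host") if match else None


def upper_layer_dirs(path: str) -> List[str]:
    """List every upper-layer directory of a path, nearest first, ending at "/"."""
    return [str(parent) for parent in PurePosixPath(path).parents]


if __name__ == "__main__":
    print(hiveserver_from_prompt("0: jdbc:hive2://10.172.0.43:21066/>"))  # 10.172.0.43
    print(upper_layer_dirs("/data/in/part-0.csv"))  # ['/data/in', '/data', '/']
```

Each directory returned by upper_layer_dirs is one on which user omm (for a local disk) or the Hive user (for HDFS) needs the permissions listed in Table 2.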