Hive Application Development Overview

Hive Introduction

Hive is an open-source data warehouse built on Hadoop. It stores structured data and provides basic data analysis services using the Hive query language (HQL), a language like the SQL. Hive converts HQL statements to Mapreduce or Spark jobs for querying and analyzing massive data stored in Hadoop clusters.

Hive provides the following features:

Extracts, transforms, and loads (ETL) data using HQL.
Analyzes massive structured data using HQL.
Supports flexible data storage formats, including JavaScript object notation (JSON), comma separated values (CSV), TextFile, RCFile, ORCFILE, and SequenceFile, and supports custom extensions.
Multiple client connection modes. Interfaces, such as JDBC and Thrift interfaces are supported.

Hive applies to offline massive data analysis (such as log and cluster status analysis), large-scale data mining (such as user behavior analysis, interest region analysis, and region display), and other scenarios.

To ensure Hive high availability (HA), user data security, and service access security, MRS incorporates the following features based on Hive 3.1.0:

Kerberos security authentication
Data file encryption
Complete rights management

For Hive features in the Open Source Community, seehttps://cwiki.apache.org/confluence/display/hive/designdocs.

Common Concepts

keytab file
The keytab file is a key file that stores user information. Applications use the key file for API authentication on MRS.
Client
Users can access the server from the client through the Java API and Thrift API to perform Hive-related operations.
HQL
Similar to SQL
HCatalog
HCatalog is a table information management layer created based on Hive metadata and absorbs DDL commands of Hive. HCatalog provides read/write interfaces for Mapreduce and provides Hive command line interfaces (CLIs) for defining data and querying metadata. Hive and Mapreduce development personnel can share metadata based on the HCatalog component of MRS, preventing intermediate conversion and adjustment and improving the data processing efficiency.
WebHCat
WebHCat running users use Rest APIs to perform operations, such as running Hive DDL commands, submitting Mapreduce tasks, and querying Mapreduce task execution results.

Required Permissions

**Table 1** Required permissions
Operation Type/Functional Object	Operation	Required Permission
DATABASE	CREATE DATABASE dbname [LOCATION "hdfs_path"]	If the HDFS path hdfs_pathis specified, the ownership and RWX permission ofhdfs_path are required.
	DROP DATABASE dbname	The database dbname ownership is required.
	ALTER DATABASE dbname SET OWNERuser_or_role	The admin permission is required.
TABLE	CREATE TABLE table_a	The create permission for the database is required.
	CREATE TABLE table_a AS SELECTtable_b	The create permission for the database and the select permission for table_b are required.
	CREATE TABLE table_a LIKEtable_b	The create permission for the database is required.
	CREATE [EXTERNAL] TABLE table_a LOCATION "hdfs_path"	The create permission for the database, and the ownership and RWX permission of hdfs_path on HDFS are required.
	DROP TABLE table_a	The ownership of table_a is required.
	ALTER TABLE table_a SET LOCATION "hdfs_path"	The ownership of table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required.
	ALTER TABLE table_a SETFILEFORMAT	The ownership of table_a is required.
	TRUNCATE TABLE table_a	The ownership of table_a is required.
	ANALYZE TABLE table_a COMPUTE STATISTICS	The select and insert permission for table_a is required.
	SHOW TBLPROPERTIES table_a	The select permission for table_a is required.
	SHOW CREATE TABLE table_a	The select permission with grant option for table_a is required.
Alter	ALTER TABLE table_a ADD COLUMN	The ownership of table_a is required.
	ALTER TABLE table_a REPLACE COLUMN	The ownership of table_a is required.
	ALTER TABLE table_a RENAME	The ownership of table_a is required.
	ALTER TABLE table_a SET SERDE	The ownership of table_a is required.
	ALTER TABLE table_a CLUSTER BY	The ownership of table_a is required.
PARTITION	ALTER TABLE table_a ADD PARTITIONpartition_spec LOCATION "hdfs_path"	The insert permission for table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required.
	ALTER TABLE table_a DROP PARTITIONpartition_spec	The delete permission for table_a is required.
	ALTER TABLE table_a PARTITIONpartition_spec SET LOCATION "hdfs_path"	The ownership of table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required.
	ALTER TABLE table_a PARTITIONpartition_spec SET FILEFORMAT	The ownership of table_a is required.
LOAD	LOAD INPATH 'hdfs_path' INTO TABLE table_a	The insert permission for table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required.
INSERT	INSERT TABLE table_a SELECT FROMtable_b	The update permission for table_aand select permission fortable_b are required.
SELECT	SELECT * FROM table_a	The select permission for table_a is required.
	SELECT FROM table_aJOINtable_b	The select permission for table_aandtable_b, the Submit permission of the default Yarn queue is required.
	SELECT FROM (SELECT FROM table_aUNION ALL SELECT FROMtable_b)	The select permission for table_aandtable_b, the Submit permission of the default Yarn queue is required.
EXPLAIN	EXPLAIN [EXTENDED\|DEPENDENCY] query	The RX permissions for related table directory is required.
VIEW	CREATE VIEW view_name AS SELECT ...	The select permission with grant option for related tables is required.
	ALTER VIEW view_name RENAME TOnew_view_name	The ownership of view_name is required.
	DROP VIEW view_name	The ownership of view_name is required.
INDEX	CREATE INDEX index_name ON TABLEbase_table_name (col_name, ...) ASindex_type	The ownership of table_a is required.
	DROP INDEX index_name ONtable_name	The ownership of index_name is required.
	ALTER INDEX index_name ONtable_name REBUILD	The ownership of index_name is required.
FUNCTION	CREATE [TEMPORARY] FUNCTION function_name AS 'class_name'	The admin permission is required.
FUNCTION	DROP [TEMPORARY] function_name	The admin permission is required.
MACRO	CREATE TEMPORARY MACRO macro_name ...	The admin permission is required.
MACRO	DROP TEMPORARY MACRO macro_name	The admin permission is required.

You can perform all the previous operations when owning the admin permission of Hive and the corresponding directory permission of HDFS.
If the current component uses Ranger for permission control, you need to configure permission management policies based on Ranger. For details, see Adding a Ranger Access Permission Policy for Hive.

Parent topic: Hive Development Guide (Security Mode)

Previous topic: Hive Development Guide (Security Mode)

Next topic: Hive Application Development Process