Updated on 2025-04-14 GMT+08:00

Hive Application Development Overview

Hive Introduction

Hive is an open-source data warehouse built on Hadoop. It stores structured data and provides basic data analysis services using the Hive query language (HQL), a language like the SQL. Hive converts HQL statements to Mapreduce or Spark jobs for querying and analyzing massive data stored in Hadoop clusters.

Hive provides the following features:

  • Extracts, transforms, and loads (ETL) data using HQL.
  • Analyzes massive structured data using HQL.
  • Supports flexible data storage formats, including JavaScript object notation (JSON), comma separated values (CSV), TextFile, RCFile, ORCFILE, and SequenceFile, and supports custom extensions.
  • Multiple client connection modes. Interfaces, such as JDBC and Thrift interfaces are supported.

Hive applies to offline massive data analysis (such as log and cluster status analysis), large-scale data mining (such as user behavior analysis, interest region analysis, and region display), and other scenarios.

To ensure Hive high availability (HA), user data security, and service access security, MRS incorporates the following features based on Hive 3.1.0:

  • Kerberos security authentication
  • Data file encryption
  • Complete rights management

For Hive features in the Open Source Community, seehttps://cwiki.apache.org/confluence/display/hive/designdocs.

Common Concepts

  • keytab file

    The keytab file is a key file that stores user information. Applications use the key file for API authentication on MRS.

  • Client

    Users can access the server from the client through the Java API and Thrift API to perform Hive-related operations.

  • HQL

    Similar to SQL

  • HCatalog

    HCatalog is a table information management layer created based on Hive metadata and absorbs DDL commands of Hive. HCatalog provides read/write interfaces for Mapreduce and provides Hive command line interfaces (CLIs) for defining data and querying metadata. Hive and Mapreduce development personnel can share metadata based on the HCatalog component of MRS, preventing intermediate conversion and adjustment and improving the data processing efficiency.

  • WebHCat

    WebHCat running users use Rest APIs to perform operations, such as running Hive DDL commands, submitting Mapreduce tasks, and querying Mapreduce task execution results.

Required Permissions

Table 1 Required permissions

Operation Type/Functional Object

Operation

Required Permission

DATABASE

CREATE DATABASE dbname [LOCATION "hdfs_path"]

If the HDFS path hdfs_pathis specified, the ownership and RWX permission ofhdfs_path are required.

DROP DATABASE dbname

The database dbname ownership is required.

ALTER DATABASE dbname SET OWNERuser_or_role

The admin permission is required.

TABLE

CREATE TABLE table_a

The create permission for the database is required.

CREATE TABLE table_a AS SELECTtable_b

The create permission for the database and the select permission for table_b are required.

CREATE TABLE table_a LIKEtable_b

The create permission for the database is required.

CREATE [EXTERNAL] TABLE table_a LOCATION "hdfs_path"

The create permission for the database, and the ownership and RWX permission of hdfs_path on HDFS are required.

DROP TABLE table_a

The ownership of table_a is required.

ALTER TABLE table_a SET LOCATION "hdfs_path"

The ownership of table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required.

ALTER TABLE table_a SETFILEFORMAT

The ownership of table_a is required.

TRUNCATE TABLE table_a

The ownership of table_a is required.

ANALYZE TABLE table_a COMPUTE STATISTICS

The select and insert permission for table_a is required.

SHOW TBLPROPERTIES table_a

The select permission for table_a is required.

SHOW CREATE TABLE table_a

The select permission with grant option for table_a is required.

Alter

ALTER TABLE table_a ADD COLUMN

The ownership of table_a is required.

ALTER TABLE table_a REPLACE COLUMN

The ownership of table_a is required.

ALTER TABLE table_a RENAME

The ownership of table_a is required.

ALTER TABLE table_a SET SERDE

The ownership of table_a is required.

ALTER TABLE table_a CLUSTER BY

The ownership of table_a is required.

PARTITION

ALTER TABLE table_a ADD PARTITIONpartition_spec LOCATION "hdfs_path"

The insert permission for table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required.

ALTER TABLE table_a DROP PARTITIONpartition_spec

The delete permission for table_a is required.

ALTER TABLE table_a PARTITIONpartition_spec SET LOCATION "hdfs_path"

The ownership of table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required.

ALTER TABLE table_a PARTITIONpartition_spec SET FILEFORMAT

The ownership of table_a is required.

LOAD

LOAD INPATH 'hdfs_path' INTO TABLE table_a

The insert permission for table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required.

INSERT

INSERT TABLE table_a SELECT FROMtable_b

The update permission for table_aand select permission fortable_b are required.

SELECT

SELECT * FROM table_a

The select permission for table_a is required.

SELECT FROM table_aJOINtable_b

The select permission for table_aandtable_b, the Submit permission of the default Yarn queue is required.

SELECT FROM (SELECT FROM table_aUNION ALL SELECT FROMtable_b)

The select permission for table_aandtable_b, the Submit permission of the default Yarn queue is required.

EXPLAIN

EXPLAIN [EXTENDED|DEPENDENCY] query

The RX permissions for related table directory is required.

VIEW

CREATE VIEW view_name AS SELECT ...

The select permission with grant option for related tables is required.

ALTER VIEW view_name RENAME TOnew_view_name

The ownership of view_name is required.

DROP VIEW view_name

The ownership of view_name is required.

INDEX

CREATE INDEX index_name ON TABLEbase_table_name (col_name, ...) ASindex_type

The ownership of table_a is required.

DROP INDEX index_name ONtable_name

The ownership of index_name is required.

ALTER INDEX index_name ONtable_name REBUILD

The ownership of index_name is required.

FUNCTION

CREATE [TEMPORARY] FUNCTION function_name AS 'class_name'

The admin permission is required.

DROP [TEMPORARY] function_name

The admin permission is required.

MACRO

CREATE TEMPORARY MACRO macro_name ...

The admin permission is required.

DROP TEMPORARY MACRO macro_name

The admin permission is required.

  • You can perform all the previous operations when owning the admin permission of Hive and the corresponding directory permission of HDFS.
  • If the current component uses Ranger for permission control, you need to configure permission management policies based on Ranger. For details, see Adding a Ranger Access Permission Policy for Hive.