Hive Application Development Overview
Hive Introduction
Hive is an open-source data warehouse built on Hadoop. It stores structured data and provides basic data analysis services using the Hive query language (HQL), a language like the SQL. Hive converts HQL statements to Mapreduce or Spark jobs for querying and analyzing massive data stored in Hadoop clusters.
Hive provides the following features:
- Extracts, transforms, and loads (ETL) data using HQL.
- Analyzes massive structured data using HQL.
- Supports flexible data storage formats, including JavaScript object notation (JSON), comma separated values (CSV), TextFile, RCFile, ORCFILE, and SequenceFile, and supports custom extensions.
- Multiple client connection modes. Interfaces, such as JDBC and Thrift interfaces are supported.
Hive applies to offline massive data analysis (such as log and cluster status analysis), large-scale data mining (such as user behavior analysis, interest region analysis, and region display), and other scenarios.
To ensure Hive high availability (HA), user data security, and service access security, MRS incorporates the following features based on Hive 3.1.0:
- Kerberos security authentication
- Data file encryption
- Complete rights management
For Hive features in the Open Source Community, seehttps://cwiki.apache.org/confluence/display/hive/designdocs.
Common Concepts
- keytab file
The keytab file is a key file that stores user information. Applications use the key file for API authentication on MRS.
- Client
Users can access the server from the client through the Java API and Thrift API to perform Hive-related operations.
- HQL
Similar to SQL
- HCatalog
HCatalog is a table information management layer created based on Hive metadata and absorbs DDL commands of Hive. HCatalog provides read/write interfaces for Mapreduce and provides Hive command line interfaces (CLIs) for defining data and querying metadata. Hive and Mapreduce development personnel can share metadata based on the HCatalog component of MRS, preventing intermediate conversion and adjustment and improving the data processing efficiency.
- WebHCat
WebHCat running users use Rest APIs to perform operations, such as running Hive DDL commands, submitting Mapreduce tasks, and querying Mapreduce task execution results.
Required Permissions
Operation Type/Functional Object |
Operation |
Required Permission |
---|---|---|
DATABASE |
CREATE DATABASE dbname [LOCATION "hdfs_path"] |
If the HDFS path hdfs_pathis specified, the ownership and RWX permission ofhdfs_path are required. |
DROP DATABASE dbname |
The database dbname ownership is required. |
|
ALTER DATABASE dbname SET OWNERuser_or_role |
The admin permission is required. |
|
TABLE |
CREATE TABLE table_a |
The create permission for the database is required. |
CREATE TABLE table_a AS SELECTtable_b |
The create permission for the database and the select permission for table_b are required. |
|
CREATE TABLE table_a LIKEtable_b |
The create permission for the database is required. |
|
CREATE [EXTERNAL] TABLE table_a LOCATION "hdfs_path" |
The create permission for the database, and the ownership and RWX permission of hdfs_path on HDFS are required. |
|
DROP TABLE table_a |
The ownership of table_a is required. |
|
ALTER TABLE table_a SET LOCATION "hdfs_path" |
The ownership of table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required. |
|
ALTER TABLE table_a SETFILEFORMAT |
The ownership of table_a is required. |
|
TRUNCATE TABLE table_a |
The ownership of table_a is required. |
|
ANALYZE TABLE table_a COMPUTE STATISTICS |
The select and insert permission for table_a is required. |
|
SHOW TBLPROPERTIES table_a |
The select permission for table_a is required. |
|
SHOW CREATE TABLE table_a |
The select permission with grant option for table_a is required. |
|
Alter |
ALTER TABLE table_a ADD COLUMN |
The ownership of table_a is required. |
ALTER TABLE table_a REPLACE COLUMN |
The ownership of table_a is required. |
|
ALTER TABLE table_a RENAME |
The ownership of table_a is required. |
|
ALTER TABLE table_a SET SERDE |
The ownership of table_a is required. |
|
ALTER TABLE table_a CLUSTER BY |
The ownership of table_a is required. |
|
PARTITION |
ALTER TABLE table_a ADD PARTITIONpartition_spec LOCATION "hdfs_path" |
The insert permission for table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required. |
ALTER TABLE table_a DROP PARTITIONpartition_spec |
The delete permission for table_a is required. |
|
ALTER TABLE table_a PARTITIONpartition_spec SET LOCATION "hdfs_path" |
The ownership of table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required. |
|
ALTER TABLE table_a PARTITIONpartition_spec SET FILEFORMAT |
The ownership of table_a is required. |
|
LOAD |
LOAD INPATH 'hdfs_path' INTO TABLE table_a |
The insert permission for table_a, and the ownership and RWX permission ofhdfs_path on HDFS are required. |
INSERT |
INSERT TABLE table_a SELECT FROMtable_b |
The update permission for table_aand select permission fortable_b are required. |
SELECT |
SELECT * FROM table_a |
The select permission for table_a is required. |
SELECT FROM table_aJOINtable_b |
The select permission for table_aandtable_b, the Submit permission of the default Yarn queue is required. |
|
SELECT FROM (SELECT FROM table_aUNION ALL SELECT FROMtable_b) |
The select permission for table_aandtable_b, the Submit permission of the default Yarn queue is required. |
|
EXPLAIN |
EXPLAIN [EXTENDED|DEPENDENCY] query |
The RX permissions for related table directory is required. |
VIEW |
CREATE VIEW view_name AS SELECT ... |
The select permission with grant option for related tables is required. |
ALTER VIEW view_name RENAME TOnew_view_name |
The ownership of view_name is required. |
|
DROP VIEW view_name |
The ownership of view_name is required. |
|
INDEX |
CREATE INDEX index_name ON TABLEbase_table_name (col_name, ...) ASindex_type |
The ownership of table_a is required. |
DROP INDEX index_name ONtable_name |
The ownership of index_name is required. |
|
ALTER INDEX index_name ONtable_name REBUILD |
The ownership of index_name is required. |
|
FUNCTION |
CREATE [TEMPORARY] FUNCTION function_name AS 'class_name' |
The admin permission is required. |
DROP [TEMPORARY] function_name |
The admin permission is required. |
|
MACRO |
CREATE TEMPORARY MACRO macro_name ... |
The admin permission is required. |
DROP TEMPORARY MACRO macro_name |
The admin permission is required. |

- You can perform all the previous operations when owning the admin permission of Hive and the corresponding directory permission of HDFS.
- If the current component uses Ranger for permission control, you need to configure permission management policies based on Ranger. For details, see Adding a Ranger Access Permission Policy for Hive.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot