Updated on 2023-08-31 GMT+08:00

Loading Data

Function

This section describes how to use HQL to load data to the existing employees_info table. You can learn how to load data from a local file system and MRS cluster. LOCAL is used to differentiate between local and non-local data sources.

To perform the following operations on a cluster enabled with the security service, you must have the UPDATE permission for databases, and owner permission and read/write permission for files to be loaded. For details about permission requirements, see Overview.

If LOCAL is used in data loading statements, data is loaded from a local directory. In addition to the UPDATE permission for tables, you must have the read permission for the data path. It is also required that the data can be accessed by user omm on the active HiveServer.

If OVERWRITE is used in data loading statements, the existing data in a table will be overwritten by new data. If OVERWRITE does not exist, data will be added to the table.

Example Codes

-- Load the employee_info.txt file from the /opt/hive_examples_data/ directory of the local file system to the employees_info table.
---- Overwrite the original data using the new data.
LOAD DATA LOCAL INPATH '/opt/hive_examples_data/employee_info.txt' OVERWRITE INTO TABLE employees_info; 
---- Retain the original data and add the new data to the table.
LOAD DATA LOCAL INPATH '/opt/hive_examples_data/employee_info.txt' INTO TABLE employees_info;
 
-- Load /user/hive_examples_data/employee_info.txt from HDFS to the employees_info table.
---- Overwrite the original data using the new data.
LOAD DATA INPATH '/user/hive_examples_data/employee_info.txt' OVERWRITE INTO TABLE employees_info; 
---- Retain the original data and add the new data to the table.
LOAD DATA INPATH '/user/hive_examples_data/employee_info.txt' INTO TABLE employees_info;   

The essence of loading data is to copy the data to the specified table directory in HDFS.

Sample Data

Data in employees_info is as follows:

1,Wang,R,8000.01,person&personal^Btype&income^Btax&0.05,Country1:City1,2014
3,Tom,D,12000.02,person&personal^Btype&income^Btax&0.09,Country2:City2,2014
4,Jack,D,24000.03,person&personal^Btype&income^Btax&0.05,Country3:City3,2014
6,Linda,D,36000.04,person&personal^Btype&income^Btax&0.05,Country4:City4,2014
8,Zhang,R,9000.05,person&personal^Btype&income^Btax&0.05,Country5:City5,2014

Data in employees_contact is as follows:

1,135 XXXX XXXX,xxxx@xx.com
3,159 XXXX XXXX,xxxxx@xx.com.cn
4,186 XXXX XXXX,xxxx@xx.org
6,189 XXXX XXXX,xxxx@xxx.cn
8,134 XXXX XXXX,xxxx@xxxx.cn

Data in employees_info_extended is as follows:

1,Wang,135 XXXX XXXX,xxxx@xx.com,R,8000.01,person&personal^Btype&income^Btax&0.05,Country1:City1,2014
3,Tom,159 XXXX XXXX,xxxxx@xx.com.cn,D,12000.02,person&personal^Btype&income^Btax&0.09,Country2:City2,2014
4,Jack,186 XXXX XXXX,xxxx@xx.org,D,24000.03,person&personal^Btype&income^Btax&0.05,Country3:City3,2014
6,Linda,189 XXXX XXXX,xxxx@xxx.cn,D,36000.04,person&personal^Btype&income^Btax&0.05,Country4:City4,2014
8,Zhang,134 XXXX XXXX,xxxx@xxxx.cn,R,9000.05,person&personal^Btype&income^Btax&0.05,Country5:City5,2014