Connecting Hive to OBS
Overview
Hive is a data warehouse tool built on Hadoop. It can extract, transform, and load (ETL) large-scale data sets held in distributed storage, and it provides an SQL-like query language, HiveQL, for data analysis.
Prerequisites
Hadoop has been installed and connected to OBS. For details, see Connecting Hadoop to OBS.
Procedure
The following uses Hive 2.3.3 as an example.
- Download apache-hive-2.3.3-bin.tar.gz and decompress it to the /opt/hive-2.3.3 directory.
- Add the following content to the /etc/profile file:
export HIVE_HOME=/opt/hive-2.3.3
export PATH=$HIVE_HOME/bin:$PATH
- Configure Hive.
- Rename hive-env.sh.template under /opt/hive-2.3.3/conf/ to hive-env.sh.
- Rename hive-log4j2.properties.template under /opt/hive-2.3.3/conf/ to hive-log4j2.properties.
- Create the hive-site.xml file under /opt/hive-2.3.3/conf/ and add the following configuration:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>obs://obs-bucket/warehouse/hive</value>
</property>
This configuration is optional. With it in place, you do not need to specify a LOCATION when you create a Hive table: new tables are stored under this OBS path by default.
- Initialize the metadata:
- Check whether the connection is successful.
In the following example, the table is created without a LOCATION clause, so its data is stored at obs://obs-bucket/warehouse/hive/student.
hive> create table student(
        id int comment "Student ID",
        name string comment "Student name",
        age int comment "Student age")
      comment "Student information table"
      row format delimited fields terminated by ",";
hive> insert into table student select 6, "yangdong", 29;
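To confirm that a table such as student was actually written to OBS, you can ask Hive for the location it recorded. This is a minimal sketch, not part of the official procedure: `check_table_on_obs` is a hypothetical helper, and it assumes the hive CLI is on PATH. It runs `describe formatted` in silent mode (`-S`), extracts the `Location:` row, and succeeds only if that value is an obs:// URI.

```shell
# Hypothetical helper: print the location Hive recorded for a table and
# succeed only if it is an obs:// URI. Assumes the hive CLI is on PATH.
# Usage: check_table_on_obs <table>
check_table_on_obs() {
  local table="$1" loc
  # -S suppresses informational output; "describe formatted" prints a
  # "Location:" row whose second field is the table's storage URI.
  loc=$(hive -S -e "describe formatted ${table};" | awk '$1 == "Location:" { print $2; exit }')
  echo "$loc"
  case "$loc" in
    obs://*) return 0 ;;
    *) echo "table ${table} is not stored in OBS" >&2; return 1 ;;
  esac
}
```

For example, with hive.metastore.warehouse.dir set as above, `check_table_on_obs student` should print obs://obs-bucket/warehouse/hive/student and exit with status 0.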
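The step that decompresses apache-hive-2.3.3-bin.tar.gz to /opt/hive-2.3.3 can also be scripted. This is a minimal sketch under stated assumptions: `install_hive` is a hypothetical helper name, the tarball is assumed to be already downloaded, and your `tar` must support `--strip-components` (GNU tar and bsdtar do). Stripping the archive's single top-level folder places bin/, conf/, and lib/ directly under the target directory instead of under a nested apache-hive-2.3.3-bin folder.

```shell
# Hypothetical helper: unpack a Hive binary tarball into a fixed directory.
# Usage: install_hive <tarball> <target-dir>
install_hive() {
  local tarball="$1" target="$2"
  # Create the target directory (for example /opt/hive-2.3.3) if needed.
  mkdir -p "$target" || return 1
  # Drop the archive's top-level folder so bin/, conf/, and lib/ sit
  # directly under the target directory.
  tar -xzf "$tarball" -C "$target" --strip-components=1
}
```

For example, as a user with write access to /opt: `install_hive apache-hive-2.3.3-bin.tar.gz /opt/hive-2.3.3`.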
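The HIVE_HOME and PATH lines added to /etc/profile take effect only in new login shells, or after the file is sourced. As a sketch, the hypothetical helper `add_hive_env` below appends the two export lines only when no `HIVE_HOME` export is already present, so rerunning it does not duplicate them. The default path is this page's example directory.

```shell
# Hypothetical helper: append the Hive variables to a profile file once.
# Usage: add_hive_env <profile-file> [hive-home]
add_hive_env() {
  local profile="$1"
  local hive_home="${2:-/opt/hive-2.3.3}"
  # Skip if a HIVE_HOME export already exists, so reruns add no duplicates.
  if grep -q '^export HIVE_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  printf 'export HIVE_HOME=%s\n' "$hive_home" >> "$profile"
  # Single quotes keep $HIVE_HOME and $PATH literal; they expand when the
  # profile is read, not when this line is written.
  printf '%s\n' 'export PATH=$HIVE_HOME/bin:$PATH' >> "$profile"
}
```

Run `add_hive_env /etc/profile` as root, then `source /etc/profile` so that `echo $HIVE_HOME` prints /opt/hive-2.3.3 in the current shell.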
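The two renames under /opt/hive-2.3.3/conf/ can be combined into one rerunnable step. In this sketch, `init_hive_conf` is a hypothetical helper: it renames each template only if the template exists and the target file does not, so it never overwrites a file you have already edited.

```shell
# Hypothetical helper: create hive-env.sh and hive-log4j2.properties from
# the templates in conf/, skipping any file that already exists.
# Usage: init_hive_conf [conf-dir]
init_hive_conf() {
  local conf_dir="${1:-/opt/hive-2.3.3/conf}"
  local name
  for name in hive-env.sh hive-log4j2.properties; do
    if [ -f "$conf_dir/$name.template" ] && [ ! -e "$conf_dir/$name" ]; then
      mv "$conf_dir/$name.template" "$conf_dir/$name" || return 1
    fi
  done
}
```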
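Like Hadoop's *-site.xml files, hive-site.xml uses a `<configuration>` root element, so a newly created file needs that wrapper around the `<property>` element for hive.metastore.warehouse.dir. The sketch below uses a hypothetical helper, `write_hive_site`, to write such a minimal file. It overwrites any existing hive-site.xml, so if you already have one, add the `<property>` element to it by hand instead. The default bucket path is the example value used on this page.

```shell
# Hypothetical helper: write a minimal hive-site.xml whose default warehouse
# directory is an OBS path. Overwrites any existing hive-site.xml.
# Usage: write_hive_site [conf-dir] [warehouse-uri]
write_hive_site() {
  local conf_dir="${1:-/opt/hive-2.3.3/conf}"
  local warehouse="${2:-obs://obs-bucket/warehouse/hive}"
  # Unquoted EOF lets ${warehouse} expand inside the here-document.
  cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>${warehouse}</value>
  </property>
</configuration>
EOF
}
```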