Updated on 2024-03-25 GMT+08:00

Connecting Hive to OBS

Overview

Hive is a data warehouse tool that extracts, transforms, and loads large-scale data sets held in distributed storage. It lets you analyze that data with SQL-like queries.

Prerequisites

Hadoop has been installed. For details, see Connecting Hadoop to OBS.

Procedure

The following uses Hive 2.3.3 as an example.

  1. Download apache-hive-2.3.3-bin.tar.gz and decompress it to the /opt/hive-2.3.3 directory.
  2. Add the following content to the /etc/profile file:

    export HIVE_HOME=/opt/hive-2.3.3
    export PATH=$HIVE_HOME/bin:$PATH
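
Steps 1 and 2 can be scripted roughly as follows. This is a minimal sketch, assuming the archive has already been downloaded and that it unpacks into a single top-level directory. The function names install_hive and hive_profile_lines are illustrative and are not part of Hive.

```shell
# Sketch only: install_hive and hive_profile_lines are hypothetical helpers.

# Extract a Hive binary archive into the target directory, dropping the
# archive's top-level folder so that bin/ and conf/ sit directly under it.
install_hive() {
    local tarball="$1"   # e.g. apache-hive-2.3.3-bin.tar.gz
    local dest="$2"      # e.g. /opt/hive-2.3.3
    mkdir -p "$dest"
    tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Print the two environment lines from step 2 for a given install directory.
hive_profile_lines() {
    local dest="$1"
    printf 'export HIVE_HOME=%s\n' "$dest"
    printf 'export PATH=$HIVE_HOME/bin:$PATH\n'
}

# Example (run as a user who can write to /opt and /etc/profile):
#   install_hive apache-hive-2.3.3-bin.tar.gz /opt/hive-2.3.3
#   hive_profile_lines /opt/hive-2.3.3 >> /etc/profile
#   source /etc/profile
```

Running source /etc/profile (or opening a new login shell) is what makes the new variables visible to the current session.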

  3. Configure Hive.

    1. Rename hive-env.sh.template under /opt/hive-2.3.3/conf/ to hive-env.sh.
    2. Rename hive-log4j2.properties.template under /opt/hive-2.3.3/conf/ to hive-log4j2.properties.
    3. Create the hive-site.xml file and add the following configurations:
      <configuration>
        <property>
          <name>hive.metastore.warehouse.dir</name>
          <value>obs://obs-bucket/warehouse/hive</value>
        </property>
      </configuration>

      The hive.metastore.warehouse.dir setting is optional. Once it is set, you do not need to specify a location explicitly when creating a Hive table; new tables are stored in OBS under this path automatically.

    4. Initialize the metadata:

      /opt/hive-2.3.3/bin/schematool -dbType derby -initSchema
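
The configuration in step 3 can also be applied from a script. The following is a sketch under stated assumptions: configure_hive is a hypothetical helper name, the template files are assumed to exist in the given conf directory, and the generated hive-site.xml wraps the property in the configuration root element that Hadoop-style configuration files use.

```shell
# Sketch only: configure_hive is a hypothetical helper, not a Hive command.
configure_hive() {
    local conf_dir="$1"        # e.g. /opt/hive-2.3.3/conf
    local warehouse_uri="$2"   # e.g. obs://obs-bucket/warehouse/hive

    # Sub-steps 1 and 2: activate the shipped templates by renaming them.
    mv "$conf_dir/hive-env.sh.template" "$conf_dir/hive-env.sh"
    mv "$conf_dir/hive-log4j2.properties.template" "$conf_dir/hive-log4j2.properties"

    # Sub-step 3: write hive-site.xml with the warehouse directory property.
    cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>${warehouse_uri}</value>
  </property>
</configuration>
EOF
}

# Example:
#   configure_hive /opt/hive-2.3.3/conf obs://obs-bucket/warehouse/hive
#   /opt/hive-2.3.3/bin/schematool -dbType derby -initSchema
```

Note that the function overwrites any existing hive-site.xml in the target directory, so it suits a fresh installation rather than one that already carries custom settings.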

  4. Check whether the connection is successful.

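    The following example creates a table named student and inserts one row into it. With hive.metastore.warehouse.dir set to obs://obs-bucket/warehouse/hive, the table location is obs://obs-bucket/warehouse/hive/student.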
    hive> create table student(id int comment "Student ID", name string comment "Student name", age int comment "Student age")
        > comment "Student information table"
        > row format delimited fields terminated by ",";
    hive> insert into table student select 6, "yangdong", 29;
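
To confirm where the table was actually stored, one option is to inspect the Location field reported by describe formatted. The sketch below assumes the hive CLI is on PATH and that the output contains a row whose first field is "Location:". The helper name table_location is hypothetical.

```shell
# Sketch only: table_location is a hypothetical helper, not a Hive command.
# Reads `describe formatted <table>` output on stdin and prints the value
# of its "Location:" row.
table_location() {
    awk '$1 == "Location:" { print $2; exit }'
}

# Example, after the statements above have run:
#   hive -e 'describe formatted student;' | table_location
# A result beginning with obs:// suggests the table is stored in OBS.
```

Parsing the CLI output this way is deliberately simple; if the output layout differs in your Hive version, listing the obs://obs-bucket/warehouse/hive/student path with your usual OBS or Hadoop tooling is an alternative check.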