Updated on 2024-10-25 GMT+08:00

Storing Hive Table Partitions to OBS and HDFS

Scenario

In the scenario where storage and compute resources are separated, you can specify different storage sources, for example, OBS or HDFS, for partitions in a Hive partitioned table.

This feature applies only to MRS 3.2.0 or later. This section describes the capability of specifying storage sources for partitioned tables. For details about how to connect Hive to OBS in the storage-compute decoupling scenario, see Interconnecting Hive with OBS.

Prerequisites

The Hive client has been installed.

Example

  1. Log in to the node where the Hive client is installed as the Hive client installation user.
  2. Run the following command to go to the client installation directory:

    cd Client installation directory

    For example, if the client installation directory is /opt/client, run the following command:

    cd /opt/client

  3. Run the following command to configure environment variables:

    source bigdata_env

  4. Check whether the cluster uses the security authentication mode.
    • If yes, run the following command to authenticate the user:

      kinit Hive service user

    • If no, go to 5.
  5. Run the following command to log in to the Hive client:

    beeline

  6. Run the following commands to create a Hive partitioned table named table_1, and set the paths of partitions pt='2021-12-12' and pt='2021-12-18' to hdfs://xxx and obs://xxx respectively:

    create table table_1(id string) partitioned by(pt string) [stored as [orc|textfile|parquet|...]];

    alter table table_1 add partition(pt='2021-12-12') location 'hdfs://xxx';

    alter table table_1 add partition(pt='2021-12-18') location 'obs://xxx';

  7. After data is inserted into table_1, it is stored in the storage source of the corresponding partition. You can run the following desc command to view the location of each partition:

    desc formatted table_1 partition(pt='2021-12-18');
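
    For reference, the following insert statements show how rows written to each partition land in that partition's storage source. This is a sketch: the value 'id_1' and 'id_2' strings are hypothetical sample data, and hdfs://xxx and obs://xxx are the placeholder locations from the earlier alter table statements.

    insert into table_1 partition(pt='2021-12-12') values('id_1'); -- row files written under hdfs://xxx

    insert into table_1 partition(pt='2021-12-18') values('id_2'); -- row files written under obs://xxx

    In the output of the desc formatted command, the Location field shows the storage path of the queried partition.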