Updated on 2023-12-04 GMT+08:00

Interconnecting Hive with OBS

Before performing the following operations, ensure that you have configured a storage-compute decoupled cluster by referring to Configuring a Storage-Compute Decoupled Cluster (Agency) or Configuring a Storage-Compute Decoupled Cluster (AK/SK).

Setting the Location to an OBS Path When Creating a Table

  1. Log in to the client installation node as the client installation user.
  2. Run the following command to initialize environment variables:

    source Client installation directory/bigdata_env

  3. For a security cluster (Kerberos authentication enabled), run the following command to authenticate the user. The user must have permission to perform Hive operations. Skip this step if Kerberos authentication is not enabled for the cluster.

    kinit User performing Hive operations

  4. For clusters of versions earlier than MRS 3.2.0, log in to FusionInsight Manager, choose Cluster > Services > Hive, and click Configurations > All Configurations.

    In the navigation pane on the left, choose Hive > Customization. In custom configuration items, add dfs.namenode.acls.enabled to hdfs.site.customized.configs and set its value to false.

    Figure 1 Adding custom parameters

  5. (Versions earlier than MRS 3.2.0 only) Click Save. On the Dashboard page, click More and select Restart Service. Enter the password of the current user, click OK, select Restart upper-layer services, and click OK to restart Hive.
  6. Log in to the beeline client and set Location to the OBS file system path when creating a table.

    beeline

    For example, run the following command to create the table test in obs://OBS parallel file system name/user/hive/warehouse/Database name/Table name:

    create table test(name string) location "obs://OBS parallel file system name/user/hive/warehouse/Database name/Table name";

    You also need to add the component operator (the user who performs the Hive operations) to the URL policy in Ranger. Set the URL to the complete OBS path of the object and select the Read and Write permissions.

    For versions earlier than MRS 3.x, see Configuring Hive Access Permissions in Ranger. For MRS 3.x or later, see Adding a Ranger Access Permission Policy for Hive.
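The example location in step 6 follows the pattern obs://<OBS parallel file system name>/user/hive/warehouse/<Database name>/<Table name>. If you generate such statements from a script, the following minimal sketch shows one way to assemble the location and the matching CREATE TABLE statement. The helper names (obs_table_location, create_table_sql) and the single-column schema are hypothetical and for illustration only; they are not part of MRS or Hive.

```python
def obs_table_location(fs_name: str, database: str, table: str) -> str:
    """Build an OBS table location using the directory pattern from the example.

    Hypothetical helper: adjust the layout if your warehouse uses a
    different directory structure.
    """
    for part in (fs_name, database, table):
        # Each component must be a single, non-empty path segment.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"obs://{fs_name}/user/hive/warehouse/{database}/{table}"


def create_table_sql(table: str, location: str) -> str:
    """Return a HiveQL CREATE TABLE statement whose location is an OBS path."""
    if not location.startswith("obs://"):
        raise ValueError("location must be an obs:// path")
    return f'create table {table}(name string) location "{location}";'


if __name__ == "__main__":
    loc = obs_table_location("hivetest", "obs_test", "test")
    # Paste the printed statement into beeline.
    print(create_table_sql("test", loc))
```

Running the script prints a statement that can be executed in beeline. Validating the components up front catches an empty or slash-containing name before Hive creates a table at an unexpected path.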

Interconnecting Hive with OBS Through MetaStore

  1. Log in to FusionInsight Manager and choose Cluster > Services > Hive > Configurations > All Configurations.

    • For versions earlier than MRS 3.2.0:
      • In the navigation pane on the left, choose MetaStore (role) > Customization. Add the configuration item hive.metastore.warehouse.dir to the custom parameter hive.metastore.customized.configs and set the value to an OBS path. For example, set it to obs://hivetest/user/hive/warehouse/, where hivetest is the name of the OBS parallel file system.
        Figure 2 Configuring hive.metastore.warehouse.dir
      • In the navigation pane on the left, choose HiveServer (role) > Customization. Add the configuration item hive.metastore.warehouse.dir to hive.metastore.customized.configs and hive.server.customized.configs and set the value to an OBS path. For example, set it to obs://hivetest/user/hive/warehouse/, where hivetest is the name of the OBS parallel file system.
        Figure 3 Configuring hive.metastore.warehouse.dir
    • For MRS 3.2.0 and later versions:
      Search for hive.metastore.warehouse.dir in the search box and change its value to an OBS path, for example, obs://hivetest/user/hive/warehouse/, where hivetest is the name of the OBS parallel file system.
      Figure 4 Configuring hive.metastore.warehouse.dir

  2. Save the change and restart Hive.
  3. (Optional) Install the client by referring to Installing a Client. If a client is already installed in the cluster, go to step 4.
  4. Update the client configuration file.

    1. Run the following command to open hivemetastore-site.xml in the Hive configuration file directory on the client:

      vim Client installation directory/Hive/config/hivemetastore-site.xml

    2. Change the value of hive.metastore.warehouse.dir to the corresponding OBS path, for example, obs://hivetest/user/hive/warehouse/, where hivetest is the name of the OBS parallel file system.
      Figure 5 Configuring the OBS Path
    3. For MRS 3.2.0 and later versions, also change the value of hive.metastore.warehouse.dir to the corresponding OBS path, for example, obs://hivetest/user/hive/warehouse/, in the hivemetastore-site.xml file stored in the HCatalog client configuration directory:

      vi Client installation directory/Hive/HCatalog/conf/hivemetastore-site.xml

  5. Log in to the beeline client, create a table, and check whether the location is the OBS path.

    beeline

    create table test(name string);

    desc formatted test;

    The Location field of the table shows the OBS path.

    Figure 6 Location of the Hive table

    If the location of the current database points to HDFS, tables created in that database without a specified location also point to HDFS by default. To change this default, modify the database location so that it points to OBS:

    1. Run the following command to query the location of the database:

      show create database obs_test;

      Figure 7 Viewing the location of the Hive Table
    2. Run the following command to change the database location:

      alter database obs_test set location 'obs://OBS parallel file system name/user/hive/warehouse/Database name';

      Run the show create database obs_test command to check whether the database location points to OBS.

      Figure 8 Check the location of the modified Hive table.
    3. Run the following command to modify the table location:

      alter table user_info set location 'obs://OBS parallel file system name/user/hive/warehouse/Database name/Table name';

      If the table already contains data, migrate the original data files to the new location. Changing the location updates only the table metadata; existing data files are not moved.
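The client-side edit of hivemetastore-site.xml in step 4 can also be scripted, for example when several client nodes need the same change. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the file uses the usual Hadoop <configuration>/<property>/<name>/<value> layout; the function name set_warehouse_dir is hypothetical. Back up the file before running it.

```python
import sys
import xml.etree.ElementTree as ET

PROPERTY = "hive.metastore.warehouse.dir"


def set_warehouse_dir(path: str, obs_path: str) -> bool:
    """Set hive.metastore.warehouse.dir in a Hadoop-style *-site.xml file.

    Hypothetical helper. Returns True if the property already existed and was
    updated, or False if a new <property> entry was appended.
    """
    if not obs_path.startswith("obs://"):
        raise ValueError("expected an obs:// path")
    # Keep comments and processing instructions that appear inside the root
    # element when the file is rewritten (requires Python 3.8+).
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    tree = ET.parse(path, ET.XMLParser(target=builder))
    root = tree.getroot()  # the <configuration> element

    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == PROPERTY:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = obs_path
            existed = True
            break
    else:
        # Property not present: append a new <property> entry.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROPERTY
        ET.SubElement(prop, "value").text = obs_path
        existed = False

    tree.write(path, encoding="UTF-8", xml_declaration=True)
    return existed


if __name__ == "__main__":
    # Usage: python set_warehouse_dir.py <hivemetastore-site.xml> <OBS path>
    set_warehouse_dir(sys.argv[1], sys.argv[2])
```

For example, pass Client installation directory/Hive/config/hivemetastore-site.xml and obs://hivetest/user/hive/warehouse/ as arguments. ElementTree does not preserve comments outside the <configuration> element or the original attribute quoting, so review the rewritten file with a diff before relying on it.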