Updated on 2022-09-14 GMT+08:00

Configuring Hive with Storage and Compute Decoupled

MRS allows you to store data in OBS and use an MRS cluster for data computing only, so that storage and compute are decoupled. To give the cluster access to OBS, you only need to create an agency in IAM and bind it to the cluster.

This section describes how to create a Hive table whose data is stored in OBS. The procedure consists of the following steps:

  1. Creating an ECS Agency
  2. Configuring an Agency for an MRS Cluster
  3. Creating an OBS File System
  4. Accessing the OBS File System Through Hive

Creating an ECS Agency

  1. Log in to the Huawei Cloud management console.
  2. Choose Service List > Management & Governance > Identity and Access Management.
  3. Click Agencies. On the displayed page, click Create Agency.
  4. Enter an agency name, for example, mrs_ecs_obs.
  5. Set Agency Type to Cloud service and select ECS BMS to authorize ECS or BMS to invoke OBS.
  6. Set Validity Period to Unlimited and click Next.
    Figure 1 Creating an agency
  7. On the page that is displayed, select Global service project, then search for and select the OBS OperateAccess policy.
    Figure 2 Assigning permissions
  8. Click OK.

Configuring an Agency for an MRS Cluster

You can configure an agency when creating a cluster or bind an agency to an existing cluster to decouple storage and compute. This section uses an existing cluster as an example to describe how to configure an agency.

  1. Log in to the MRS console. In the navigation pane on the left, choose Clusters > Active Clusters.
  2. Click the name of a cluster to go to the cluster details page.
  3. On the Dashboard page, click Synchronize on the right side of IAM User Sync to synchronize IAM users.
  4. On the Dashboard page, click Manage Agency on the right side of Agency to select the agency created in Creating an ECS Agency, and click OK to bind it to the cluster. Alternatively, click Create Agency to go to the IAM console to create an agency and bind it to the cluster.
    Figure 3 Binding an agency

Creating an OBS File System

  1. Log in to the OBS console.
  2. Choose Parallel File System > Create Parallel File System.
  3. Enter the file system name, for example, mrs-demo01.

    Set other parameters as required.

  4. Click Create Now.
  5. In the parallel file system list on the OBS console, click a file system name to go to the details page.
  6. In the navigation pane, choose Files and create program and input folders.
    • program: Upload the program package to this folder.
    • input: Upload the input data to this folder.
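The folders and uploads can also be done from the cluster command line instead of the OBS console. The following is a minimal sketch, not an official MRS tool. It assumes that the agency is already bound to the cluster and that the client environment has been loaded on the master node, as described in Accessing the OBS File System Through Hive. The function name upload_job_files and the local file paths are hypothetical.

```shell
# Hypothetical helper: create the program and input folders in an OBS
# file system and upload one local file to each of them.
upload_job_files() {
    fs="$1"        # OBS file system name, for example mrs-demo01
    program="$2"   # local path of the program package (assumed)
    input="$3"     # local path of the input data file (assumed)

    # Each step runs only if the previous one succeeded.
    hadoop fs -mkdir -p "obs://$fs/program" "obs://$fs/input" &&
    hadoop fs -put "$program" "obs://$fs/program/" &&
    hadoop fs -put "$input" "obs://$fs/input/"
}
```

Example call, with placeholder local paths:

      upload_job_files mrs-demo01 /tmp/app.jar /tmp/data.txt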

Accessing the OBS File System Through Hive

  1. Log in to a master node as user root. For details, see Logging In to an ECS.
  2. Verify that Hive can access OBS.
    1. On the master node, run the following commands to go to the client directory and load the environment variables:

      cd /opt/Bigdata/client

      source bigdata_env

      source Hive/component_env

    2. View the list of files in file system mrs-demo01.

      hadoop fs -ls obs://mrs-demo01/

    3. Check whether the file list is returned. If it is returned, access to OBS is successful.

    4. Run the following command to authenticate the user (skip this step for a normal cluster, that is, with Kerberos authentication disabled):

      kinit hive

      Enter the password of user hive. The default password is Hive@123. Change the password upon the first login.

    5. Run the Hive client command.

      beeline

    6. Access the OBS directory from Beeline. For example, run the following command to create a Hive table and specify that its data is stored in the test_demo01 directory of file system mrs-demo01:

      create table test_demo01(name string) location "obs://mrs-demo01/test_demo01";

    7. Run the following command to query all tables. If table test_demo01 is displayed in the command output, access to OBS is successful.

      show tables;

    8. Run the following command to check the table location.

      show create table test_demo01;

      Check whether the location of the table starts with obs://<OBS bucket name>/, for example, obs://mrs-demo01/.

    9. Run the following command to write data into the table.

      insert into test_demo01 values('mm'),('ww'),('ww');

      Run the select * from test_demo01; command to check whether the data is written successfully.

    10. Run the !q command to exit the Beeline client.
    11. Log in to the OBS console again.
    12. Click Parallel File System and select the created file system.
    13. Click Files and check whether the test_demo01 directory contains the data written to the table.
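The access check and the table location check in the steps above can also be scripted. The following is a minimal sketch under stated assumptions: the client environment has been loaded with source bigdata_env, kinit has been run if Kerberos authentication is enabled, and beeline on the client connects without extra options. The function names check_obs_access and location_is_obs are hypothetical and not part of MRS.

```shell
# Hypothetical helper: succeed if the root of the OBS file system can be
# listed, which indicates that the bound agency grants access.
check_obs_access() {
    # $1: OBS file system name, for example mrs-demo01
    if hadoop fs -ls "obs://$1/" >/dev/null 2>&1; then
        echo "access to obs://$1/ OK"
    else
        echo "cannot list obs://$1/" >&2
        return 1
    fi
}

# Hypothetical helper: read the output of "show create table" on stdin and
# succeed if it contains a quoted location under obs://<file system>/.
location_is_obs() {
    # $1: OBS file system name, for example mrs-demo01
    grep -q "'obs://$1/"
}
```

A possible way to use the helpers, followed by a command-line listing of the table directory as an alternative to the OBS console:

      check_obs_access mrs-demo01
      beeline -e "show create table test_demo01;" | location_is_obs mrs-demo01 && echo "table is stored in OBS"
      hadoop fs -ls obs://mrs-demo01/test_demo01/

Both helpers only report status through their exit codes and a short message, so they can be combined with && in a larger script.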