Help Center> GaussDB(DWS)> Best Practices> Import and Export> Tutorial: Viewing or Importing Data from MRS Hive
Updated on 2024-06-07 GMT+08:00

Tutorial: Viewing or Importing Data from MRS Hive

In this tutorial, an HDFS foreign table is created to enable GaussDB(DWS) to remotely access or read MRS data sources.

Preparing the Environment

Create a GaussDB(DWS) cluster. Ensure that the MRS and GaussDB(DWS) clusters are in the same region, AZ, and VPC subnet and that the clusters can communicate with each other.

Procedure

This practice takes about 1 hour. The basic process is as follows:

  1. Create an MRS cluster deployed with Hive, Spark, and Tez.
  2. Upload the local TXT data files to an OBS bucket and import the files to a Hive storage table, and then to an ORC storage table.
  3. Create an MRS data source connection.
  4. Create a foreign server.
  5. Create a foreign table.
  6. Import data to a local GaussDB(DWS) table from the foreign table.

Creating an MRS Cluster

  1. Log in to the HUAWEI CLOUD console, choose Analytics > MapReduce Service and click Buy Cluster. Click the Custom Config tab, configure software parameters, and click Next.

    Table 1 Software configuration

    Parameter

    Value

    Region

    CN-Hong Kong

    Cluster Name

    mrs_01

    Version

    Normal

    Cluster Version

    MRS 1.9.2 (recommended)

    NOTE:
    • For clusters of version 8.1.1.300 and later, MRS clusters support versions 1.6.*, 1.7.*, 1.8.*, 1.9.*, 2.0.*, 3.0.*, 3.1.*, and later (* indicates a number).
    • For clusters earlier than version 8.1.1.300, MRS clusters support versions 1.6.*, 1.7.*, 1.8.*, 1.9.*, and 2.0.* (* indicates a number).

    Cluster Type

    Analysis Cluster

    Metadata

    Local

  2. Configure hardware parameters and click Next.

    Table 2 Hardware configuration

    Parameter

    Value

    Billing Mode

    Pay-per-use

    AZ

    AZ2

    VPC

    vpc-01

    Subnet

    subnet-01

    Security Group

    Auto create

    EIP

    10.x.x.x

    Enterprise Project

    Default

    Master

    2

    Analysis Core

    3

    Analysis Task

    0

  3. When you have completed the advanced settings based on the following table, click Buy Now and wait for about 15 minutes. The cluster is successfully created.

    Table 3 Advanced configuration

    Parameter

    Value

    Tag

    test01

    Hostname Prefix

    (Optional) Prefix for the name of an ECS or BMS in the cluster.

    Auto Scaling

    Retain the default value.

    Bootstrap Action

    Retain the default value. MRS 3.x does not support this parameter.

    Agency

    Retain the default value.

    Data Disk Encryption

    This function is disabled by default. Retain the default value.

    Alarm

    Retain the default value.

    Rule Name

    Retain the default value.

    Topic Name

    Select a topic.

    Kerberos Authentication

    This parameter is enabled by default.

    Username

    admin

    Password

    This password is used to log in to the cluster management page.

    Confirm Password

    Enter the password of user admin again.

    Login Mode

    Password

    Username

    root

    Password

    This password is used to remotely log in to the ECS.

    Confirm Password

    Enter the password of user root again.

    Secure Communications

    Select Enable.

Preparing the ORC Table Data Source of MRS

  1. Create a product_info.txt file on the local PC, copy the following data to the file, and save the file to the local PC.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
    205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
    300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
    310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
    150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite
    200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
    250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
    108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
    450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
    260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
    980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
    98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
    150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
    200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
    300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
    100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good.
    350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good
    110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good 
    210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
    230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good
    

  2. Log in to the OBS console, click Create Bucket, configure the following parameters, and click Create Now.

    Table 4 Bucket parameters

    Parameter

    Value

    Region

    CN-Hong Kong

    Data Redundancy Policy

    Single-AZ Storage

    Bucket Name

    mrs-datasource

    Default Storage Class

    Standard

    Bucket Policy

    Private

    Default Encryption

    Disable

    Direct Reading

    Disable

    Enterprise Project

    Default

    Tags

    N/A

  3. After the bucket is created, click the bucket name and choose Object > Upload Object to upload the product_info.txt file to the OBS bucket.
  4. Switch back to the MRS console and click the name of the created MRS cluster. On the Dashboard page, click the Synchronize button next to IAM User Sync. The synchronization takes about 5 minutes.
  5. Click Nodes and click a master node. On the displayed page, switch to the EIPs tab, click Bind EIP, select an existing EIP, and click OK. If no EIP is available, create one. Record the EIP.
  6. Download the client.

    1. Go back to the MRS cluster page. Click the cluster name. On the Dashboard tab page of the cluster details page, click Access Manager. If a message is displayed indicating that EIP needs to be bound, bind an EIP first.
    2. In the Access MRS Manager dialog box, click OK. You will be redirected to the MRS Manager login page. Enter the username admin and its password for logging in to MRS Manager. The password is the one you entered when creating the MRS cluster.
    3. Choose Cluster > Name of the desired cluster > Dashboard > More > Download Client. The Download Cluster Client dialog box is displayed.

      To obtain the client of an earlier version, choose Services > Download Client and set Select Client Type to Configuration Files Only.

  7. Determine the active master node.

    1. Use SSH to log in to the preceding node as user root. Run the following command to switch to user omm:

      su - omm

    2. Run the following command to query the active master node. In the command output, the node whose value of HAActive is active is the active master node.

      sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh

  8. Log in to the active master node as user root and update the client configuration of the active management node.

    cd /opt/client

    sh refreshConfig.sh /opt/client Full_path_of_client_configuration_file_package

    In this tutorial, run the following command:

    sh refreshConfig.sh /opt/client /tmp/MRS-client/MRS_Services_Client.tar

  9. Switch to user omm and go to the directory where the Hive client is located.

    su - omm

    cd /opt/client

  10. Create the product_info table whose storage format is TEXTFILE on Hive.

    1. Import environment variables to the /opt/client directory.

      source bigdata_env

    2. Log in to the Hive client.

      beeline

    3. Run the following SQL commands in sequence to create a demo database and the product_info table:
      1
      CREATE DATABASE demo;
      
      1
      USE demo;
      
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      DROP TABLE product_info;
      
      CREATE TABLE product_info 
      (    
          product_price                int            ,
          product_id                   char(30)       ,
          product_time                 date           ,
          product_level                char(10)       ,
          product_name                 varchar(200)   ,
          product_type1                varchar(20)    ,
          product_type2                char(10)       ,
          product_monthly_sales_cnt    int            ,
          product_comment_time         date           ,
          product_comment_num          int        ,
          product_comment_content      varchar(200)                   
      ) 
      row format delimited fields terminated by ',' 
      stored as TEXTFILE;
      

  11. Import the product_info.txt file to Hive.

    1. Switch back to the MRS cluster, click Files > Import Data.
    2. OBS Path: Find the product_info.txt file in the created OBS bucket and click Yes.
    3. HDFS Path: Select /user/hive/warehouse/demo.db/product_info/ and click Yes.
    4. Click OK to import the product_info table data.

  12. Create an ORC table and import data to the table.

    1. Run the following SQL commands to create an ORC table:
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      DROP TABLE product_info_orc;
      
      CREATE TABLE product_info_orc
      (    
          product_price                int            ,
          product_id                   char(30)       ,
          product_time                 date           ,
          product_level                char(10)       ,
          product_name                 varchar(200)   ,
          product_type1                varchar(20)    ,
          product_type2                char(10)       ,
          product_monthly_sales_cnt    int            ,
          product_comment_time         date           ,
          product_comment_num          int            ,
          product_comment_content      varchar(200)                   
      ) 
      row format delimited fields terminated by ',' 
      stored as orc;
      
    2. Insert data in the product_info table into the Hive ORC table product_info_orc.
      1
      INSERT INTO product_info_orc select * from product_info;
      
    3. Query whether the data import is successful.
      1
      SELECT * FROM product_info_orc;
      

Creating an MRS Data Source Connection

  1. Log in to the GaussDB(DWS) console and click the created data warehouse cluster. Ensure that the GaussDB(DWS) and MRS clusters are in the same region, AZ, and VPC subnet.
  2. Click the MRS Data Source tab and click Create MRS Cluster Connection.
  3. Select data source mrs_01 created in the previous step, enter the MRS account name admin and its password, and click OK.

Creating a Foreign Server

  1. Use Data Studio to connect to the created GaussDB(DWS) cluster.
  2. Create a user dbuser that has the permission for creating databases.

    1
    CREATE USER dbuser WITH CREATEDB PASSWORD 'password';
    

  1. Switch to user dbuser.

    1
    SET ROLE dbuser PASSWORD 'password';
    

  1. Create a database mydatabase.

    1
    CREATE DATABASE mydatabase;
    

  1. Perform the following steps to switch to database mydatabase:

    1. In the Object Browser window of the Data Studio client, right-click the database connection and select Refresh from the shortcut menu. The new database is displayed.
    2. Right-click the database name mydatabase and select Connect to DB from the shortcut menu.
    3. Right-click the database name mydatabase and select Open Terminal from the shortcut menu. The SQL command window for connecting to a database is displayed. Perform the following steps in the window.

  2. Grant the permission to create external servers to user dbuser. In 8.1.1 and later versions, you also need to grant the permission to use the public mode.

    1
    2
    GRANT ALL ON FOREIGN DATA WRAPPER hdfs_fdw TO dbuser;
    In GRANT ALL ON SCHEMA public TO dbuser; //8.1.1 and later versions, common users do not have permission on the public mode and need to grant permission. In versions earlier than 8.1.1, you do not need to perform this operation.
    

    The name of FOREIGN DATA WRAPPER must be hdfs_fdw. dbuser indicates the username of CREATE SERVER.

  3. Grant user dbuser the permission for using foreign tables.

    1
    ALTER USER dbuser USEFT;
    

  4. Switch to the Postgres database and query the foreign server automatically created by the system after the MRS data source is created.

    1
    SELECT * FROM pg_foreign_server;
    

    Information similar to the following is displayed:

    1
    2
    3
    4
    5
    6
                         srvname                      | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                     srvoptions
    --------------------------------------------------+----------+--------+---------+------------+--------+---------------------------------------------------------------------------------------------------------------------
     gsmpp_server                                     |       10 |  13673 |         |            |        |
     gsmpp_errorinfo_server                           |       10 |  13678 |         |            |        |
     hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
    (3 rows)
    

  5. Switch to database mydatabase and switch to user dbuser.

    1
    SET ROLE dbuser PASSWORD 'password';
    

  6. Create a foreign server.

    The server name, address, and configuration path must be the same as those in 8.

    1
    2
    3
    4
    5
    6
    7
    CREATE SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca FOREIGN DATA WRAPPER HDFS_FDW 
    OPTIONS 
    (
    address '192.168.1.245:9820,192.168.1.218:9820',   //The intranet IP addresses of the active and standby master nodes on the MRS management plane, which can be used to communicate with GaussDB(DWS).
    hdfscfgpath '/MRS/8f79ada0-d998-4026-9020-80d6de2692ca',
    type 'hdfs'
    );
    

  7. View the foreign server.

    1
    SELECT * FROM pg_foreign_server WHERE srvname='hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca';
    

    The server is successfully created if information similar to the following is displayed:

    1
    2
    3
    4
                         srvname                      | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                     srvoptions
    --------------------------------------------------+----------+--------+---------+------------+--------+---------------------------------------------------------------------------------------------------------------------
     hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
    (1 row)
    

Creating a Foreign Table

  1. Obtain the product_info_orc file path of Hive.

    1. Log in to the MRS console.
    2. Choose Cluster > Active Cluster and click the name of the cluster to be queried to enter the page displaying the cluster's basic information.
    3. Click the Files and click HDFS File List.
    4. Go to the storage directory of the data to be imported to the GaussDB(DWS) cluster and record the path.
      Figure 1 Checking the data storage path on MRS

  2. Create a foreign table. Set SERVER to the name of the external server created in 10 and foldername to the path obtained in 1.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    DROP FOREIGN TABLE IF EXISTS foreign_product_info;
    
    CREATE FOREIGN TABLE foreign_product_info
    (
        product_price                integer        ,
        product_id                   char(30)       ,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                      
    ) SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca 
    OPTIONS (
    format 'orc', 
    encoding 'utf8',
    foldername '/user/hive/warehouse/demo.db/product_info_orc/'
    ) 
    DISTRIBUTE BY ROUNDROBIN;
    

Importing Data

  1. Create a local table for data import.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    DROP TABLE IF EXISTS product_info;
    CREATE TABLE product_info
    (
        product_price                integer        ,
        product_id                   char(30)       ,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                   
    ) 
    with (
    orientation = column,
    compression=middle
    ) 
    DISTRIBUTE BY HASH (product_id);
    

  2. Import data to the target table from the foreign table.

    1
    INSERT INTO product_info SELECT * FROM foreign_product_info;
    

  3. Query the import result.

    1
    SELECT * FROM product_info;