このページは、お客様の言語ではご利用いただけません。Huawei Cloudは、より多くの言語バージョンを追加するために懸命に取り組んでいます。ご協力ありがとうございました。

Compute
Elastic Cloud Server
Huawei Cloud Flexus
Bare Metal Server
Auto Scaling
Image Management Service
Dedicated Host
FunctionGraph
Cloud Phone Host
Huawei Cloud EulerOS
Networking
Virtual Private Cloud
Elastic IP
Elastic Load Balance
NAT Gateway
Direct Connect
Virtual Private Network
VPC Endpoint
Cloud Connect
Enterprise Router
Enterprise Switch
Global Accelerator
Management & Governance
Cloud Eye
Identity and Access Management
Cloud Trace Service
Resource Formation Service
Tag Management Service
Log Tank Service
Config
OneAccess
Resource Access Manager
Simple Message Notification
Application Performance Management
Application Operations Management
Organizations
Optimization Advisor
IAM Identity Center
Cloud Operations Center
Resource Governance Center
Migration
Server Migration Service
Object Storage Migration Service
Cloud Data Migration
Migration Center
Cloud Ecosystem
KooGallery
Partner Center
User Support
My Account
Billing Center
Cost Center
Resource Center
Enterprise Management
Service Tickets
HUAWEI CLOUD (International) FAQs
ICP Filing
Support Plans
My Credentials
Customer Operation Capabilities
Partner Support Plans
Professional Services
Analytics
MapReduce Service
Data Lake Insight
CloudTable Service
Cloud Search Service
Data Lake Visualization
Data Ingestion Service
GaussDB(DWS)
DataArts Studio
Data Lake Factory
DataArts Lake Formation
IoT
IoT Device Access
Others
Product Pricing Details
System Permissions
Console Quick Start
Common FAQs
Instructions for Associating with a HUAWEI CLOUD Partner
Message Center
Security & Compliance
Security Technologies and Applications
Web Application Firewall
Host Security Service
Cloud Firewall
SecMaster
Anti-DDoS Service
Data Encryption Workshop
Database Security Service
Cloud Bastion Host
Data Security Center
Cloud Certificate Manager
Edge Security
Managed Threat Detection
Blockchain
Blockchain Service
Web3 Node Engine Service
Media Services
Media Processing Center
Video On Demand
Live
SparkRTC
MetaStudio
Storage
Object Storage Service
Elastic Volume Service
Cloud Backup and Recovery
Storage Disaster Recovery Service
Scalable File Service Turbo
Scalable File Service
Volume Backup Service
Cloud Server Backup Service
Data Express Service
Dedicated Distributed Storage Service
Containers
Cloud Container Engine
SoftWare Repository for Container
Application Service Mesh
Ubiquitous Cloud Native Service
Cloud Container Instance
Databases
Relational Database Service
Document Database Service
Data Admin Service
Data Replication Service
GeminiDB
GaussDB
Distributed Database Middleware
Database and Application Migration UGO
TaurusDB
Middleware
Distributed Cache Service
API Gateway
Distributed Message Service for Kafka
Distributed Message Service for RabbitMQ
Distributed Message Service for RocketMQ
Cloud Service Engine
Multi-Site High Availability Service
EventGrid
Dedicated Cloud
Dedicated Computing Cluster
Business Applications
Workspace
ROMA Connect
Message & SMS
Domain Name Service
Edge Data Center Management
Meeting
AI
Face Recognition Service
Graph Engine Service
Content Moderation
Image Recognition
Optical Character Recognition
ModelArts
ImageSearch
Conversational Bot Service
Speech Interaction Service
Huawei HiLens
Video Intelligent Analysis Service
Developer Tools
SDK Developer Guide
API Request Signing Guide
Terraform
Koo Command Line Interface
Content Delivery & Edge Computing
Content Delivery Network
Intelligent EdgeFabric
CloudPond
Intelligent EdgeCloud
Solutions
SAP Cloud
High Performance Computing
Developer Services
ServiceStage
CodeArts
CodeArts PerfTest
CodeArts Req
CodeArts Pipeline
CodeArts Build
CodeArts Deploy
CodeArts Artifact
CodeArts TestPlan
CodeArts Check
CodeArts Repo
Cloud Application Engine
MacroVerse aPaaS
KooMessage
KooPhone
KooDrive
Help Center/ GaussDB(DWS)/ Best Practices/ Import and Export/ Importing Table Data from MRS Hive to a GaussDB(DWS) Cluster

Importing Table Data from MRS Hive to a GaussDB(DWS) Cluster

Updated on 2024-11-08 GMT+08:00

In this tutorial, an HDFS foreign table is created to enable GaussDB(DWS) to remotely access or read MRS data sources.

Preparing the Environment

Create a GaussDB(DWS) cluster. Ensure that the MRS and GaussDB(DWS) clusters are in the same region, AZ, and VPC subnet and that the clusters can communicate with each other.

Procedure

This practice takes about 1 hour. The basic process is as follows:

  1. Create an MRS cluster deployed with Hive, Spark, and Tez.
  2. Upload the local TXT data files to an OBS bucket and import the files to a Hive storage table, and then to an ORC storage table.
  3. Create an MRS data source connection.
  4. Create a foreign server.
  5. Create a foreign table.
  6. Import data to a local GaussDB(DWS) table from the foreign table.

Creating an MRS Cluster

  1. Log in to the HUAWEI CLOUD console, choose Analytics > MapReduce Service and click Buy Cluster. Click the Custom Config tab, configure software parameters, and click Next.

    Table 1 Software configuration

    Parameter

    Value

    Region

    CN-Hong Kong

    Cluster Name

    mrs_01

    Version

    Normal

    Cluster Version

    MRS 1.9.2 (recommended)

    NOTE:
    • For clusters of version 8.1.1.300 and later, MRS clusters support versions 1.6.*, 1.7.*, 1.8.*, 1.9.*, 2.0.*, 3.0.*, 3.1.*, and later (* indicates a number).
    • For clusters earlier than version 8.1.1.300, MRS clusters support versions 1.6.*, 1.7.*, 1.8.*, 1.9.*, and 2.0.* (* indicates a number).

    Cluster Type

    Analysis Cluster

    Metadata

    Local

  2. Configure hardware parameters and click Next.

    Table 2 Hardware configuration

    Parameter

    Value

    Billing Mode

    Pay-per-use

    AZ

    AZ2

    VPC

    vpc-01

    Subnet

    subnet-01

    Security Group

    Auto create

    EIP

    10.x.x.x

    Enterprise Project

    Default

    Master

    2

    Analysis Core

    3

    Analysis Task

    0

  3. When you have completed the advanced settings based on the following table, click Buy Now and wait for about 15 minutes. The cluster is successfully created.

    Table 3 Advanced configuration

    Parameter

    Value

    Tag

    test01

    Hostname Prefix

    (Optional) Prefix for the name of an ECS or BMS in the cluster.

    Auto Scaling

    Retain the default value.

    Bootstrap Action

    Retain the default value. MRS 3.x does not support this parameter.

    Agency

    Retain the default value.

    Data Disk Encryption

    This function is disabled by default. Retain the default value.

    Alarm

    Retain the default value.

    Rule Name

    Retain the default value.

    Topic Name

    Select a topic.

    Kerberos Authentication

    This parameter is enabled by default.

    Username

    admin

    Password

    This password is used to log in to the cluster management page.

    Confirm Password

    Enter the password of user admin again.

    Login Mode

    Password

    Username

    root

    Password

    This password is used to remotely log in to the ECS.

    Confirm Password

    Enter the password of user root again.

    Secure Communications

    Select Enable.

Preparing the ORC Table Data Source of MRS

  1. Create a product_info.txt file on the local PC, copy the following data to the file, and save the file to the local PC.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
    205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
    300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
    310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
    150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite
    200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
    250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
    108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
    450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
    260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
    980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
    98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
    150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
    200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
    300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
    100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good.
    350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good
    110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good 
    210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
    230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good
    

  2. Log in to the OBS console, click Create Bucket, configure the following parameters, and click Create Now.

    Table 4 Bucket parameters

    Parameter

    Value

    Region

    CN-Hong Kong

    Data Redundancy Policy

    Single-AZ Storage

    Bucket Name

    mrs-datasource

    Default Storage Class

    Standard

    Bucket Policy

    Private

    Default Encryption

    Disable

    Direct Reading

    Disable

    Enterprise Project

    Default

    Tags

    N/A

  3. After the bucket is created, click the bucket name and choose Object > Upload Object to upload the product_info.txt file to the OBS bucket.
  4. Switch back to the MRS console and click the name of the created MRS cluster. On the Dashboard page, click the Synchronize button next to IAM User Sync. The synchronization takes about 5 minutes.
  5. Click Nodes and click a master node. On the displayed page, switch to the EIPs tab, click Bind EIP, select an existing EIP, and click OK. If no EIP is available, create one. Record the EIP.
  6. Download the client.

    1. Go back to the MRS cluster page. Click the cluster name. On the Dashboard tab page of the cluster details page, click Access Manager. If a message is displayed indicating that EIP needs to be bound, bind an EIP first.
    2. In the Access MRS Manager dialog box, click OK. You will be redirected to the MRS Manager login page. Enter the username admin and its password for logging in to MRS Manager. The password is the one you entered when creating the MRS cluster.
    3. Choose Cluster > Name of the desired cluster > Dashboard > More > Download Client. The Download Cluster Client dialog box is displayed.

      NOTE:

      To obtain the client of an earlier version, choose Services > Download Client and set Select Client Type to Configuration Files Only.

  7. Determine the active master node.

    1. Use SSH to log in to the preceding node as user root. Run the following command to switch to user omm:

      su - omm

    2. Run the following command to query the active master node. In the command output, the node whose value of HAActive is active is the active master node.

      sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh

  8. Log in to the active master node as user root and update the client configuration of the active management node.

    cd /opt/client

    sh refreshConfig.sh /opt/client Full_path_of_client_configuration_file_package

    In this tutorial, run the following command:

    sh refreshConfig.sh /opt/client /tmp/MRS-client/MRS_Services_Client.tar

  9. Switch to user omm and go to the directory where the Hive client is located.

    su - omm

    cd /opt/client

  10. Create the product_info table whose storage format is TEXTFILE on Hive.

    1. Import environment variables to the /opt/client directory.

      source bigdata_env

    2. Log in to the Hive client.

      beeline

    3. Run the following SQL commands in sequence to create a demo database and the product_info table:
      1
      CREATE DATABASE demo;
      
      1
      USE demo;
      
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      DROP TABLE product_info;
      
      CREATE TABLE product_info 
      (    
          product_price                int            ,
          product_id                   char(30)       ,
          product_time                 date           ,
          product_level                char(10)       ,
          product_name                 varchar(200)   ,
          product_type1                varchar(20)    ,
          product_type2                char(10)       ,
          product_monthly_sales_cnt    int            ,
          product_comment_time         date           ,
          product_comment_num          int        ,
          product_comment_content      varchar(200)                   
      ) 
      row format delimited fields terminated by ',' 
      stored as TEXTFILE;
      

  11. Import the product_info.txt file to Hive.

    1. Switch back to the MRS cluster, click Files > Import Data.
    2. OBS Path: Find the product_info.txt file in the created OBS bucket and click Yes.
    3. HDFS Path: Select /user/hive/warehouse/demo.db/product_info/ and click Yes.
    4. Click OK to import the product_info table data.

  12. Create an ORC table and import data to the table.

    1. Run the following SQL commands to create an ORC table:
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      DROP TABLE product_info_orc;
      
      CREATE TABLE product_info_orc
      (    
          product_price                int            ,
          product_id                   char(30)       ,
          product_time                 date           ,
          product_level                char(10)       ,
          product_name                 varchar(200)   ,
          product_type1                varchar(20)    ,
          product_type2                char(10)       ,
          product_monthly_sales_cnt    int            ,
          product_comment_time         date           ,
          product_comment_num          int            ,
          product_comment_content      varchar(200)                   
      ) 
      row format delimited fields terminated by ',' 
      stored as orc;
      
    2. Insert data in the product_info table into the Hive ORC table product_info_orc.
      1
      INSERT INTO product_info_orc select * from product_info;
      
    3. Query whether the data import is successful.
      1
      SELECT * FROM product_info_orc;
      

Creating an MRS Cluster Connection

  1. Log in to the GaussDB(DWS) console and click the created data warehouse cluster. Ensure that the GaussDB(DWS) and MRS clusters are in the same region, AZ, and VPC subnet.
  2. Click the MRS Data Source tab and click Create MRS Cluster Connection.
  3. Select data source mrs_01 created in the previous step, enter the MRS account name admin and its password, and click OK.

Creating a Foreign Server

  1. Use Data Studio to connect to the created GaussDB(DWS) cluster.
  2. Create a user dbuser that has the permission for creating databases.

    1
    CREATE USER dbuser WITH CREATEDB PASSWORD 'password';
    

  1. Switch to user dbuser.

    1
    SET ROLE dbuser PASSWORD 'password';
    

  1. Create a database mydatabase.

    1
    CREATE DATABASE mydatabase;
    

  1. Perform the following steps to switch to database mydatabase:

    1. In the Object Browser window of the Data Studio client, right-click the database connection and select Refresh from the shortcut menu. The new database is displayed.
    2. Right-click the database name mydatabase and select Connect to DB from the shortcut menu.
    3. Right-click the database name mydatabase and select Open Terminal from the shortcut menu. The SQL command window for connecting to a database is displayed. Perform the following steps in the window.

  2. Grant the permission to create external servers to user dbuser. In 8.1.1 and later versions, you also need to grant the permission to use the public mode.

    1
    2
    GRANT ALL ON FOREIGN DATA WRAPPER hdfs_fdw TO dbuser;
    In GRANT ALL ON SCHEMA public TO dbuser; //8.1.1 and later versions, common users do not have permission on the public mode and need to grant permission. In versions earlier than 8.1.1, you do not need to perform this operation.
    

    The name of FOREIGN DATA WRAPPER must be hdfs_fdw. dbuser indicates the username of CREATE SERVER.

  3. Grant user dbuser the permission for using foreign tables.

    1
    ALTER USER dbuser USEFT;
    

  4. Switch to the Postgres database and query the foreign server automatically created by the system after the MRS data source is created.

    1
    SELECT * FROM pg_foreign_server;
    

    Information similar to the following is displayed:

    1
    2
    3
    4
    5
    6
                         srvname                      | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                     srvoptions
    --------------------------------------------------+----------+--------+---------+------------+--------+---------------------------------------------------------------------------------------------------------------------
     gsmpp_server                                     |       10 |  13673 |         |            |        |
     gsmpp_errorinfo_server                           |       10 |  13678 |         |            |        |
     hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
    (3 rows)
    

  5. Switch to database mydatabase and switch to user dbuser.

    1
    SET ROLE dbuser PASSWORD 'password';
    

  6. Create a foreign server.

    The server name, address, and configuration path must be the same as those in 8.

    1
    2
    3
    4
    5
    6
    7
    CREATE SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca FOREIGN DATA WRAPPER HDFS_FDW 
    OPTIONS 
    (
    address '192.168.1.245:9820,192.168.1.218:9820',   //The intranet IP addresses of the active and standby master nodes on the MRS management plane, which can be used to communicate with GaussDB(DWS).
    hdfscfgpath '/MRS/8f79ada0-d998-4026-9020-80d6de2692ca',
    type 'hdfs'
    );
    

  7. View the foreign server.

    1
    SELECT * FROM pg_foreign_server WHERE srvname='hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca';
    

    The server is successfully created if information similar to the following is displayed:

    1
    2
    3
    4
                         srvname                      | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                     srvoptions
    --------------------------------------------------+----------+--------+---------+------------+--------+---------------------------------------------------------------------------------------------------------------------
     hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca |    16476 |  13685 |         |            |        | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs}
    (1 row)
    

Creating a Foreign Table

  1. Obtain the product_info_orc file path of Hive.

    1. Log in to the MRS console.
    2. Choose Cluster > Active Cluster and click the name of the cluster to be queried to enter the page displaying the cluster's basic information.
    3. Click the Files and click HDFS File List.
    4. Go to the storage directory of the data to be imported to the GaussDB(DWS) cluster and record the path.
      Figure 1 Checking the data storage path on MRS

  2. Create a foreign table. Set SERVER to the name of the external server created in 10 and foldername to the path obtained in 1.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    DROP FOREIGN TABLE IF EXISTS foreign_product_info;
    
    CREATE FOREIGN TABLE foreign_product_info
    (
        product_price                integer        ,
        product_id                   char(30)       ,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                      
    ) SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca 
    OPTIONS (
    format 'orc', 
    encoding 'utf8',
    foldername '/user/hive/warehouse/demo.db/product_info_orc/'
    ) 
    DISTRIBUTE BY ROUNDROBIN;
    

Importing Data

  1. Create a local table for data import.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    DROP TABLE IF EXISTS product_info;
    CREATE TABLE product_info
    (
        product_price                integer        ,
        product_id                   char(30)       ,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                   
    ) 
    with (
    orientation = column,
    compression=middle
    ) 
    DISTRIBUTE BY HASH (product_id);
    

  2. Import data to the target table from the foreign table.

    1
    INSERT INTO product_info SELECT * FROM foreign_product_info;
    

  3. Query the import result.

    1
    SELECT * FROM product_info;
    

We use cookies to improve our site and your experience. By continuing to browse our site you accept our cookie policy. Find out more

Feedback

Feedback

Feedback

0/500

Selected Content

Submit selected content with the feedback