Updated on 2023-12-28 GMT+08:00

Tutorial: Importing Remote GaussDB(DWS) Data Sources

In the era of converged big data analytics, GaussDB(DWS) clusters in the same region can communicate with each other. This practice demonstrates how to import data from a remote GaussDB(DWS) cluster into a local GaussDB(DWS) cluster using foreign tables.

The demonstration procedure is as follows: Install the gsql database client on an ECS, connect to GaussDB(DWS) using gsql, and import data from the remote GaussDB(DWS) using a foreign table.

Hybrid data warehouse clusters do not support foreign tables. To perform the operations in this tutorial, use a standard data warehouse cluster: when you buy a cluster, select Standard Data Warehouse as the Product Type.

Preparations

You have registered a Huawei Cloud account and enabled Huawei Cloud services. The account cannot be in arrears or frozen.

Creating an ECS

For details, see Purchasing an ECS. After purchasing an ECS, log in to the ECS by referring to Logging In to a Linux ECS.

When creating the ECS, ensure that the ECS and the GaussDB(DWS) clusters to be created are in the same region, AZ, and VPC subnet. The ECS OS must be one supported by the gsql client and GDS tool package (CentOS 7.6 is used as an example), and password login must be selected.

Creating a Cluster and Downloading the Tool Package

  1. Log in to the Huawei Cloud management console.
  2. Choose Service List > Analytics > Data Warehouse Service. On the page that is displayed, click Create Cluster in the upper right corner.
  3. Configure the parameters according to Table 1.

    Table 1 Software configuration

    Parameter: Configuration

    Region: CN-Hong Kong
      NOTE:
      • CN-Hong Kong is used as an example. You can select other regions as required. Ensure that all operations are performed in the same region.
      • Ensure that GaussDB(DWS) and the ECS are in the same region, AZ, and VPC subnet.
    AZ: AZ2
    Resource: Standard data warehouse
    Compute Resource: ECS
    Storage Type: Cloud SSD
    CPU Architecture: x86
    Node Flavor: dws2.m6.4xlarge.8 (16 vCPUs | 128 GB | 2000 GB SSD)
      NOTE: If this flavor is sold out, select another AZ or flavor.
    Hot Storage: 100 GB/node
    Nodes: 3
    Cluster Name: dws-demo01
    Administrator Account: dbadmin
    Administrator Password: User-defined password
    Confirm Password: Re-enter the user-defined password
    Database Port: 8000
    VPC: vpc-default
    Subnet: subnet-default(192.168.0.0/24)
      NOTICE: Ensure that the cluster and the ECS are in the same VPC subnet.
    Security Group: Automatic creation
    EIP: Buy now
    Bandwidth: 1 Mbit/s
    Advanced Settings: Default

  4. Confirm the information, click Next, and then click Submit.
  5. Wait for about 10 minutes. After the cluster is created, click the cluster name to go to the Basic Information page. Choose Network, click the security group name, and verify that a security group rule has been added. In this example, the private network IP address of the ECS where gsql runs is 192.168.0.90, so a rule allowing the source 192.168.0.0/24 on port 8000 is required.
  6. Return to the Basic Information tab of the cluster and record the value of Private Network IP Address.

  7. Return to the homepage of the GaussDB(DWS) console. Choose Connections in the navigation pane on the left, select the ECS OS (for example, select Redhat x86_64 for CentOS 7.6), and click Download to save the tool package to the local host. The tool package contains the gsql client and GDS.

  8. Repeat 1 to 6 to create a second GaussDB(DWS) cluster and set its name to dws-demo02.

Preparing Source Data

  1. Create the following three CSV files in the specified directory on the local PC:

    • Data file product_info0.csv
      100,XHDK-A,2017-09-01,A,2017 Shirt Women,red,M,328,2017-09-04,715,good!
      205,KDKE-B,2017-09-01,A,2017 T-shirt Women,pink,L,584,2017-09-05,40,very good!
      300,JODL-X,2017-09-01,A,2017 T-shirt men,red,XL,15,2017-09-03,502,Bad.
      310,QQPX-R,2017-09-02,B,2017 jacket women,red,L,411,2017-09-05,436,It's nice.
      150,ABEF-C,2017-09-03,B,2017 Jeans Women,blue,M,123,2017-09-06,120,good.
    • Data file product_info1.csv
      200,BCQP-E,2017-09-04,B,2017 casual pants men,black,L,997,2017-09-10,301,good quality.
      250,EABE-D,2017-09-10,A,2017 dress women,black,S,841,2017-09-15,299,This dress fits well.
      108,CDXK-F,2017-09-11,A,2017 dress women,red,M,85,2017-09-14,22,It's really amazing to buy.
      450,MMCE-H,2017-09-11,A,2017 jacket women,white,M,114,2017-09-14,22,very good.
      260,OCDA-G,2017-09-12,B,2017 woolen coat women,red,L,2004,2017-09-15,826,Very comfortable.
    • Data file product_info2.csv
      980,"ZKDS-J",2017-09-13,"B","2017 Women's Cotton Clothing","red","M",112,,,
      98,"FKQB-I",2017-09-15,"B","2017 new shoes men","red","M",4345,2017-09-18,5473
      50,"DMQY-K",2017-09-21,"A","2017 pants men","red","37",28,2017-09-25,58,"good","good","good"
      80,"GKLW-l",2017-09-22,"A","2017 Jeans Men","red","39",58,2017-09-25,72,"Very comfortable."
      30,"HWEC-L",2017-09-23,"A","2017 shoes women","red","M",403,2017-09-26,607,"good!"
      40,"IQPD-M",2017-09-24,"B","2017 new pants Women","red","M",35,2017-09-27,52,"very good."
      50,"LPEC-N",2017-09-25,"B","2017 dress Women","red","M",29,2017-09-28,47,"not good at all."
      60,"NQAB-O",2017-09-26,"B","2017 jacket women","red","S",69,2017-09-29,70,"It's beautiful."
      70,"HWNB-P",2017-09-27,"B","2017 jacket women","red","L",30,2017-09-30,55,"I like it so much"
      80,"JKHU-Q",2017-09-29,"C","2017 T-shirt","red","M",90,2017-10-02,82,"very good."

  2. Log in to the created ECS as user root and run the following command to create a data source file directory:

    mkdir -p /input_data

  3. Use a file transfer tool to upload the preceding data files to the /input_data directory of the ECS.
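The steps above can be sketched from a Linux or macOS local PC: generate the first data file in the shell and upload with scp, one choice of "file transfer tool". In the sketch below, /tmp/input_data is an arbitrary staging path and 192.168.0.90 is this example's ECS address; DRY_RUN=echo makes the loop print the scp commands instead of running them (drop it to actually upload). product_info1.csv and product_info2.csv can be generated the same way.

```shell
# Generate product_info0.csv locally; the other two files follow the same
# pattern. /tmp/input_data is an assumed staging directory, not a required path.
mkdir -p /tmp/input_data
cat > /tmp/input_data/product_info0.csv <<'EOF'
100,XHDK-A,2017-09-01,A,2017 Shirt Women,red,M,328,2017-09-04,715,good!
205,KDKE-B,2017-09-01,A,2017 T-shirt Women,pink,L,584,2017-09-05,40,very good!
300,JODL-X,2017-09-01,A,2017 T-shirt men,red,XL,15,2017-09-03,502,Bad.
310,QQPX-R,2017-09-02,B,2017 jacket women,red,L,411,2017-09-05,436,It's nice.
150,ABEF-C,2017-09-03,B,2017 Jeans Women,blue,M,123,2017-09-06,120,good.
EOF

# Dry-run upload: DRY_RUN=echo only prints the commands. Replace ECS_IP with
# your ECS's reachable address and remove DRY_RUN to execute.
ECS_IP=192.168.0.90
DRY_RUN=echo
for f in /tmp/input_data/*.csv; do
  $DRY_RUN scp "$f" "root@${ECS_IP}:/input_data/"
done
```
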

Importing Data Sources Using GDS

  1. Log in to the ECS as user root and use a file transfer tool to upload the tool package downloaded in 7 to the /opt directory.
  2. Decompress the tool package in the /opt directory.

    cd /opt

    unzip dws_client_8.1.x_redhat_x64.zip

  3. Create a GDS user and change the owners of the data source and GDS directories.

    groupadd gdsgrp

    useradd -g gdsgrp gds_user

    chown -R gds_user:gdsgrp /opt/gds

    chown -R gds_user:gdsgrp /input_data

  4. Switch to user gds_user.

    su - gds_user

  5. Import the GDS environment variables.

    This step is required only for 8.1.x or later. For earlier versions, skip this step.

    cd /opt/gds/bin

    source gds_env

  6. Start GDS.

    /opt/gds/bin/gds -d /input_data/ -p 192.168.0.90:5000 -H 192.168.0.0/24 -l /opt/gds/gds_log.txt -D

    • -d dir: directory containing the data files to be imported. This practice uses /input_data/ as an example.
    • -p ip:port: listening IP address and port for GDS. Set this parameter to the private network IP address of the ECS where GDS is installed so that GDS can communicate with GaussDB(DWS). In this example, 192.168.0.90:5000 is used.
    • -H address_string: hosts that are allowed to connect to and use GDS. The value must be in CIDR format. In this example, the network segment of the GaussDB(DWS) private network IP address is used.
    • -l log_file: GDS log directory and log file name. In this example, /opt/gds/gds_log.txt is used.
    • -D: GDS in daemon mode.
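Before moving on, it can help to confirm that GDS is actually listening on the address passed via -p. A minimal sketch using bash's /dev/tcp follows; 127.0.0.1:5000 is only a stand-in, and on the ECS you would test the private IP from this example (192.168.0.90:5000) or simply run ps -ef | grep gds.

```shell
# Returns success if a TCP listener accepts a connection on host:port.
# Requires bash (/dev/tcp is a bash feature, not a real device file).
gds_listening() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# 127.0.0.1:5000 is a stand-in; substitute the -p address used above.
if gds_listening 127.0.0.1 5000; then
  echo "GDS is listening"
else
  echo "GDS is NOT listening; check the gds process and its -p flag"
fi
```
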

  7. Connect to the first GaussDB(DWS) cluster using gsql.

    1. Run the exit command to switch to user root, go to the /opt directory of the ECS, and import the environment variables of gsql.

      exit

      cd /opt

      source gsql_env.sh

    2. Go to the /opt/bin directory and connect to the first GaussDB(DWS) cluster using gsql.

      cd /opt/bin

      gsql -d gaussdb -h 192.168.0.8 -p 8000 -U dbadmin -W password -r

      • -d: name of the connected database. In this example, the default database gaussdb is used.
      • -h: private network IP address of the connected GaussDB(DWS) database queried in 6. In this example, 192.168.0.8 is used.
      • -p: GaussDB(DWS) port. The value is 8000.
      • -U: database administrator. The value defaults to dbadmin.
      • -W: administrator password, which is set during cluster creation in 3. In this example, replace password with your actual password.

  8. Create a common user leo and grant the user the permission for creating foreign tables.

    CREATE USER leo WITH PASSWORD 'password';
    ALTER USER leo USEFT;

  9. Switch to user leo and create a GDS foreign table.

    Set LOCATION to the GDS listening IP address and port number obtained in 6, for example, gsfs://192.168.0.90:5000/*.

    SET ROLE leo PASSWORD 'password';
    DROP FOREIGN TABLE IF EXISTS product_info_ext;
    CREATE FOREIGN TABLE product_info_ext
    (
        product_price                integer        not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)              
    ) 
    SERVER gsmpp_server 
    OPTIONS(
    LOCATION 'gsfs://192.168.0.90:5000/*',
    FORMAT 'CSV' ,
    DELIMITER ',',
    ENCODING 'utf8',
    HEADER 'false',
    FILL_MISSING_FIELDS 'true',
    IGNORE_EXTRA_DATA 'true'
    )
    READ ONLY
    LOG INTO product_info_err 
    PER NODE REJECT LIMIT 'unlimited';
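The FILL_MISSING_FIELDS and IGNORE_EXTRA_DATA options matter for product_info2.csv, whose rows do not all have exactly 11 fields. The sketch below is an illustration only, not the GDS implementation: it shows the intended normalization, padding short rows with empty fields and dropping columns beyond the table width.

```shell
# Illustration of FILL_MISSING_FIELDS / IGNORE_EXTRA_DATA semantics:
# force every CSV row to exactly 11 comma-separated fields. Naive comma
# splitting; quoted fields containing commas are not handled here.
normalize11() {
  awk -F, '{
    out = $1
    for (i = 2; i <= 11; i++) out = out FS $i  # a missing $i is empty -> pads
    print out                                  # fields past 11 are dropped
  }'
}

# One short row (10 fields) and one long row (13 fields), as in the sample data.
printf '%s\n' \
  '98,FKQB-I,2017-09-15,B,2017 new shoes men,red,M,4345,2017-09-18,5473' \
  '50,DMQY-K,2017-09-21,A,2017 pants men,red,37,28,2017-09-25,58,good,extra,extra' |
normalize11
```
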
    

  10. Create a local table.

    DROP TABLE IF EXISTS product_info;
    CREATE TABLE product_info
    (
        product_price                integer        not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                   
    ) 
    WITH (
    orientation = column,
    compression=middle
    ) 
    DISTRIBUTE BY hash (product_id);
    

  11. Import data from the GDS foreign table and check whether the import succeeded. With the three source files, 20 rows (5 + 5 + 10) are expected.

    INSERT INTO product_info SELECT * FROM product_info_ext;
    SELECT count(*) FROM product_info;

Importing Remote GaussDB(DWS) Data Using a Foreign Table

  1. Connect to the second cluster on the ECS by referring to 7. Change the connection address to the address of the second cluster. In this example, 192.168.0.86 is used.
  2. Create a common user jim and grant the user the permissions for creating foreign tables and servers. The value of FOREIGN DATA WRAPPER is gc_fdw.

    CREATE USER jim WITH PASSWORD 'password';
    ALTER USER jim USEFT;
    GRANT ALL ON FOREIGN DATA WRAPPER gc_fdw TO jim;
    

  3. Switch to user jim and create a server.

    SET ROLE jim PASSWORD 'password';
    CREATE SERVER server_remote FOREIGN DATA WRAPPER gc_fdw OPTIONS  
       (address '192.168.0.8:8000,192.168.0.158:8000' , 
      dbname 'gaussdb',  
      username 'leo',  
      password 'password' 
    );
    
    • address: private network IP addresses and port number of the first cluster obtained in 6. In this example, 192.168.0.8:8000 and 192.168.0.158:8000 are used.
    • dbname: database name of the first connected cluster. In this example, gaussdb is used.
    • username: username of the first connected cluster. In this example, leo is used.
    • password: password of user leo in the first cluster.

  4. Create a foreign table.

    The columns and constraints of the foreign table must be consistent with those of the table to be accessed.

    CREATE FOREIGN TABLE region 
    ( 
        product_price                integer      ,
        product_id                   char(30)       ,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)   
    ) 
    SERVER 
        server_remote 
    OPTIONS 
    ( 
        schema_name 'leo', 
        table_name 'product_info', 
        encoding 'utf8' 
    );
    
    • SERVER: name of the server created in the previous step. In this example, server_remote is used.
    • schema_name: schema name of the first cluster to be accessed. In this example, leo is used.
    • table_name: table name of the first cluster to be accessed obtained in 10. In this example, product_info is used.
    • encoding: The value must be the same as that of the first cluster obtained in 9. In this example, utf8 is used.

  5. View the created server and foreign table.

    \des+ server_remote
    \d+ region
    

  6. Create a local table.

    The columns and constraints of the table must be consistent with those of the table to be accessed.

    CREATE TABLE local_region 
    ( 
        product_price                integer        not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)        
    ) 
    
    WITH (
    orientation = column,
    compression=middle
    ) 
    DISTRIBUTE BY hash (product_id);
    

  7. Import data to the local table using the foreign table.

    INSERT INTO local_region SELECT * FROM region;
    SELECT * FROM local_region;
    

  8. Query the foreign table without importing data.

    SELECT * FROM region;