Updated on 2024-06-07 GMT+08:00

Practice of Data Interconnection Between Two GaussDB(DWS) Clusters Based on GDS

This practice demonstrates how to migrate 15 million rows of data between two GaussDB(DWS) clusters within minutes by using the high-concurrency import and export capability of GDS.

  • This function is supported only by clusters of version 8.1.2 or later.
  • GDS is a high-concurrency import and export tool developed by GaussDB(DWS). For more information, visit GDS Usage Guide.
  • This section describes only the operation practice. For details about GDS-based interconnection and its syntax, see GDS-based Cross-Cluster Interconnection.

This practice takes about 90 minutes. The cloud services used in this practice are GaussDB(DWS), Elastic Cloud Server (ECS), and Virtual Private Cloud (VPC). The basic process is as follows:

  1. Prerequisites
  2. Step 1: Creating Two GaussDB(DWS) Clusters
  3. Step 2: Preparing Source Data
  4. Step 3: Installing and Starting the GDS Server
  5. Step 4: Implementing Data Interconnection Across GaussDB(DWS) Clusters

Supported Regions

Table 1 describes the regions where OBS data has been uploaded.

Table 1 Regions and OBS bucket names

Region                              OBS Bucket
CN North-Beijing1                   dws-demo-cn-north-1
CN North-Beijing2                   dws-demo-cn-north-2
CN North-Beijing4                   dws-demo-cn-north-4
CN North-Ulanqab1                   dws-demo-cn-north-9
CN East-Shanghai1                   dws-demo-cn-east-3
CN East-Shanghai2                   dws-demo-cn-east-2
CN South-Guangzhou                  dws-demo-cn-south-1
CN South-Guangzhou-InvitationOnly   dws-demo-cn-south-4
CN-Hong Kong                        dws-demo-ap-southeast-1
AP-Singapore                        dws-demo-ap-southeast-3
AP-Bangkok                          dws-demo-ap-southeast-2
LA-Santiago                         dws-demo-la-south-2
AF-Johannesburg                     dws-demo-af-south-1
LA-Mexico City1                     dws-demo-na-mexico-1
LA-Mexico City2                     dws-demo-la-north-2
RU-Moscow2                          dws-demo-ru-northwest-2
LA-Sao Paulo1                       dws-demo-sa-brazil-1

Constraints

In this practice, the two GaussDB(DWS) clusters and the ECS are deployed in the same region and VPC to ensure network connectivity.

Prerequisites

  • You have obtained the AK and SK of the account.
  • You have created a VPC and subnet. For details, see Creating a VPC.

Step 1: Creating Two GaussDB(DWS) Clusters

Create two GaussDB(DWS) clusters in the CN-Hong Kong region. For details, see Creating a Cluster. Name the two clusters dws-demo01 and dws-demo02.

Step 2: Preparing Source Data

  1. On the cluster management page of the GaussDB(DWS) console, locate the row that contains the dws-demo01 cluster and click Login in the Operation column.

    This practice uses version 8.1.3.x as an example. 8.1.2 and earlier versions do not support this login mode. You can use Data Studio to connect to a cluster. For details, see Using Data Studio to Connect to a Cluster.

  2. Enter the login username dbadmin, the database name gaussdb, and the password of user dbadmin set during GaussDB(DWS) cluster creation. Select Remember Password, enable Collect Metadata Periodically and Show Executed SQL Statements, and click Log In.

    Figure 1 Logging in to GaussDB(DWS)

  3. Click the database name gaussdb and click SQL Window in the upper right corner to access the SQL editor.
  4. Copy the following SQL statements to the SQL window and click Execute SQL to create the test TPC-H table ORDERS.

    CREATE TABLE ORDERS
     ( 
     O_ORDERKEY BIGINT NOT NULL , 
     O_CUSTKEY BIGINT NOT NULL , 
     O_ORDERSTATUS CHAR(1) NOT NULL , 
     O_TOTALPRICE DECIMAL(15,2) NOT NULL , 
     O_ORDERDATE DATE NOT NULL , 
     O_ORDERPRIORITY CHAR(15) NOT NULL , 
     O_CLERK CHAR(15) NOT NULL , 
     O_SHIPPRIORITY BIGINT NOT NULL , 
     O_COMMENT VARCHAR(79) NOT NULL)
     with (orientation = column)
     distribute by hash(O_ORDERKEY)
     PARTITION BY RANGE(O_ORDERDATE)
     ( 
     PARTITION O_ORDERDATE_1 VALUES LESS THAN('1993-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_2 VALUES LESS THAN('1994-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_3 VALUES LESS THAN('1995-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_4 VALUES LESS THAN('1996-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_5 VALUES LESS THAN('1997-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_6 VALUES LESS THAN('1998-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_7 VALUES LESS THAN('1999-01-01 00:00:00')
     );
    

  5. Run the SQL statements below to create an OBS foreign table.

    Replace AK and SK with the actual AK and SK of the account. <obs_bucket_name> is obtained from Supported Regions.

    Hardcoded or plaintext AK/SK is risky. For security, encrypt your AK/SK and store them in the configuration file or environment variables.

    CREATE FOREIGN TABLE ORDERS01
     (
     LIKE orders
     ) 
     SERVER gsmpp_server 
     OPTIONS (
     ENCODING 'utf8',
     LOCATION 'obs://<obs_bucket_name>/tpch/orders.tbl',
     FORMAT 'text',
     DELIMITER '|',
     ACCESS_KEY 'access_key_value_to_be_replaced',
     SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
     CHUNKSIZE '64',
     IGNORE_EXTRA_DATA 'on'
     );
    

  6. Run the SQL statement below to import data from the OBS foreign table to the source GaussDB(DWS) cluster. The import takes about 2 minutes.

    If the import fails, the AK and SK values in the foreign table may be incorrect. In this case, run DROP FOREIGN TABLE orders01; to delete the foreign table, create it again with the correct values, and then rerun the following statement.

    INSERT INTO orders SELECT * FROM orders01;
    

  7. Repeat the preceding steps to log in to the destination cluster dws-demo02 and run the following SQL statements to create the target table orders.

    CREATE TABLE ORDERS
     ( 
     O_ORDERKEY BIGINT NOT NULL , 
     O_CUSTKEY BIGINT NOT NULL , 
     O_ORDERSTATUS CHAR(1) NOT NULL , 
     O_TOTALPRICE DECIMAL(15,2) NOT NULL , 
     O_ORDERDATE DATE NOT NULL , 
     O_ORDERPRIORITY CHAR(15) NOT NULL , 
     O_CLERK CHAR(15) NOT NULL , 
     O_SHIPPRIORITY BIGINT NOT NULL , 
     O_COMMENT VARCHAR(79) NOT NULL)
     with (orientation = column)
     distribute by hash(O_ORDERKEY)
     PARTITION BY RANGE(O_ORDERDATE)
     ( 
     PARTITION O_ORDERDATE_1 VALUES LESS THAN('1993-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_2 VALUES LESS THAN('1994-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_3 VALUES LESS THAN('1995-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_4 VALUES LESS THAN('1996-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_5 VALUES LESS THAN('1997-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_6 VALUES LESS THAN('1998-01-01 00:00:00'), 
     PARTITION O_ORDERDATE_7 VALUES LESS THAN('1999-01-01 00:00:00')
     );
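
Step 5 advises against hardcoding the AK and SK. One possible way to follow that advice, sketched here under assumptions, is to keep the credentials in environment variables on a trusted host and generate the foreign table statement from them. The variable names OBS_BUCKET, DWS_AK, and DWS_SK and the function name render_obs_ddl are hypothetical and are not part of GaussDB(DWS).

```shell
#!/usr/bin/env bash
# Print the CREATE FOREIGN TABLE statement from step 5, filling in the
# bucket name and credentials from environment variables.
# OBS_BUCKET, DWS_AK and DWS_SK are hypothetical names chosen for this sketch.
render_obs_ddl() {
    # Abort with a message if any required variable is unset or empty.
    : "${OBS_BUCKET:?set OBS_BUCKET to the bucket name for your region}"
    : "${DWS_AK:?set DWS_AK to the access key}"
    : "${DWS_SK:?set DWS_SK to the secret access key}"
    cat <<EOF
CREATE FOREIGN TABLE ORDERS01
(
    LIKE orders
)
SERVER gsmpp_server
OPTIONS (
    ENCODING 'utf8',
    LOCATION 'obs://${OBS_BUCKET}/tpch/orders.tbl',
    FORMAT 'text',
    DELIMITER '|',
    ACCESS_KEY '${DWS_AK}',
    SECRET_ACCESS_KEY '${DWS_SK}',
    CHUNKSIZE '64',
    IGNORE_EXTRA_DATA 'on'
);
EOF
}
```

With the three variables set in the current shell, running render_obs_ddl prints the statement, which can then be pasted into the SQL window. The printed statement still contains the keys, so avoid saving the output to a shared file.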
    

Step 3: Installing and Starting the GDS Server

  1. Create an ECS by referring to Purchasing an ECS. The ECS and the GaussDB(DWS) clusters must be created in the same region and VPC. In this example, CentOS 7.6 is selected as the ECS image.
  2. Download the GDS package.

    1. Log in to the GaussDB(DWS) console.
    2. In the navigation pane on the left, click Connections.
    3. Select the GDS client of the target version from the drop-down list of CLI Client.

      Select a version based on the cluster version and the OS where the client is installed.

    4. Click Download.

  3. Use the SFTP tool to upload the downloaded client (for example, dws_client_8.2.x_redhat_x64.zip) to the /opt directory of the ECS.
  4. Log in to the ECS as the root user and run the following commands to go to the /opt directory and decompress the client package.

    cd /opt
    unzip dws_client_8.2.x_redhat_x64.zip
    

  5. Create a GDS user and the user group to which the user belongs. This user is used to start GDS and read source data.

    groupadd gdsgrp
    useradd -g gdsgrp gds_user
    

  6. Change the owner of the GDS package directory and source data file directory to the GDS user.

    chown -R gds_user:gdsgrp /opt/gds/bin
    chown -R gds_user:gdsgrp /opt
    

  7. Switch to user gds_user.

    su - gds_user
    

  8. Run the following commands to go to the gds directory and load the environment variables.

    cd /opt/gds/bin
    source gds_env
    

  9. Run the following command to start GDS. Replace <ECS_private_IP> with the private IP address of the ECS, which you can view on the ECS console.

    /opt/gds/bin/gds -d /opt -p <ECS_private_IP>:5000 -H 0.0.0.0/0 -l /opt/gds/bin/gds_log.txt -D -t 2
    

  10. Enable the network port between the ECS and DWS.

    The GDS server (the ECS in this practice) needs to communicate with DWS. The default security group of the ECS does not allow inbound traffic on GDS port 5000 and DWS port 8000. Perform the following steps:

    1. Return to the ECS console and click the ECS name to go to the ECS details page.
    2. Click the Security Groups tab and click Manage Rule.
    3. Choose Inbound Rules and click Add Rule. Set Priority to 1, set Protocol & Port to 5000, and click OK.

    4. Repeat the preceding steps to add an inbound rule for port 8000.
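
Steps 9 and 10 can be sketched as two small shell helpers: one builds the GDS start command for a given private IP address, and the other checks whether a TCP port is reachable once the security group rules are in place. The function names gds_start_cmd and port_open and the address 192.168.0.90 used below are hypothetical. The flag descriptions in the comments follow the GDS documentation as understood here and should be checked against your GDS version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 9 and 10; the function names are not part of GDS.

# Print (do not run) the GDS start command for the given private IP address.
# Flags as used in step 9: -d data directory, -p listening IP:port,
# -H hosts allowed to connect, -l log file, -D run as a daemon, -t worker threads.
gds_start_cmd() {
    local ip="$1" port="${2:-5000}"
    printf '/opt/gds/bin/gds -d /opt -p %s:%s -H 0.0.0.0/0 -l /opt/gds/bin/gds_log.txt -D -t 2\n' "$ip" "$port"
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's built-in /dev/tcp redirection and the coreutils timeout command.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, gds_start_cmd 192.168.0.90 prints the command from step 9 with the address filled in, and port_open 192.168.0.90 5000 && echo reachable, run from another host in the same VPC, gives a quick indication that GDS is listening and that the inbound rule took effect.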

Step 4: Implementing Data Interconnection Across GaussDB(DWS) Clusters

  1. Create a server.

    1. Obtain the private IP address of the source GaussDB(DWS) cluster. To do so, go to the DWS console, switch to the cluster management page, and click the source cluster name dws-demo01.
    2. Go to the cluster details page and record the private network IP address.

    3. Switch back to the DWS console and click Log In in the Operation column of the destination cluster dws-demo02. The SQL window is displayed.

      Run the commands below to create a server.

      In the commands, Private network IP address of the source GaussDB(DWS) cluster is obtained in the previous step, Private IP address of the ECS is obtained from the ECS console, and Login password of user dbadmin is set when the GaussDB(DWS) cluster is created.

      CREATE SERVER server_remote FOREIGN DATA WRAPPER GC_FDW OPTIONS
       (
       address 'Private network IP address of the source GaussDB(DWS) cluster:8000',
       dbname 'gaussdb',
       username 'dbadmin',
       password 'Login password of user dbadmin',
       syncsrv 'gsfs://Private IP address of the ECS:5000'
       )
       ;
      

  2. Create a foreign table for interconnection.

    In the SQL window of the destination cluster dws-demo02, run the following statements to create a foreign table for interconnection:

    CREATE FOREIGN TABLE ft_orders
     (
     O_ORDERKEY BIGINT , 
     O_CUSTKEY BIGINT , 
     O_ORDERSTATUS CHAR(1) , 
     O_TOTALPRICE DECIMAL(15,2) , 
     O_ORDERDATE DATE , 
     O_ORDERPRIORITY CHAR(15) , 
     O_CLERK CHAR(15) , 
     O_SHIPPRIORITY BIGINT , 
     O_COMMENT VARCHAR(79) 
    
     ) 
     SERVER server_remote 
     OPTIONS 
     (
     schema_name 'public',
     table_name 'orders',
     encoding 'SQL_ASCII'
     );
    

  3. Import all table data.

    In the SQL window, run the following SQL statement to import all data from the ft_orders foreign table. The import takes about 1 minute.

    INSERT INTO orders SELECT * FROM ft_orders;
    

    Run the following SQL statement to verify that 15 million rows of data are successfully imported.

    SELECT count(*) FROM orders;
    

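  4. Import data based on filter criteria.

    To import only part of the data, add a WHERE clause to the query on the foreign table. If the full import in the previous step has already been run, clear the target table first (for example, with TRUNCATE TABLE orders;). Otherwise, this statement inserts the matching rows a second time.

    INSERT INTO orders SELECT * FROM ft_orders WHERE o_orderkey < 10000000;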
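
The CREATE SERVER statement in step 1 uses descriptive placeholders that must be replaced by hand. As a minimal sketch, assuming the values are kept in environment variables on a trusted host, the statement could be generated as follows. The variable names SRC_DWS_IP, ECS_IP, and DWS_PASSWORD and the function name render_server_ddl are hypothetical.

```shell
#!/usr/bin/env bash
# Print the CREATE SERVER statement from step 1 of Step 4.
# SRC_DWS_IP, ECS_IP and DWS_PASSWORD are hypothetical names chosen for this sketch.
render_server_ddl() {
    # Abort with a message if any required variable is unset or empty.
    : "${SRC_DWS_IP:?set SRC_DWS_IP to the private network IP of the source cluster}"
    : "${ECS_IP:?set ECS_IP to the private IP address of the ECS}"
    : "${DWS_PASSWORD:?set DWS_PASSWORD to the password of user dbadmin}"
    # Double any single quote in the password so the SQL string literal stays valid.
    local q="'"
    local pw=${DWS_PASSWORD//$q/$q$q}
    cat <<EOF
CREATE SERVER server_remote FOREIGN DATA WRAPPER GC_FDW OPTIONS
(
    address '${SRC_DWS_IP}:8000',
    dbname 'gaussdb',
    username 'dbadmin',
    password '${pw}',
    syncsrv 'gsfs://${ECS_IP}:5000'
);
EOF
}
```

The printed statement can be pasted into the SQL window of the destination cluster dws-demo02. It contains the password in plaintext, so treat the output with the same care as the password itself.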