Help Center/ Data Warehouse Service / Best Practices/ Data Migration/ Migrating Data Between DWS Clusters Using GDS
Updated on 2025-09-18 GMT+08:00

Migrating Data Between DWS Clusters Using GDS

This practice demonstrates how to migrate 15 million rows of data between two DWS clusters in minutes by leveraging the high concurrency of GDS import and export.

  • This function is supported only by clusters of version 8.1.2 or later.
  • GDS is a high-concurrency data import and export tool provided by DWS. For more information, see the GDS Usage Guide.
  • This section describes only the operation practice. For details about GDS interconnection and syntax description, see GDS-based Cross-Cluster Interconnection.

This practice takes about 90 minutes. The cloud services used in this practice are DWS, Elastic Cloud Server (ECS), and Virtual Private Cloud (VPC). The basic process is as follows:

  1. Prerequisites
  2. Step 1: Creating Two DWS Clusters
  3. Step 2: Preparing Source Data
  4. Step 3: Installing and Starting the GDS Server
  5. Step 4: Implementing Data Interconnection Across DWS Clusters

Supported Regions

Table 1 Regions and OBS bucket names

Region       OBS Bucket
EU-Dublin    dws-demo-eu-west-101

Constraints

In this practice, two sets of DWS and ECS services are deployed in the same region and VPC to ensure network connectivity.

Prerequisites

  • You have signed up for a Huawei ID and enabled Huawei Cloud services. The account cannot be in arrears or frozen.
  • You have obtained the AK and SK of the account.
  • You have created a VPC and subnet. For details, see Creating a VPC.

Step 1: Creating Two DWS Clusters

Create two DWS clusters. For details, see Creating a Cluster. You are advised to create the clusters in the EU-Dublin region. Name the two clusters dws-demo01 and dws-demo02.

Step 2: Preparing Source Data

  1. Log in to the DWS console and choose Clusters from the navigation pane. In the cluster list, locate the cluster dws-demo01 and click Login in the Operation column.

    This login mode is available only for clusters of version 8.1.3.x or later. For clusters of version 8.1.2 or earlier, use gsql to log in.

  2. After the login is successful, the SQL editor is displayed.
  3. Copy the following SQL statements to the SQL window and click Execute SQL to create the test TPC-H table ORDERS.

    CREATE TABLE ORDERS
    (
        O_ORDERKEY BIGINT NOT NULL,
        O_CUSTKEY BIGINT NOT NULL,
        O_ORDERSTATUS CHAR(1) NOT NULL,
        O_TOTALPRICE DECIMAL(15,2) NOT NULL,
        O_ORDERDATE DATE NOT NULL,
        O_ORDERPRIORITY CHAR(15) NOT NULL,
        O_CLERK CHAR(15) NOT NULL,
        O_SHIPPRIORITY BIGINT NOT NULL,
        O_COMMENT VARCHAR(79) NOT NULL
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH(O_ORDERKEY)
    PARTITION BY RANGE(O_ORDERDATE)
    (
        PARTITION O_ORDERDATE_1 VALUES LESS THAN('1993-01-01 00:00:00'),
        PARTITION O_ORDERDATE_2 VALUES LESS THAN('1994-01-01 00:00:00'),
        PARTITION O_ORDERDATE_3 VALUES LESS THAN('1995-01-01 00:00:00'),
        PARTITION O_ORDERDATE_4 VALUES LESS THAN('1996-01-01 00:00:00'),
        PARTITION O_ORDERDATE_5 VALUES LESS THAN('1997-01-01 00:00:00'),
        PARTITION O_ORDERDATE_6 VALUES LESS THAN('1998-01-01 00:00:00'),
        PARTITION O_ORDERDATE_7 VALUES LESS THAN('1999-01-01 00:00:00')
    );
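
The `VALUES LESS THAN` clauses above route each row to the first partition whose upper bound exceeds `O_ORDERDATE`. As an illustration only (not part of the migration steps), a minimal Python sketch of that routing logic, with the bounds copied from the DDL:

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds from the PARTITION BY RANGE clause (1993..1999).
BOUNDS = [date(y, 1, 1) for y in range(1993, 2000)]
PARTITIONS = [f"O_ORDERDATE_{i}" for i in range(1, 8)]

def route(o_orderdate: date) -> str:
    """Return the partition that VALUES LESS THAN semantics would select."""
    # bisect_right finds how many bounds are <= the date, i.e. the index
    # of the first partition whose upper bound is strictly greater.
    i = bisect_right(BOUNDS, o_orderdate)
    if i == len(BOUNDS):
        raise ValueError("no partition: date is on or after 1999-01-01")
    return PARTITIONS[i]
```

A boundary date such as 1993-01-01 lands in O_ORDERDATE_2, because the first partition holds only dates strictly less than its bound.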
    

  4. Run the SQL statements below to create an OBS foreign table.

    Replace AK and SK with the actual AK and SK of the account. <obs_bucket_name> is obtained from Supported Regions.

    Hard-coding AK/SK pairs or storing them in plaintext is risky. For security, encrypt your AK and SK and store them in a configuration file or in environment variables.

    CREATE FOREIGN TABLE ORDERS01
    (
        LIKE orders
    )
    SERVER gsmpp_server
    OPTIONS (
        ENCODING 'utf8',
        LOCATION 'obs://<obs_bucket_name>/tpch/orders.tbl',
        FORMAT 'text',
        DELIMITER '|',
        ACCESS_KEY 'access_key_value_to_be_replaced',
        SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
        CHUNKSIZE '64',
        IGNORE_EXTRA_DATA 'on'
    );
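
The foreign table reads pipe-delimited text, one row per line, with the nine ORDERS columns in order. TPC-H dbgen output also ends each line with a trailing '|', producing an empty extra field, which is the case IGNORE_EXTRA_DATA 'on' handles. A Python sketch of how one such line maps to the table columns (the sample line is hypothetical, not taken from the actual data set):

```python
from datetime import datetime

COLUMNS = ["o_orderkey", "o_custkey", "o_orderstatus", "o_totalprice",
           "o_orderdate", "o_orderpriority", "o_clerk", "o_shippriority",
           "o_comment"]

def parse_order_line(line: str) -> dict:
    """Split one pipe-delimited orders.tbl line into the nine ORDERS columns."""
    # Slicing to len(COLUMNS) drops the empty field left by the trailing '|'.
    fields = line.rstrip("\n").split("|")[:len(COLUMNS)]
    row = dict(zip(COLUMNS, fields))
    row["o_orderkey"] = int(row["o_orderkey"])
    row["o_custkey"] = int(row["o_custkey"])
    row["o_totalprice"] = float(row["o_totalprice"])
    row["o_orderdate"] = datetime.strptime(row["o_orderdate"], "%Y-%m-%d").date()
    row["o_shippriority"] = int(row["o_shippriority"])
    return row

# Hypothetical sample line in dbgen's format, with the trailing delimiter.
sample = "1|370|O|172799.49|1996-01-02|5-LOW|Clerk#000000951|0|sleep furiously|"
```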
    

  5. Run the SQL statement below to import data from the OBS foreign table to the source DWS cluster. The import takes about 2 minutes.

    If an import error occurs, the AK and SK values in the foreign table are likely incorrect. In that case, run DROP FOREIGN TABLE orders01 to delete the foreign table, recreate it with the correct values, and run the import statement again.

    INSERT INTO orders SELECT * FROM orders01;
    

  6. Repeat the preceding steps to log in to the destination cluster dws-demo02 and run the following SQL statements to create the target table orders.

    CREATE TABLE ORDERS
    (
        O_ORDERKEY BIGINT NOT NULL,
        O_CUSTKEY BIGINT NOT NULL,
        O_ORDERSTATUS CHAR(1) NOT NULL,
        O_TOTALPRICE DECIMAL(15,2) NOT NULL,
        O_ORDERDATE DATE NOT NULL,
        O_ORDERPRIORITY CHAR(15) NOT NULL,
        O_CLERK CHAR(15) NOT NULL,
        O_SHIPPRIORITY BIGINT NOT NULL,
        O_COMMENT VARCHAR(79) NOT NULL
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH(O_ORDERKEY)
    PARTITION BY RANGE(O_ORDERDATE)
    (
        PARTITION O_ORDERDATE_1 VALUES LESS THAN('1993-01-01 00:00:00'),
        PARTITION O_ORDERDATE_2 VALUES LESS THAN('1994-01-01 00:00:00'),
        PARTITION O_ORDERDATE_3 VALUES LESS THAN('1995-01-01 00:00:00'),
        PARTITION O_ORDERDATE_4 VALUES LESS THAN('1996-01-01 00:00:00'),
        PARTITION O_ORDERDATE_5 VALUES LESS THAN('1997-01-01 00:00:00'),
        PARTITION O_ORDERDATE_6 VALUES LESS THAN('1998-01-01 00:00:00'),
        PARTITION O_ORDERDATE_7 VALUES LESS THAN('1999-01-01 00:00:00')
    );
    

Step 3: Installing and Starting the GDS Server

  1. Create an ECS by referring to Purchasing an ECS. The ECS and DWS must be in the same region and VPC. In this example, the CentOS 7.6 image is used.
  2. Download the GDS package.

    1. Log in to the DWS console.
    2. In the navigation tree on the left, choose Management > Client Connections.
    3. Select the GDS client of the target version from the drop-down list of CLI Client.

      Select a version based on the cluster version and the OS where the client is installed.

      The CPU architecture of the client must be the same as that of the cluster. If the cluster uses the x86 specifications, select the x86 client.

    4. Click Download.

  3. Use the SFTP tool to upload the downloaded client (for example, dws_client_8.2.x_redhat_x64.zip) to the /opt directory of the ECS.
  4. Log in to the ECS as the root user and run the following commands to go to the /opt directory and decompress the client package.

    cd /opt
    unzip dws_client_8.2.x_redhat_x64.zip
    

  5. Create a GDS user and the user group to which the user belongs. This user is used to start GDS and read source data.

    groupadd gdsgrp
    useradd -g gdsgrp gds_user
    

  6. Change the owner of the GDS package directory and source data file directory to the GDS user.

    chown -R gds_user:gdsgrp /opt/gds/bin
    chown -R gds_user:gdsgrp /opt
    

  7. Switch to user gds_user.

    su - gds_user
    

  8. Run the following commands to go to the gds directory and source the environment variables.

    cd /opt/gds/bin
    source gds_env
    

  9. Run the following command to start GDS. You can view the private IP address of the ECS on the ECS console.

    /opt/gds/bin/gds -d /opt -p Private IP address of the ECS:5000 -H 0.0.0.0/0 -l /opt/gds/bin/gds_log.txt -D -t 2
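
In the command above, -d is the directory GDS serves, -p is the IP:port GDS listens on, -H is the CIDR of hosts allowed to connect, -l is the log file, -D runs GDS as a daemon, and -t is the number of concurrent threads. As an illustration only, a small Python sketch that assembles the same command from these parameters (the IP used in the example is a placeholder):

```python
def build_gds_command(ecs_ip: str, port: int = 5000, data_dir: str = "/opt",
                      allowed: str = "0.0.0.0/0",
                      log: str = "/opt/gds/bin/gds_log.txt",
                      threads: int = 2) -> str:
    """Assemble the gds start command used in this step.

    -d  directory containing the files GDS serves
    -p  IP:port GDS listens on (the ECS private IP)
    -H  CIDR of hosts allowed to connect (the DWS nodes)
    -l  log file path
    -D  run GDS as a daemon
    -t  number of concurrent worker threads
    """
    return (f"/opt/gds/bin/gds -d {data_dir} -p {ecs_ip}:{port} "
            f"-H {allowed} -l {log} -D -t {threads}")

# Example with a placeholder private IP address:
cmd = build_gds_command("192.168.0.10")
```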
    

  10. Enable the network port between the ECS and DWS.

    The GDS server (ECS in this practice) needs to communicate with DWS. The default security group of the ECS does not allow inbound traffic from GDS port 5000 and DWS port 8000. Perform the following steps:

    1. Return to the ECS console and click the ECS name to go to the ECS details page.
    2. Click the Security Groups tab and click Manage Rule.
    3. Choose Inbound Rules and click Add Rule. Set Priority to 1, set Protocol & Port to 5000, and click OK.

    4. Repeat the preceding steps to add an inbound rule of 8000.

Step 4: Implementing Data Interconnection Across DWS Clusters

  1. Create a server.

    1. Obtain the private IP address of the source DWS cluster. Specifically, go to the DWS console, choose Dedicated Clusters > Clusters, and click the source cluster name dws-demo01.
    2. Go to the cluster details page and record the private network IP address.

    3. Switch back to the DWS console and click Log In in the Operation column of the destination cluster dws-demo02. The SQL window is displayed.

      Run the commands below to create a server.

      In the commands, Private network IP address of the source DWS cluster is obtained in the previous step, Private IP address of the ECS is obtained from the ECS console, and Login password of user dbadmin is set when the DWS cluster is created.

      CREATE SERVER server_remote FOREIGN DATA WRAPPER GC_FDW OPTIONS
      (
          address 'Private network IP address of the source DWS cluster:8000',
          dbname 'gaussdb',
          username 'dbadmin',
          password 'Login password of user dbadmin',
          syncsrv 'gsfs://Private IP address of the ECS:5000'
      );
      

  2. Create a foreign table for interconnection.

    In the SQL window of the destination cluster dws-demo02, run the following statements to create a foreign table for interconnection:

    CREATE FOREIGN TABLE ft_orders
    (
        O_ORDERKEY BIGINT,
        O_CUSTKEY BIGINT,
        O_ORDERSTATUS CHAR(1),
        O_TOTALPRICE DECIMAL(15,2),
        O_ORDERDATE DATE,
        O_ORDERPRIORITY CHAR(15),
        O_CLERK CHAR(15),
        O_SHIPPRIORITY BIGINT,
        O_COMMENT VARCHAR(79)
    )
    SERVER server_remote
    OPTIONS
    (
        schema_name 'public',
        table_name 'orders',
        encoding 'SQL_ASCII'
    );
    

  3. Import all table data.

    In the SQL window, run the SQL statement below to import full data from the ft_orders foreign table. The import takes about 1 minute.

    INSERT INTO orders SELECT * FROM ft_orders;
    

    Run the following SQL statement to verify that 15 million rows of data are successfully imported.

    SELECT count(*) FROM orders;
    

  4. Import data based on filter criteria.

    INSERT INTO orders SELECT * FROM ft_orders WHERE o_orderkey < 10000000;