Migrating Data Between GaussDB(DWS) Clusters Using GDS
This practice demonstrates how to migrate 15 million rows of data between two GaussDB(DWS) clusters within minutes by using the high-concurrency import and export capability of GDS.
- This function is supported only by clusters of version 8.1.2 or later.
- GDS is a high-concurrency import and export tool developed by GaussDB(DWS). For more information, visit GDS Usage Guide.
- This section covers only the hands-on procedure. For details about GDS interconnection and its syntax, see GDS-based Cross-Cluster Interconnection.
This practice takes about 90 minutes. The cloud services used in this practice are GaussDB(DWS), Elastic Cloud Server (ECS), and Virtual Private Cloud (VPC). The basic process is as follows:
- Prerequisites
- Step 1: Creating Two GaussDB(DWS) Clusters
- Step 2: Preparing Source Data
- Step 3: Installing and Starting the GDS Server
- Step 4: Implementing Data Interconnection Across GaussDB(DWS) Clusters
Supported Regions
Table 1 describes the regions where OBS data has been uploaded.
| Region | OBS Bucket |
|---|---|
| CN North-Beijing1 | dws-demo-cn-north-1 |
| CN North-Beijing2 | dws-demo-cn-north-2 |
| CN North-Beijing4 | dws-demo-cn-north-4 |
| CN North-Ulanqab1 | dws-demo-cn-north-9 |
| CN East-Shanghai1 | dws-demo-cn-east-3 |
| CN East-Shanghai2 | dws-demo-cn-east-2 |
| CN South-Guangzhou | dws-demo-cn-south-1 |
| CN South-Guangzhou-InvitationOnly | dws-demo-cn-south-4 |
| CN-Hong Kong | dws-demo-ap-southeast-1 |
| AP-Singapore | dws-demo-ap-southeast-3 |
| AP-Bangkok | dws-demo-ap-southeast-2 |
| LA-Santiago | dws-demo-la-south-2 |
| AF-Johannesburg | dws-demo-af-south-1 |
| LA-Mexico City1 | dws-demo-na-mexico-1 |
| LA-Mexico City2 | dws-demo-la-north-2 |
| RU-Moscow2 | dws-demo-ru-northwest-2 |
| LA-Sao Paulo1 | dws-demo-sa-brazil-1 |
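The bucket name from Table 1 is later substituted into the OBS location of the source file. As a minimal sketch, the lookup could be scripted as follows. The function names obs_bucket_for_region and obs_orders_location are hypothetical, and only a few regions from the table are included for brevity.

```shell
#!/usr/bin/env bash
# Map a region name from Table 1 to its demo OBS bucket and print the
# location of the TPC-H orders file used later in this practice.
# Both function names are hypothetical helpers, not part of any tool.

obs_bucket_for_region() {
    case "$1" in
        "CN North-Beijing4") echo "dws-demo-cn-north-4" ;;
        "CN-Hong Kong")      echo "dws-demo-ap-southeast-1" ;;
        "AP-Singapore")      echo "dws-demo-ap-southeast-3" ;;
        "AP-Bangkok")        echo "dws-demo-ap-southeast-2" ;;
        *) echo "unknown region: $1" >&2; return 1 ;;
    esac
}

obs_orders_location() {
    local bucket
    # Propagate the failure if the region is not in the lookup table.
    bucket=$(obs_bucket_for_region "$1") || return 1
    echo "obs://${bucket}/tpch/orders.tbl"
}

obs_orders_location "CN-Hong Kong"
```

For the CN-Hong Kong region recommended in Step 1, this prints obs://dws-demo-ap-southeast-1/tpch/orders.tbl.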
Constraints
In this practice, the two GaussDB(DWS) clusters and the ECS are deployed in the same region and VPC to ensure network connectivity.
Prerequisites
- You have obtained the AK and SK of the account.
- You have created a VPC and subnet. For details, see Creating a VPC.
Step 1: Creating Two GaussDB(DWS) Clusters
Create two GaussDB(DWS) clusters. For details, see Creating a Cluster. You are advised to create the clusters in the CN-Hong Kong region. Name the two clusters dws-demo01 and dws-demo02.
Step 2: Preparing Source Data
- On the cluster management page of the GaussDB(DWS) console, locate the row that contains the dws-demo01 cluster and click Login in the Operation column.
This practice uses version 8.1.3.x as an example. Clusters of version 8.1.2 or earlier do not support this login mode; for those versions, use Data Studio to connect to the cluster. For details, see Using Data Studio to Connect to a Cluster.
- After the login is successful, the SQL editor is displayed.
- Copy the following SQL statements to the SQL window and click Execute SQL to create the test TPC-H table ORDERS.
    CREATE TABLE ORDERS
    (
        O_ORDERKEY      BIGINT        NOT NULL,
        O_CUSTKEY       BIGINT        NOT NULL,
        O_ORDERSTATUS   CHAR(1)       NOT NULL,
        O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
        O_ORDERDATE     DATE          NOT NULL,
        O_ORDERPRIORITY CHAR(15)      NOT NULL,
        O_CLERK         CHAR(15)      NOT NULL,
        O_SHIPPRIORITY  BIGINT        NOT NULL,
        O_COMMENT       VARCHAR(79)   NOT NULL
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH(O_ORDERKEY)
    PARTITION BY RANGE(O_ORDERDATE)
    (
        PARTITION O_ORDERDATE_1 VALUES LESS THAN('1993-01-01 00:00:00'),
        PARTITION O_ORDERDATE_2 VALUES LESS THAN('1994-01-01 00:00:00'),
        PARTITION O_ORDERDATE_3 VALUES LESS THAN('1995-01-01 00:00:00'),
        PARTITION O_ORDERDATE_4 VALUES LESS THAN('1996-01-01 00:00:00'),
        PARTITION O_ORDERDATE_5 VALUES LESS THAN('1997-01-01 00:00:00'),
        PARTITION O_ORDERDATE_6 VALUES LESS THAN('1998-01-01 00:00:00'),
        PARTITION O_ORDERDATE_7 VALUES LESS THAN('1999-01-01 00:00:00')
    );
- Run the SQL statements below to create an OBS foreign table.
Replace AK and SK with the actual AK and SK of the account. <obs_bucket_name> is obtained from Supported Regions.
Hardcoded or plaintext AK/SK is risky. For security, encrypt your AK/SK and store them in the configuration file or environment variables.
    CREATE FOREIGN TABLE ORDERS01
    (
        LIKE orders
    )
    SERVER gsmpp_server
    OPTIONS (
        ENCODING 'utf8',
        LOCATION 'obs://<obs_bucket_name>/tpch/orders.tbl',
        FORMAT 'text',
        DELIMITER '|',
        ACCESS_KEY 'access_key_value_to_be_replaced',
        SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
        CHUNKSIZE '64',
        IGNORE_EXTRA_DATA 'on'
    );
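One way to follow the advice about not hardcoding the AK and SK is to render the statement from environment variables just before it is executed. The sketch below is illustrative only: the variable names OBS_AK, OBS_SK, and OBS_BUCKET and the function name render_orders01_ddl are assumptions, and the example values are placeholders rather than real credentials.

```shell
#!/usr/bin/env bash
# Sketch: render the OBS foreign table DDL with credentials taken from
# environment variables instead of typing them into the statement.
# OBS_AK, OBS_SK, OBS_BUCKET and render_orders01_ddl are hypothetical names.

render_orders01_ddl() {
    # Refuse to emit a statement with an empty credential or bucket name.
    if [ -z "${OBS_AK:-}" ] || [ -z "${OBS_SK:-}" ] || [ -z "${OBS_BUCKET:-}" ]; then
        echo "OBS_AK, OBS_SK, and OBS_BUCKET must be set" >&2
        return 1
    fi
    cat <<EOF
CREATE FOREIGN TABLE ORDERS01 (LIKE orders)
SERVER gsmpp_server
OPTIONS (
    ENCODING 'utf8',
    LOCATION 'obs://${OBS_BUCKET}/tpch/orders.tbl',
    FORMAT 'text',
    DELIMITER '|',
    ACCESS_KEY '${OBS_AK}',
    SECRET_ACCESS_KEY '${OBS_SK}',
    CHUNKSIZE '64',
    IGNORE_EXTRA_DATA 'on'
);
EOF
}

# Placeholder values for illustration; real keys would come from a
# protected configuration file or secret store.
OBS_BUCKET="dws-demo-ap-southeast-1"
OBS_AK="example_ak"
OBS_SK="example_sk"
render_orders01_ddl
```

The rendered statement still contains the keys in plaintext, so it is best passed straight to the SQL client rather than written to a file or shell history.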
- Run the SQL statement below to import data from the OBS foreign table to the source GaussDB(DWS) cluster. The import takes about 2 minutes.
If the import fails, the most likely cause is an incorrect AK or SK in the foreign table definition. In this case, run DROP FOREIGN TABLE orders01; to delete the foreign table, create it again with the correct values, and then rerun the following statement.

    INSERT INTO orders SELECT * FROM orders01;
- Repeat the preceding steps to log in to the destination cluster dws-demo02 and run the following SQL statements to create the target table orders.
    CREATE TABLE ORDERS
    (
        O_ORDERKEY      BIGINT        NOT NULL,
        O_CUSTKEY       BIGINT        NOT NULL,
        O_ORDERSTATUS   CHAR(1)       NOT NULL,
        O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
        O_ORDERDATE     DATE          NOT NULL,
        O_ORDERPRIORITY CHAR(15)      NOT NULL,
        O_CLERK         CHAR(15)      NOT NULL,
        O_SHIPPRIORITY  BIGINT        NOT NULL,
        O_COMMENT       VARCHAR(79)   NOT NULL
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH(O_ORDERKEY)
    PARTITION BY RANGE(O_ORDERDATE)
    (
        PARTITION O_ORDERDATE_1 VALUES LESS THAN('1993-01-01 00:00:00'),
        PARTITION O_ORDERDATE_2 VALUES LESS THAN('1994-01-01 00:00:00'),
        PARTITION O_ORDERDATE_3 VALUES LESS THAN('1995-01-01 00:00:00'),
        PARTITION O_ORDERDATE_4 VALUES LESS THAN('1996-01-01 00:00:00'),
        PARTITION O_ORDERDATE_5 VALUES LESS THAN('1997-01-01 00:00:00'),
        PARTITION O_ORDERDATE_6 VALUES LESS THAN('1998-01-01 00:00:00'),
        PARTITION O_ORDERDATE_7 VALUES LESS THAN('1999-01-01 00:00:00')
    );
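The source and destination tables must use the same partition layout, and the seven yearly range partitions follow a regular pattern. As a small sketch, the partition clauses could be generated rather than typed by hand, which helps keep the two definitions identical. The function name print_year_partitions is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the yearly range-partition clauses used by the ORDERS table.
# print_year_partitions is a hypothetical helper. Arguments are the first
# and last upper-bound years (1993 and 1999 in this practice).

print_year_partitions() {
    local first=$1 last=$2
    local n=1 year=$first sep
    while [ "$year" -le "$last" ]; do
        # Every clause except the last one ends with a comma.
        sep=","
        if [ "$year" -eq "$last" ]; then sep=""; fi
        printf "PARTITION O_ORDERDATE_%d VALUES LESS THAN('%d-01-01 00:00:00')%s\n" \
            "$n" "$year" "$sep"
        n=$((n + 1))
        year=$((year + 1))
    done
}

print_year_partitions 1993 1999
```

The output is the seven PARTITION lines that appear inside the final parentheses of the CREATE TABLE statement.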
Step 3: Installing and Starting the GDS Server
- Create an ECS by referring to Purchasing an ECS. The ECS and the GaussDB(DWS) clusters must be created in the same region and VPC. In this example, CentOS 7.6 is used as the ECS image.
- Download the GDS package.
- Log in to the GaussDB(DWS) console.
- In the navigation tree on the left, choose Management > Client Connections.
- Select the GDS client of the target version from the drop-down list of CLI Client.
Select a version based on the cluster version and the OS where the client is installed.
- Click Download.
- Use the SFTP tool to upload the downloaded client (for example, dws_client_8.2.x_redhat_x64.zip) to the /opt directory of the ECS.
- Log in to the ECS as the root user and run the following commands to go to the /opt directory and decompress the client package.
    cd /opt
    unzip dws_client_8.2.x_redhat_x64.zip
- Create a GDS user and the user group to which the user belongs. This user is used to start GDS and read source data.
    groupadd gdsgrp
    useradd -g gdsgrp gds_user
- Change the owner of the GDS package directory and source data file directory to the GDS user.
    chown -R gds_user:gdsgrp /opt/gds/bin
    chown -R gds_user:gdsgrp /opt
- Switch to user gds_user.

    su - gds_user
- Run the following commands to go to the GDS directory and load the environment variables.

    cd /opt/gds/bin
    source gds_env
- Run the following command to start GDS. Replace <Private IP address of the ECS> with the private IP address shown on the ECS console.

    /opt/gds/bin/gds -d /opt -p <Private IP address of the ECS>:5000 -H 0.0.0.0/0 -l /opt/gds/bin/gds_log.txt -D -t 2
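The start command packs several options into one line. The sketch below assembles the same command from a variable and annotates each option. The function name build_gds_cmd and the address 192.168.0.10 are hypothetical, and the option descriptions are a short summary that should be checked against the GDS Usage Guide.

```shell
#!/usr/bin/env bash
# Sketch: assemble the GDS start command so the ECS private IP address is
# set in one place. build_gds_cmd and 192.168.0.10 are hypothetical; the
# options are the ones used in this practice.

build_gds_cmd() {
    local ecs_ip=$1
    # -d  directory that holds the data files
    # -p  IP address and port that GDS listens on
    # -H  hosts allowed to connect (0.0.0.0/0 allows any host)
    # -l  log file
    # -D  run GDS as a daemon
    # -t  number of concurrent worker threads
    echo "/opt/gds/bin/gds -d /opt -p ${ecs_ip}:5000 -H 0.0.0.0/0 -l /opt/gds/bin/gds_log.txt -D -t 2"
}

# Print the command for review; run it as gds_user once it looks right.
build_gds_cmd "192.168.0.10"
```

Outside a test environment, narrowing the -H value to the VPC subnet is a reasonable precaution, since 0.0.0.0/0 accepts connections from any address.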
- Enable the network port between the ECS and GaussDB(DWS).
The GDS server (the ECS in this practice) needs to communicate with GaussDB(DWS). The default security group of the ECS does not allow inbound traffic on GDS port 5000 or GaussDB(DWS) port 8000. Perform the following steps:
- Return to the ECS console and click the ECS name to go to the ECS details page.
- Click the Security Groups tab and click Manage Rule.
- Choose Inbound Rules and click Add Rule. Set Priority to 1, set Protocol & Port to 5000, and click OK.
- Repeat the preceding steps to add an inbound rule of 8000.
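After the inbound rules are added, it can be useful to confirm that the ports are reachable from another host in the VPC. The following is a minimal sketch, assuming bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available. The function name check_port and the example address are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether a TCP port accepts connections.
# check_port is a hypothetical helper; it relies on bash's /dev/tcp
# pseudo-device and gives up after 3 seconds.

check_port() {
    local host=$1 port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

# Example (192.168.0.10 stands in for the ECS private IP address):
#   check_port 192.168.0.10 5000
```

A result of closed for port 5000 usually points to a missing security group rule or a GDS process that is not running.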
Step 4: Implementing Data Interconnection Across GaussDB(DWS) Clusters
- Create a server.
- Obtain the private IP address of the source GaussDB(DWS) cluster. Specifically, go to the GaussDB(DWS) console, choose Dedicated Clusters > Clusters, and click the source cluster name dws-demo01.
- Go to the cluster details page and record the private network IP address.
- Switch back to the GaussDB(DWS) console and click Log In in the Operation column of the destination cluster dws-demo02. The SQL window is displayed.
Run the commands below to create a server.
In the commands, Private network IP address of the source GaussDB(DWS) cluster is obtained in the previous step, Private IP address of the ECS is obtained from the ECS console, and Login password of user dbadmin is set when the GaussDB(DWS) cluster is created.
    CREATE SERVER server_remote FOREIGN DATA WRAPPER GC_FDW
    OPTIONS (
        address 'Private network IP address of the source GaussDB(DWS) cluster:8000',
        dbname 'gaussdb',
        username 'dbadmin',
        password 'Login password of user dbadmin',
        syncsrv 'gsfs://Private IP address of the ECS:5000'
    );
- Create a foreign table for interconnection.
In the SQL window of the destination cluster dws-demo02, run the following statements to create a foreign table for interconnection:
    CREATE FOREIGN TABLE ft_orders
    (
        O_ORDERKEY      BIGINT,
        O_CUSTKEY       BIGINT,
        O_ORDERSTATUS   CHAR(1),
        O_TOTALPRICE    DECIMAL(15,2),
        O_ORDERDATE     DATE,
        O_ORDERPRIORITY CHAR(15),
        O_CLERK         CHAR(15),
        O_SHIPPRIORITY  BIGINT,
        O_COMMENT       VARCHAR(79)
    )
    SERVER server_remote
    OPTIONS (
        schema_name 'public',
        table_name 'orders',
        encoding 'SQL_ASCII'
    );
- Import all table data.
In the SQL window, run the SQL statement below to import the full data from the ft_orders foreign table. The import takes about 1 minute.

    INSERT INTO orders SELECT * FROM ft_orders;
Run the following SQL statement to verify that 15 million rows of data are successfully imported.
    SELECT count(*) FROM orders;
- Import data based on filter criteria.
    INSERT INTO orders SELECT * FROM ft_orders WHERE o_orderkey < 10000000;
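Filtered imports can also be used to pull a large table across in batches. The sketch below prints one INSERT statement per key range, using half-open ranges so that no row falls into two batches. The function name print_range_inserts, the batch size, and the upper key bound are hypothetical values chosen for illustration, and the sketch assumes the target table does not already contain the rows being imported.

```shell
#!/usr/bin/env bash
# Sketch: print one filtered INSERT per key range for a batched import.
# print_range_inserts, the upper bound, and the step are hypothetical.

print_range_inserts() {
    local max_key=$1 step=$2
    local lo=0 hi
    while [ "$lo" -lt "$max_key" ]; do
        hi=$((lo + step))
        # Half-open range [lo, hi) so adjacent batches never overlap.
        echo "INSERT INTO orders SELECT * FROM ft_orders WHERE o_orderkey >= ${lo} AND o_orderkey < ${hi};"
        lo=$hi
    done
}

print_range_inserts 30000000 10000000
```

With these arguments the function prints three statements covering keys 0 to 29999999. Because the orders table has no unique constraint, running a filtered import after the full import adds the matching rows a second time.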