Migrating Data from GaussDB(DWS) to DLI

Updated on 2024-05-29 GMT+08:00

This section describes how to use the CDM data synchronization function to migrate data from GaussDB(DWS) to DLI.

Prerequisites

You have created a DLI SQL queue.
CAUTION:

When you create a queue, set its Type to For SQL.
You have created a GaussDB(DWS) cluster.
You have created a CDM cluster. For details about how to create a CDM cluster, see Creating a CDM Cluster.
NOTE:
- If the data source is an on-premises database, you need Internet access or Direct Connect. When using the Internet, ensure that an EIP has been bound to the CDM cluster, the security group of CDM allows outbound traffic to the host where the off-cloud data source is located, the host where the data source is located can access the Internet, and the connection port has been enabled in the firewall rules.
- If the data source is GaussDB(DWS) or MRS on a cloud, the network must meet the following requirements:
  i. If the CDM cluster and the cloud service are in different regions, a public network or a dedicated connection is required for enabling communication between the CDM cluster and the cloud service. If the Internet is used for communication, ensure that an EIP has been bound to the CDM cluster, the host where the data source is located can access the Internet, and the port has been enabled in the firewall rules.
  
  ii. If the CDM cluster and the cloud service are in the same region, VPC, subnet, and security group, they can communicate with each other by default. If the CDM cluster and the cloud service are in the same VPC but in different subnets or security groups, you must configure routing rules and security group rules.
  
  iii. The cloud service instance and the CDM cluster must belong to the same enterprise project. If they do not, you can modify the enterprise project of the workspace.
In this example, the VPC, subnet, and security group of the CDM cluster are the same as those of the GaussDB(DWS) cluster.

Step 1: Prepare Data

Create a database and table in the GaussDB(DWS) cluster.
1. Connect to the existing GaussDB(DWS) cluster by referring to Using the gsql CLI Client to Connect to a Cluster.
2. Connect to gaussdb, the default database of the GaussDB(DWS) cluster:
```
gsql -d gaussdb -h <DWS_cluster_address> -U dbadmin -p 8000 -W <password> -r
```
  - gaussdb: default database of the GaussDB(DWS) cluster.
  - <DWS_cluster_address>: connection address of the GaussDB(DWS) cluster. If you connect over a public network, use the Public Network Address or Public Network Access Domain Name. If you connect over a private network, use the Private Network Address or Private Network Access Domain Name. If you connect through an ELB, use the ELB address.
  - dbadmin: default administrator username specified during cluster creation.
  - -p 8000: port of the GaussDB(DWS) database. The default value is 8000.
  - -W <password>: password of the administrator, set during cluster creation.
3. Run the following command to create the testdwsdb database:
```
CREATE DATABASE testdwsdb;
```
4. Run the following commands to exit the gaussdb database and connect to testdwsdb:
```
\q
gsql -d testdwsdb -h <DWS_cluster_address> -U dbadmin -p 8000 -W <password> -r
```
5. Run the following commands to create a table and insert data into it.
  Run the following command to create a table:
```
CREATE TABLE table1(id int, a char(6), b varchar(6), c varchar(6));
```
  Run the following statements to insert data into the table:
```
INSERT INTO table1 VALUES(1,'123','456','789');
INSERT INTO table1 VALUES(2,'abc','efg','hif');
```
6. Query the table data to verify that the data is inserted.
```
select * from table1;
```
  Figure 1 Querying data in the table
Create a database and table on DLI.
1. Log in to the DLI management console and click SQL Editor. On the displayed page, set Engine to spark and Queue to the created SQL queue.
  Enter the following statement in the editing window to create a database for the migrated data, for example, testdb:
```
create database testdb;
```
2. In SQL Editor, select testdb for Database and run the following table creation statement to create a table in the database:
```
create table tabletest(id INT, name1 string, name2 string, name3 string);
```
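Optionally, confirm the column layout of tabletest before migrating. The statement below is a sketch that assumes the DLI SQL editor accepts the Spark SQL `DESCRIBE` statement. The output should list id (int) followed by name1, name2, and name3 (string), which lines up column by column with table1 (id, a, b, c) in GaussDB(DWS).

```sql
-- Run in the DLI SQL Editor with testdb selected as the database.
-- Lists the columns of tabletest and their data types.
describe tabletest;
```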

Step 2: Migrate Data

Create CDM connections to GaussDB(DWS) and DLI.

Create a connection to the GaussDB(DWS) database.

Log in to the CDM console and choose Cluster Management. On the displayed page, locate the created CDM cluster and click Job Management in the Operation column.
On the Job Management page, click the Links tab, and click Create Link. On the displayed page, select Data Warehouse Service and click Next.

Configure the connection. The following table describes the required parameters.

**Table 1** GaussDB(DWS) data source configuration
| Parameter | Value |
| --- | --- |
| Name | Name of the GaussDB(DWS) data source, for example, source_dws. |
| Database Server | Click Select next to the text box and select the created GaussDB(DWS) cluster. |
| Port | Port number of the GaussDB(DWS) database. The default value is 8000. |
| Database Name | Name of the GaussDB(DWS) database to migrate. This example uses the testdwsdb database created in Create a database and table in the GaussDB(DWS) cluster. |
| Username | Username for accessing the database. The account must have the permissions required to read and write data tables and metadata. This example uses dbadmin, the default administrator specified when the GaussDB(DWS) cluster was created. |
| Password | Password of the GaussDB(DWS) database user. |

Figure 2 Configuring the GaussDB(DWS) connection

For other parameters, retain the default values. Click Save to complete the configuration.
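The account entered for Username needs read access to the source schema and table. One way to confirm this is from the gsql session opened in Step 1. This sketch assumes the PostgreSQL-compatible privilege inquiry functions `has_schema_privilege` and `has_table_privilege` available in GaussDB(DWS); replace dbadmin, public, and table1 if you use other names.

```sql
-- Run in gsql while connected to testdwsdb.
-- Each column returns t (true) or f (false).
SELECT has_schema_privilege('dbadmin', 'public', 'USAGE')         AS can_use_schema,
       has_table_privilege('dbadmin', 'public.table1', 'SELECT') AS can_read_table1;
```

For the administrator account used in this example, both values are expected to be t.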

Create a connection to DLI.
1. Log in to the CDM console and choose Cluster Management. On the displayed page, locate the created CDM cluster and click Job Management in the Operation column.
2. On the Job Management page, click the Links tab and click Create Link. On the displayed page, select Data Lake Insight and click Next.
  Figure 3 Selecting the DLI connector
3. Configure the link between CDM and DLI, including the link name and the AK, SK, and project ID that CDM uses to access DLI.
  Figure 4 Selecting the DLI connector
  
  After the configuration is complete, click Save.

Create a CDM migration job.

Log in to the CDM console and choose Cluster Management. On the displayed page, locate the created CDM cluster and click Job Management in the Operation column.
On the Job Management page, choose the Table/File Migration tab and click Create Job.

On the Create Job page, specify job information.

Figure 5 Configuring the migration job

Job Name: Name of the data migration job, for example, test.

Set parameters required for Source Job Configuration.

**Table 2** Source job configuration parameters
| Parameter | Value |
| --- | --- |
| Source Link Name | Select the GaussDB(DWS) link created earlier (source_dws in this example). |
| Use SQL Statement | If this parameter is set to Yes, enter an SQL statement, and CDM exports data based on that statement. In this example, set this parameter to No. |
| Schema/Table Space | Name of the schema or tablespace from which data will be extracted. This parameter is displayed when Use SQL Statement is set to No. Click the icon next to the text box to select a schema, or enter a schema or tablespace name directly. No schema is created in Create a database and table in the GaussDB(DWS) cluster, so set this parameter to the default value public. If the desired schema or tablespace is not displayed, check whether the login account has the permissions required to query metadata. NOTE: The value can contain the wildcard character (`*`) to export all schemas whose names start with a prefix, end with a suffix, or contain a string. For example, `SCHEMA*` exports all schemas whose names start with SCHEMA, `*SCHEMA` exports all schemas whose names end with SCHEMA, and `*SCHEMA*` exports all schemas whose names contain SCHEMA. |
| Table Name | Name of the table to migrate. This example uses table1 created in Create a database and table in the GaussDB(DWS) cluster. |
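Before entering a wildcard in Schema/Table Space, you can preview which schemas it would select by listing matching names in GaussDB(DWS). This sketch assumes that CDM's `*` behaves like the `%` of SQL `LIKE` (any run of characters) and queries the `pg_namespace` system catalog. GaussDB(DWS) stores unquoted identifiers in lowercase and `LIKE` is case-sensitive, so the pattern is written in lowercase.

```sql
-- Run in gsql while connected to testdwsdb.
-- Lists schemas whose names start with "schema", mirroring the CDM pattern SCHEMA*.
-- Use '%schema' or '%schema%' to mirror *SCHEMA or *SCHEMA*.
SELECT nspname AS schema_name
FROM pg_namespace
WHERE nspname LIKE 'schema%'
ORDER BY nspname;
```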

Set parameters required for Destination Job Configuration.

**Table 3** Destination job configuration parameters
| Parameter | Value |
| --- | --- |
| Destination Link Name | Select the DLI link created earlier. |
| Resource Queue | Select the created DLI SQL queue. |
| Database | Select the created DLI database. This example uses testdb created in Create a database and table on DLI. |
| Table | Select a table in the database. This example uses tabletest created in Create a database and table on DLI. |
| Clear data before import | Whether to clear data in the destination table before import. If this parameter is set to Yes, data in the destination table is cleared before the task starts. In this example, set this parameter to No. |

For details about parameter settings, see To DLI.
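With Clear data before import set to No, existing rows in tabletest are not removed, so running the job again is likely to add a second copy of the source rows. Before starting or rerunning the job, you can check whether the destination table is empty. This is a sketch using the names from this example.

```sql
-- Run in the DLI SQL Editor with testdb selected as the database.
-- A result of 0 means the destination table is empty.
select count(*) from tabletest;
```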

Click Next. The Map Field page is displayed, and CDM automatically matches the source and destination fields.
- If a field mapping is incorrect, drag the fields to adjust it.
- If the destination table is created automatically during migration, configure the type and name of each field.
- CDM supports field conversion during migration.
  Figure 6 Field mapping
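Because the destination columns (name1, name2, name3) do not share names with the source columns (a, b, c), it is worth checking the mapping on this page. One way to see the source side is to list the columns of table1 in definition order from gsql. This sketch uses the standard `information_schema.columns` view and assumes table1 is in the public schema.

```sql
-- Run in gsql while connected to testdwsdb.
-- Lists the columns of table1 in definition order with their data types.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'table1'
ORDER BY ordinal_position;
```

For the tables in this example, the intended pairing is id to id, a to name1, b to name2, and c to name3, with the char and varchar source columns landing in string columns on DLI.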
Click Next and set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry Upon Failure: Whether to automatically retry the job if it fails. Retain the default value Never.
- Group: Select the group to which the job belongs. The default group is DEFAULT. On the Job Management page, jobs can be displayed, started, or exported by group.
- Scheduled Execution: Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Whether to write data that fails to be processed or is filtered out during job execution to OBS, where you can view it later. An OBS link must be created before dirty data can be written. Retain the default value No so that dirty data is not recorded.
Click Save and Run. On the Job Management page, you can view the job execution progress and result.
Figure 7 Job progress and execution result

Step 3: Query Results

After the migration job is complete, log in to the DLI management console and click SQL Editor. On the displayed page, set Engine to spark, Queue to the created SQL queue, and Database to the database created in Create a database and table on DLI. Then execute the following query statement to check whether the table data has been migrated to the tabletest table:
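A minimal query for this check, assuming the database and table names used in this example, is:

```sql
-- Run in the DLI SQL Editor with testdb selected as the database.
-- Returns all rows migrated from table1 in GaussDB(DWS).
select * from tabletest;
```

If the migration succeeded, the result should contain the two rows inserted into table1 in Step 1 (id 1 and id 2). Values migrated from the char(6) column a may include trailing spaces, because char values are blank-padded to their declared length.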