
Using CDM to Migrate Data from Hologres to a GaussDB(DWS) Cluster

This practice demonstrates how to use Cloud Data Migration (CDM) to migrate data from Hologres to GaussDB(DWS).

CDM is an efficient and easy-to-use service for batch data migration. For more information, see Cloud Data Migration.

This practice takes about 90 minutes and uses cloud services such as Virtual Private Cloud (VPC) and Subnet, Elastic IP (EIP), Cloud Data Migration (CDM), and GaussDB(DWS). The basic process is as follows:

  1. Prerequisites
  2. Step 1: Migrating Metadata
  3. Step 2: Migrating Table Data
  4. Step 3: Checking Table Data
Figure 1 Hologres migration

Notes and Constraints

  • If there are many tables to migrate, perform the migration in batches, grouped by service or by table data volume.
  • If DELETE or UPDATE operations occur during CDM migration, data consistency cannot be guaranteed afterward, and the affected data must be migrated again.
  • For large tables, migrate the data in slices (see the sketch after this list).
  • A single database migration job can migrate only one database at a time. To migrate multiple databases, you need to configure multiple migration jobs.
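
A common way to slice a large table is to split it by a range predicate on an indexed column and run one migration job per slice, each with its own filter condition. A minimal sketch, assuming a hypothetical table sales.orders with an integer primary key id:

    -- Hypothetical example: split sales.orders into fixed-size key ranges.
    -- Each predicate becomes the filter condition of a separate migration job.
    SELECT * FROM sales.orders WHERE id >= 1        AND id < 10000001;  -- slice 1
    SELECT * FROM sales.orders WHERE id >= 10000001 AND id < 20000001;  -- slice 2
    -- ...one predicate per slice until the full key range is covered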

Prerequisites

  • You have purchased GaussDB(DWS) and CDM clusters. For details, see CDM User Guide.
  • The Hologres cluster and GaussDB(DWS) cluster can communicate with CDM. In this example, GaussDB(DWS) and CDM are created in the same region, VPC, and subnet.
  • You have the migration permission.
  • The source and destination clients have been installed.
  • The migration tools listed in Table 1 have been prepared: DSC and DataCheck.
  • The runtime environment of DataCheck meets the following requirements:
    • The server runs a 64-bit operating system; both Linux and Windows are supported.
    • Either JDK or JRE 1.8 has been installed on the system.
    • The server where DataCheck is installed and running can communicate with the database to be connected.
Table 1 Tools for Hologres migration

| Tool      | Description                            | How to Obtain              |
| --------- | -------------------------------------- | -------------------------- |
| DSC       | Syntax migration tool for GaussDB(DWS) | Obtain the download link.  |
| DataCheck | Data check tool                        | Contact technical support. |

Step 1: Migrating Metadata

  1. Query user roles and permissions in Hologres:

    SELECT rolname FROM pg_roles;
    SELECT user_display_name(rolname) FROM pg_roles;

  2. In GaussDB(DWS), permission separation is disabled by default after cluster creation, so database system administrators have the same permissions as object owners. By default, only the object owner or a system administrator can query, modify, or drop an object. Based on the roles and permissions queried in Hologres, create the corresponding roles in GaussDB(DWS) and grant user permissions accordingly:

    • Use GRANT statements to grant object permissions to the target user.
      GRANT USAGE ON SCHEMA schema TO user;
      GRANT SELECT ON TABLE schema.table TO user;
    • Enable the user to inherit the object permissions of the role.
      CREATE ROLE role_name WITH CREATEDB PASSWORD '*******';
      GRANT role_name TO user;
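    If a user needs access to many existing tables, permissions can also be granted at schema scope rather than table by table. A minimal sketch, assuming a hypothetical schema sales and user migrate_user:

      GRANT USAGE ON SCHEMA sales TO migrate_user;
      -- Grants SELECT on every table that currently exists in the schema.
      GRANT SELECT ON ALL TABLES IN SCHEMA sales TO migrate_user;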

  3. Export the source syntax. The source syntax captures the implementation logic of the customer's services; exporting it from Hologres and adapting it for GaussDB(DWS) reduces the modeling workload and improves service migration efficiency.

    Export all syntax:

    SELECT hg_dump_script('schema_name.table_name');
    • Identifying the service scope in the source syntax requires familiarity with the service, so this operation should be performed by a DBA who knows it. It is recommended that the customer's DBA provide the source syntax.
    • To export data in batches, you can use UNION ALL to associate all tables to be queried. The syntax format is as follows:
      SELECT hg_dump_script('schema_name.table_name')
      UNION ALL
      SELECT hg_dump_script('schema_name.table_name')
      ...
    • If the execution fails, use the command below to create an extension in the database, and then execute the preceding SQL statements.
      CREATE EXTENSION hg_toolkit;
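    Instead of listing every table by hand, the export statements can also be generated from the catalog. A minimal sketch, assuming the tables to export live in a hypothetical schema named public and that the PostgreSQL-compatible pg_tables view is available:

      -- Generate one hg_dump_script result per table in the schema.
      SELECT hg_dump_script(schemaname || '.' || tablename)
      FROM pg_tables
      WHERE schemaname = 'public';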

  4. Connect to GaussDB(DWS) and execute the SQL statement below to create a database. You are advised to use the MySQL-compatible mode to create the database.

    CREATE DATABASE tldg WITH ENCODING 'UTF-8' TEMPLATE template0 DBCOMPATIBILITY 'MYSQL';
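    After connecting to the new database, you can verify the compatibility mode. A minimal sketch, assuming the pg_database catalog exposes a datcompatibility column, as in GaussDB-family databases:

    -- Confirm that tldg was created in MySQL-compatible mode
    -- (assumption: the datcompatibility column exists in this release).
    SELECT datname, datcompatibility FROM pg_database WHERE datname = 'tldg';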

  5. Use the DSC tool to convert the DDL syntax.

    1. Unzip the DSC tool package obtained in Prerequisites.
    2. Save the DDL syntax files to be converted into the input folder of DSC.
      Figure 2 input directory
    3. Open a command-line tool. In Windows, double-click runDSC.bat; in Linux, run runDSC.sh.
    4. Convert the syntax:
      runDSC.bat -S Hologres
      Figure 3 DDL syntax conversion
    5. View the conversion result in the output folder.
      Figure 4 DDL conversion result
    6. Connect to GaussDB(DWS) and execute the DDL statements converted in the previous step to create the tables. A quick verification query is sketched below.
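
    A quick way to verify the result is to list the tables that now exist. A minimal sketch, assuming the tables were created in a hypothetical schema named public:

      -- List the tables created by the converted DDL.
      SELECT table_name FROM information_schema.tables WHERE table_schema = 'public';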

    For more information about DSC, see DSC Tool Guide.

Step 2: Migrating Table Data

CDM supports both table-level and database-level migrations.

  1. Configure the source link for CDM. Because Hologres' table creation syntax is compatible with PostgreSQL, select the PostgreSQL data source when configuring the CDM link.

    1. Log in to the CDM console and choose Cluster Management on the left.
    2. If CDM is connected to Hologres through the public network, bind an EIP. For details, see Binding or Unbinding an EIP.
    3. Click Job Management next to the cluster name.
      Figure 5 CDM cluster management page

    4. Before establishing a job link for the first time, install the driver. Choose Links > Driver Management and install the PostgreSQL driver.
    5. After the driver installation, click Create Link on the link management page, select PostgreSQL and then click Next.
    6. Enter the Hologres database information.
      Figure 6 Hologres connection information

    7. Click Test to check connectivity, and then click Save.

  2. Configure the destination link for CDM.

    1. Similarly, choose Job Management > Links and click Create Link.
    2. Select Data Warehouse Service and click Next.
    3. Enter the GaussDB(DWS) database information.
      Figure 7 GaussDB(DWS) connection information

    4. Click Test to check connectivity, and then click Save.

  3. Configure and start a table-level migration job.

    1. Click the Table/File Migration tab. This tab displays single-table migration jobs.
    2. Enter the source and destination information.
    3. Click Next to map fields.
      Figure 8 Mapping fields for table-level migration

    4. Confirm the information and click Next.
    5. On the task configuration page, configure Concurrent Extractors (the number of threads that extract data concurrently). The default value is 1; you can increase it, but keeping it at or below 4 is recommended. Then click Save and Run.
      After the migration job begins, you can view the status in the job list.
      Figure 9 Job status

  4. Configure and start a database-level migration job.

    1. Click the Entire DB Migration tab and click Create Job.
    2. Enter the source information on the left and the destination information on the right. Click Next.
    3. Select all tables or the tables to migrate, click the right arrow in the middle to move them to the right pane, and then click Next.
    4. Configure job parameters.
      • Concurrent Subjobs: the number of tables migrated simultaneously. The default value is 10; setting it to less than 5 is recommended.
      • Concurrent Extractors: the number of threads that extract data concurrently. The default value is 1; you can increase it, but keeping it at or below 4 is recommended.

      Confirm the information and click Save and Run.

    5. Wait until the migration job is complete. Click the job name to view the migration status of each table.

Step 3: Checking Table Data

After the migration, check whether the data on the source and destination databases is consistent using DataCheck.
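
Before running the full tool, a quick sanity check is to compare row counts manually. A minimal sketch, executed on both the Hologres source and the GaussDB(DWS) destination with a placeholder table name:

    -- Run on both source and destination; the two counts should match.
    SELECT COUNT(*) FROM schema_name.table_name;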

  1. Download and unzip DataCheck-*.zip, and then go to the DataCheck-* directory to use it. For details about how to use the files in the directory, see Table 2.
  2. Configure the tool package.

    • In Windows:
      Open the dbinfo.properties file in the conf folder and configure it based on your actual needs. The following figure shows the configuration of the Hologres source.
      Figure 10 Configuring DataCheck

      You can use the command below to generate the ciphertext for src.passwd and dws.passwd.

      encryption.bat password

      After the command is executed, an encrypted file is generated in the local bin directory.

    • In Linux:

      The method of generating the ciphertext is similar to that in Windows: run sh encryption.sh password. The other steps are the same.

  3. Check data.

    In Windows:

    1. Open the check_input.xlsx file, enter the schemas, databases, source tables, and destination tables to be checked, and fill in Row Range with a query statement that limits the check scope (a sketch follows the Windows steps below).
      • After the source database name is set in the configuration file, it is used as the default in check_input.xlsx. If a different source database name is specified in check_input.xlsx, that name takes precedence.
      • The Check Strategy offers three levels: high, middle, and low. If unspecified, the default is low.
      • The Check Mode supports statistics (statistical value checks).

      The following figure shows the check_input.xlsx file used for metadata comparison.

      Figure 11 check_input
    2. Run the datacheck.bat command in the bin directory to execute the check tool.

    3. View the generated check result file check_input_result.xlsx.

      The following figure shows the statistical value check.
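
    Row Range, mentioned in step 1, narrows the check to a subset of rows. A minimal sketch of the kind of query statement that could be supplied there, assuming a hypothetical integer id column; verify the exact syntax your DataCheck version accepts:

      -- Restrict the check to the first million rows by key (hypothetical column).
      SELECT * FROM schema_name.table_name WHERE id BETWEEN 1 AND 1000000;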

    In Linux:

    1. Edit and upload the check_input.xlsx file by referring to step 1 for Windows.
    2. Run the sh datacheck.sh command to execute the check tool.

    3. View the check result in the check_input_result.xlsx file. (The check result analysis is the same as that for Windows.)

Related Information

Table 2 Description of the DataCheck directory

| File or Folder          | Description                                                                                                                                   |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| bin                     | Entry scripts of the check tool: datacheck.bat (Windows) and datacheck.sh (Linux).                                                             |
| conf                    | Configuration files, used to configure the connections to the source and destination databases and to set log printing.                       |
| lib                     | JAR packages required for running the check tool.                                                                                              |
| check_input.xlsx        | Information about the tables to be checked (schema, table, and column names), plus the user's check level and rules. Three check levels are supported: high, middle, and low. The default is low. |
| logs                    | Not included in the package. Generated automatically when the check tool runs, to log the tool's execution process.                           |
| check_input_result.xlsx | Not included in the package. Generated in the same directory as check_input.xlsx when the check tool runs.                                    |

All of the above items are located in the DataCheck root directory.

Table 3 Basic functions of the data check tool

DataCheck provides the following functions:

  • Check data in GaussDB(DWS), MySQL, and PostgreSQL databases.
  • Check common fields, such as numeric, time, and character types.
  • Support three check levels: high, middle, and low.
  • Check schemas, table names, and column names.
  • Specify the check scope of records. By default, all records are checked.
  • Support various check methods, including count(*), max, min, sum, avg, and sampling details check.
  • Output the check result and related check details.
Table 4 Data check levels

| Check Level | Description                                                               | Syntax                                                                                                                                                                             |
| ----------- | ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Low         | Quantity check                                                            | Number of records: COUNT(*)                                                                                                                                                            |
| Middle      | Quantity check; numeric type check                                        | Number of records: COUNT(*); value check: MAX, MIN, SUM, and AVG                                                                                                                       |
| High        | Quantity check; numeric type check; date type check; character type check | Number of records: COUNT(*); value check: MAX, MIN, SUM, and AVG; date check: MAX and MIN; character check: ORDER BY ... LIMIT 1000, which reads the data and checks whether the content is the same |
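
Conceptually, the high-level character check compares ordered samples from both sides. A minimal sketch of the kind of statement involved, assuming a hypothetical character column char_col:

    -- Read the first 1000 rows in a deterministic order on each side,
    -- then compare the returned content.
    SELECT char_col FROM schema_name.table_name ORDER BY char_col LIMIT 1000;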