Updated on 2024-07-15 GMT+08:00

From On-premises MySQL to GaussDB Distributed

Description

You can use DRS real-time synchronization to synchronize data from an on-premises MySQL database to Huawei Cloud GaussDB. A full+incremental synchronization task keeps the destination GaussDB database in sync with the source MySQL database.

Problems

  • Enterprise workloads have been growing and evolving fast, and traditional databases lack the scalability needed to keep up. Enterprises need distributed databases.
  • Building a traditional database means purchasing and installing servers, systems, databases, and other software. The O&M is expensive and difficult.
  • Traditional databases perform poorly on complex queries.
  • It is hard for traditional databases to smoothly synchronize data with no downtime.

Service Architecture

Synchronization Principles

A full+incremental synchronization task includes the following operations:

  1. In the full synchronization phase, tables, primary keys, and unique keys are synchronized.
  2. Incremental data extraction is started to ensure that the incremental data generated during full data synchronization is completely extracted to the DRS instance.
  3. A full synchronization is started.
  4. An incremental synchronization is automatically started after the full synchronization is complete. The replay starts from the position where the full synchronization starts.
  5. A comparison task is started after the incremental replay is complete to check the data consistency. Real-time comparison is supported.
  6. Workload synchronization is started if the data is consistent between the source and destination databases.
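
Steps 2 and 4 rely on a fixed starting point in the source's change log. For a MySQL source that change log is the binary log, which is what the REPLICATION SLAVE and REPLICATION CLIENT permissions give access to. As an illustration only (how DRS records the position internally is not described here), the following statements show the kind of binary log coordinates such a starting point refers to. They assume the MySQL 5.7 source used in this example.

```sql
-- Run on the source MySQL 5.7 instance.
-- Current binary log file and position: the kind of coordinate an
-- incremental replay can start from (requires REPLICATION CLIENT).
SHOW MASTER STATUS;

-- Binary log files currently kept on the server.
SHOW BINARY LOGS;
```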

Service List

  • Virtual Private Cloud (VPC)
  • GaussDB
  • Data Replication Service (DRS)
  • Data Admin Service (DAS)

Notes on Usage

  • The resource planning in this best practice is for demonstration only. Adjust it as needed.
  • The end-to-end test data in this document is for reference only.
  • Full synchronization copies the data that already exists in the source database. Incremental synchronization then keeps the source and destination databases in sync in real time.

Prerequisites

  • You have registered with Huawei Cloud and completed account authentication.
  • Your account balance is greater than or equal to $0 USD.
  • You have set up an on-premises MySQL database for testing.
  • You have obtained the IP address, port number, account, and password of the MySQL database to be synchronized.

Resource Planning

Table 1 Resource planning

| Category | Subcategory | Planned Value | Remarks |
|---|---|---|---|
| VPC | VPC name | vpc-src-172 | Specify a name that is easy to identify. |
| VPC | Region | Test region | For low network latency and quick resource access, select the region nearest to you. |
| VPC | AZ | AZ 3 | - |
| VPC | Subnet CIDR block | 172.16.0.0/16 | Select a subnet with sufficient network resources. |
| VPC | Subnet name | subnet-src-172 | Specify a name that is easy to identify. |
| On-premises MySQL (source database) | Database version | 5.7.38 | - |
| On-premises MySQL (source database) | Database user | test_info | Specify a database user. The following minimum permissions are required: SELECT, LOCK TABLES, REPLICATION SLAVE, and REPLICATION CLIENT. |
| GaussDB | Instance name | Auto-drs-gaussdbv5-tar-1 | Specify a name that is easy to identify. |
| GaussDB | Database version | GaussDB 1.3 Enterprise Edition | - |
| GaussDB | Instance type | Distributed (1 CN, 3 DN shards, and 3 replicas) | Select a distributed instance for the test. |
| GaussDB | Deployment model | Independent | - |
| GaussDB | Transaction consistency | Strong consistency | - |
| GaussDB | Shards | 3 | - |
| GaussDB | Coordinator nodes | 3 | - |
| GaussDB | Storage type | Ultra-high I/O | - |
| GaussDB | AZ | AZ 2 | In this example, a single AZ is selected. You are advised to select multiple AZs to improve instance availability in actual use. |
| GaussDB | Instance specifications | General-enhanced II, 8 vCPUs, 64 GB | Small specifications are selected for this test instance. You are advised to configure specifications based on service requirements in actual use. |
| GaussDB | Storage space | 480 GB | A small storage space is selected for this test instance. You are advised to configure the storage space based on service requirements in actual use. |
| GaussDB | Disk encryption | Disable | In this example, disk encryption is disabled. Enabling disk encryption improves the security of data, but may slightly affect the database read/write performance. |
| Logging in to the database through DAS | DB engine | GaussDB | - |
| Logging in to the database through DAS | Database source | GaussDB | Select the GaussDB instance created in this example. |
| Logging in to the database through DAS | Database name | postgres | - |
| Logging in to the database through DAS | Username | root | - |
| Logging in to the database through DAS | Password | - | Password of the root user of the GaussDB instance created in this example |
| DRS synchronization task | Task name | DRS-test-info | Specify a name that is easy to identify. |
| DRS synchronization task | Destination database name | test_database_info | Specify a name that is easy to identify. The name must be compatible with the MySQL database name. |
| DRS synchronization task | Source DB engine | MySQL | - |
| DRS synchronization task | Destination DB engine | GaussDB | - |
| DRS synchronization task | Network type | Public network | Public network is used in this example. |

Flowchart

Figure 1 shows the main operation flowchart.

Figure 1 Flowchart

Creating a VPC

Create a VPC to prepare network resources for creating a GaussDB instance.

  1. Log in to the management console.
  2. Select a region in the upper left corner.
  3. Click the service list icon in the upper left corner of the page and choose Networking > Virtual Private Cloud.

    The VPC console is displayed.

  4. Click Create VPC.

  5. Configure parameters as needed and click Create Now.
  6. Return to the VPC list and check whether the VPC is created.

    If the VPC status becomes available, the VPC has been created.

Creating a Security Group

Create a security group to use when creating the GaussDB instance.

  1. Log in to the management console.
  2. Select a region in the upper left corner.
  3. Click the service list icon in the upper left corner of the page and choose Networking > Virtual Private Cloud.

    The VPC console is displayed.

  4. Choose Access Control > Security Groups.
  5. Click Create Security Group.
  6. Configure parameters as needed.

  7. Click OK.
  8. Return to the security group list and click the security group name.
  9. Click the Inbound Rules tab, and then click Add Rule.

  10. Configure an inbound rule, add the IP address of the source database, and click OK.

Creating a Distributed GaussDB Instance

This section describes how to create a distributed GaussDB instance as the destination database.

  1. Log in to the management console.
  2. Select a region in the upper left corner.
  3. Under the service list, choose Databases > GaussDB.
  4. Click Buy DB Instance.
  5. Configure the instance name and basic information.

  6. Configure instance specifications.

    Select small specifications for this test instance. You are advised to configure specifications based on service requirements in actual use.

  7. Select a VPC and security group (created in Creating a VPC and Creating a Security Group) for the instance and configure the database port.

  8. Configure password and other information.

  9. Click Next, confirm the information, and click Submit.
  10. Go to the instance list.

    If the instance status becomes available, the instance has been created.

Constructing Test Data

Before the synchronization, prepare test data that covers a range of data types in the source database so that the result can be verified after the synchronization is complete.

For details about the data types supported by DRS, see MySQL->GaussDB.

Perform the following steps to construct data in the source database:

  1. Use a database connection tool to connect to the source MySQL database based on its IP address.
  2. Construct data in the source database based on data types supported by DRS.

    1. Create a test user.

      CREATE USER test_info IDENTIFIED BY 'xxx';

      test_info indicates the user created for the test, and xxx indicates the password of the user.

    2. Create a database named test_info.

      CREATE DATABASE test_info;

    3. Create a table in the test_info database.

      CREATE TABLE `test_info`.`test_table` (
        `id` int NOT NULL,
        `c1` char(10) DEFAULT NULL,
        `c2` varchar(10) DEFAULT NULL,
        `c3` binary(10) DEFAULT NULL,
        `c4` varbinary(10) DEFAULT NULL,
        `c5` tinyblob,
        `c6` mediumblob,
        `c7` longblob,
        `c8` tinytext,
        `c9` text,
        `c10` mediumtext,
        `c11` longtext,
        `c12` enum('1','2','3') DEFAULT NULL,
        `c13` set('1','2','3') DEFAULT NULL,
        `c14` tinyint DEFAULT NULL,
        `c15` smallint DEFAULT NULL,
        `c16` mediumint DEFAULT NULL,
        `c17` bigint DEFAULT NULL,
        `c18` float DEFAULT NULL,
        `c19` double DEFAULT NULL,
        `c20` date DEFAULT NULL,
        `c21` datetime DEFAULT NULL,
        `c22` timestamp,
        `c23` time DEFAULT NULL,
        `c24` year DEFAULT NULL,
        `c25` bit(10) DEFAULT NULL,
        `c26` json DEFAULT NULL,
        `c27` decimal(10,0) DEFAULT NULL,
        `c28` decimal(10,0) DEFAULT NULL,
        PRIMARY KEY (`id`)
      );

    4. Assign permissions to the user.

      GRANT SELECT, LOCK TABLES ON <database>.<table> to test_info;

      GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* to test_info;

      In the preceding commands, test_info indicates the user created for this test, <database> indicates the name of the database to be synchronized, and <table> indicates the name of the table to be synchronized. Replace them based on the site requirements. LOCK TABLES is a database-level privilege in MySQL, so use <database>.* as the scope when granting it.

    5. Insert two rows of data into the table.

      insert into test_info.test_table values (1,'a','b','111','111','tinyblob','mediumblob','longblob','tinytext','text','mediumtext','longtext','1','3',1,2,3,4,1.123,1.1234,'2024-03-08','2024-03-08 08:00:00','2024-03-08 08:00:00','08:00:00','2024','1010','{"a":"b"}',1.23,1.234);

      insert into test_info.test_table values (2,'a','b','111','111','tinyblob','mediumblob','longblob','tinytext','text','mediumtext','longtext','1','3',1,2,3,4,1.123,1.1234,'2024-03-08','2024-03-08 08:00:00','2024-03-08 08:00:00','08:00:00','2024','1010','{"a":"b"}',1.23,1.234);

  3. Create a database in the destination GaussDB instance.

    1. Log in to the management console.
    2. Select a region in the upper left corner.
    3. Click the service list icon in the upper left corner of the page and choose Databases > Data Admin Service.
    4. In the navigation pane on the left, click Development Tool to go to the login list page.
    5. Click Add Login.
    6. On the displayed page, select the DB engine, source database, and target DB instance, enter the login username, password, and description (optional), and enable Collect Metadata Periodically and Show Executed SQL Statements.

      If Collect Metadata Periodically is enabled, select Remember Password.

    7. Click Test Connection to check whether the connection is successful.

      If a message indicates that the connection is successful, continue with the operation. If a message indicates that the connection failed, fix the problem based on the failure cause provided in the message.

    8. Click OK.
    9. Locate the added instance and click Log In in the Operation column.

    10. Choose SQL Operations > SQL Window on the top menu bar.

    11. Run the following statement to create a database compatible with MySQL:

      CREATE DATABASE test_database_info DBCOMPATIBILITY 'mysql';

      test_database_info indicates the database name. Replace it based on the site requirements.
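
Before moving on, the source-side setup from step 2 can be double-checked. The following is a minimal sketch, assuming the user and table names used in this example and a session that is allowed to view the grants of test_info.

```sql
-- Run on the source MySQL instance.
-- List the privileges held by the test user. SELECT, LOCK TABLES,
-- REPLICATION SLAVE, and REPLICATION CLIENT should appear.
SHOW GRANTS FOR test_info;

-- Confirm that the two seed rows exist (a few representative columns only).
SELECT id, c1, c2, c20, c26
FROM test_info.test_table
ORDER BY id;
```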

Performing a Pre-Check

Before creating a task, check whether synchronization conditions are met.

Before synchronization, refer to Precautions.
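
Some source-side conditions can be checked by hand before the task is created. The statements below are a sketch of such a check for the MySQL 5.7 source in this example. The expected values in the comments reflect common requirements for log-based incremental synchronization and are assumptions here, so treat the Precautions page as authoritative.

```sql
-- Run on the source MySQL instance.
SELECT VERSION();                        -- 5.7.38 in this example
SHOW VARIABLES LIKE 'log_bin';           -- typically expected: ON
SHOW VARIABLES LIKE 'binlog_format';     -- typically expected: ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- typically expected: FULL
SHOW VARIABLES LIKE 'server_id';         -- typically expected: a non-zero value
```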

Creating a DRS Synchronization Task

This section describes how to create a DRS instance and synchronize data from the test_info database in the on-premises MySQL database to the test_database_info database in the GaussDB instance.

  1. Log in to the management console.
  2. Select a region in the upper left corner.

    Select the region where the destination instance is deployed.

  3. Click the service list icon on the left and choose Databases > Data Replication Service.
  4. In the navigation pane on the left, choose Data Synchronization Management. On the displayed page, click Create Synchronization Task.
  5. Configure synchronization instance information.

    1. Select a region and project, and enter a task name.

    2. Specify Data Flow, Source DB Engine, Destination DB Engine, Network Type, DRS Task Type, Destination DB Instance, Synchronization Instance Subnet (optional), Synchronization Mode, Specifications, AZ, and Tags (optional).

    3. Click Create Now.

  6. Configure the source and destination database information.

    1. Enter the IP address, port number, username, and password of the source database.

      Click Test Connection.

    2. Enter the username and password of the destination database.

      Click Test Connection.

    3. Click Next. In the displayed box, read the message carefully and click Agree.

  7. Configure the synchronization task.

    1. Select the object type for full synchronization. If the table structure to be synchronized has not been created in the destination database, select Table structure (the table structure contains primary keys and unique keys) for Synchronization Object Type. Otherwise, deselect Table structure. Select Index for Synchronization Object Type based on the site requirements.

    2. Specify Incremental Conflict Policy.
      • Ignore: The system will ignore the conflicting data and continue the subsequent synchronization process. If you select Ignore, data in the source database may be inconsistent with that in the destination database.
      • Report error: The synchronization task will be stopped and fail. You can view the details in synchronization logs.
      • Overwrite: Conflicting data will be overwritten.

    3. Select the databases and tables of the source database to be synchronized. In this test, select the test_table table from the test_info database.

    4. Locate the database and table, respectively, and click Edit to change the database name and table name.

    5. In the displayed dialog box, enter a new name, for example, DATATYPELIST_After.

      The name cannot include special characters. Otherwise, an error will be reported during SQL statement execution after the synchronization.

    6. Confirm the settings and click Next.

  8. Confirm advanced settings.

    The information on the Advanced Settings page is for confirmation only and cannot be modified. After confirming the information, click Next.

  9. Process data.

    On the Processing Columns tab, select the column to be synchronized and change its name, for example, change c1 to new-line.
    1. Click Edit next to the table to be processed.

    2. Edit the c1 column.

    3. Enter the new name new-line and click Confirm.
    4. Click Next.

  10. Perform a pre-check.

    1. After all settings are complete, perform a pre-check to confirm that the synchronization can run successfully.
    2. If any check item fails, review the cause and rectify the fault. Then, click Check Again.

    3. If all check items pass the pre-check, click Next.

  11. Confirm the task.

    1. Check that all configured information is correct.

    2. Click Submit. In the displayed dialog box, select I have read the precautions.
    3. Click Submit.

  12. After the task is submitted, view and manage it.

    After the task is created, return to the task list to view the status of the created task.
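
The Incremental Conflict Policy options in step 7 can be pictured with plain MySQL statements. This is only an analogy on a scratch MySQL instance, not a description of how DRS implements the policies. The drs_policy_demo database and its table t are hypothetical names introduced for this illustration.

```sql
-- Hypothetical scratch objects, for illustration only.
CREATE DATABASE IF NOT EXISTS drs_policy_demo;
CREATE TABLE drs_policy_demo.t (id INT PRIMARY KEY, val VARCHAR(10));
INSERT INTO drs_policy_demo.t VALUES (1, 'old');

-- Comparable to Ignore: the conflicting row is skipped and processing continues.
INSERT IGNORE INTO drs_policy_demo.t VALUES (1, 'new');
SELECT * FROM drs_policy_demo.t;   -- val is still 'old'

-- Comparable to Report error: a plain INSERT of the same key fails with a
-- duplicate-key error (MySQL error 1062), so it is left commented out.
-- INSERT INTO drs_policy_demo.t VALUES (1, 'new');

-- Comparable to Overwrite: the existing row is replaced by the incoming one.
REPLACE INTO drs_policy_demo.t VALUES (1, 'new');
SELECT * FROM drs_policy_demo.t;   -- val is now 'new'

-- Clean up the scratch objects.
DROP DATABASE drs_policy_demo;
```

The Ignore case shows why that policy can leave the two sides inconsistent: the incoming value never reaches the table.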

Verifying Data After Synchronization

When the task status changes to Incremental, the full synchronization is complete. You can log in to GaussDB and view the data synchronization result.

  1. Wait until the synchronization task status becomes Incremental.

  2. Click the task name to go to the Basic Information page.
  3. Verify data consistency.

    1. Choose Synchronization Comparison > Object-Level Comparison to view the database and table synchronization results.

    2. Choose Synchronization Comparison > Data-Level Comparison, click Create Comparison Task, and view the synchronization results of the rows in the table.

  4. Connect to test_database_info in GaussDB using DAS.

    For details about how to connect to an instance through DAS, see Adding Login Information.

  5. Run the following statement to query the full synchronization result:

    SELECT * FROM test_info.datatypelist_after;

    After synchronization, the MySQL database name (test_info in this example) is used as the schema name in GaussDB. Therefore, qualify the table name with the schema in the query statement.

    All data types in the table were successfully synchronized and the data is correct.

  6. Verify incremental synchronization.

    In full+incremental synchronization, data written to the source database after the task is created continues to be synchronized to the destination database, even after the full synchronization is complete, until the task is stopped. The following steps verify this by writing incremental data to the source database and querying it in the destination database.
    1. Use a database connection tool to connect to the source MySQL database based on its IP address.
    2. Run the following statement to insert a data record into the source database:

      Insert a data record whose ID is 3.

      insert into test_info.test_table values (3,'a','b','111','111','tinyblob','mediumblob','longblob','tinytext','text','mediumtext','longtext','1','3',1,2,3,4,1.123,1.1234,'2024-03-08','2024-03-08 08:00:00','2024-03-08 08:00:00','08:00:00','2024','1010','{"a":"b"}',1.23,1.234);
    3. Run the following statement in the destination database to query the result:
      SELECT * FROM test_info.datatypelist_after;

      The new data in the source database has been synchronized to the destination database in real time.

  7. Stop the synchronization task.

    After data is completely synchronized to the destination database, stop the synchronization task.
    1. Locate the task and click Stop in the Operation column.
    2. In the displayed dialog box, click Yes.
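
A quick manual cross-check, best run before the task is stopped, is to compare row counts and key ranges on both sides. This sketch assumes the object names used in this example and that nothing else writes to the table. It complements the DRS comparison tasks rather than replacing them.

```sql
-- On the source MySQL database:
SELECT COUNT(*) AS row_count, MIN(id) AS min_id, MAX(id) AS max_id
FROM test_info.test_table;

-- On the destination, connected to test_database_info in GaussDB:
SELECT COUNT(*) AS row_count, MIN(id) AS min_id, MAX(id) AS max_id
FROM test_info.datatypelist_after;
```

After the steps in this section, both queries should report three rows with IDs from 1 to 3.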