Help Center/ Data Replication Service/ Best Practices/ Real-Time Synchronization/ From GaussDB Centralized to Kafka

Updated on 2025-02-13 GMT+08:00

View PDF

From GaussDB Centralized to Kafka

Description

This practice describes how to create a DRS synchronization task to synchronize incremental data from GaussDB Centralized to Kafka.

Prerequisites

You have registered with Huawei Cloud.
Your account balance is greater than or equal to $0 USD.

You have logged in to the DRS console.

Service List

Virtual Private Cloud (VPC)
GaussDB Centralized

DMS for Kafka
Data Replication Service (DRS)

Architecture of a Primary/Standby GaussDB Instance

Figure 1 Logical architecture of a primary/standby GaussDB instance

GaussDB Centralized consists of Operation Manager (OM), Cluster Manager (CM), and Data Node (DN). An application sends a task directly to the DN, and the DN returns the result to the application after processing the task.

Primary/Standby instances are suitable for scenarios with small and stable volumes of data, where data reliability and service availability are extremely important.

DRS synchronization network diagram

In this example, the source is a centralized GaussDB instance, and the destination is a DMS for Kafka instance. Incremental data of the source database is synchronized to the destination database through a VPC. The following figure shows the deployment architecture.

Figure 2 Data synchronization from GaussDB Centralized to Kafka through a VPC

Notes on Usage

The resource planning in this best practice is for demonstration only. Adjust it as needed.
The test data is for reference only. For more information about DRS, click here.

Resource Planning

**Table 1** Resource planning
Category	Subcategory	Planned Value	Description
VPC	VPC name	vpc-DRStest	Specify a name that is easy to identify.
	Subnet CIDR block	10.0.0.0/24	Select a subnet with sufficient network resources.
	Region	AP-Singapore	To achieve lower network latency, select the region nearest to you.
	Subnet name	subnet-drs01	Specify a name that is easy to identify.
GaussDB (source database)	Instance name	drs-gaussdb-src-1	Specify a name that is easy to identify.
	DB engine version	GaussDB 8.1.0	-
	DB instance type	Primary/standby	Select a proper instance type based on its description.
	Storage type	Ultra-high I/O	GaussDB supports ultra-high I/O storage with a maximum throughput of 350 MB/s.
	Specifications	General-purpose 4 vCPUs \| 16 GB	Select the specifications as needed.
Kafka (destination database)	Kafka instance name	kafka-drs	Specify a name that is easy to identify.
	Version	2.3.0	-
	AZ	AZ 3	You can select one, three, or more AZs. You are advised to create the instance across different AZs to improve service reliability.
	Specifications	c6.2u4g.cluster	-
	Brokers	3	-
	Storage space	High I/O, 200 GB	The storage space is used to store messages (including replicas). Kafka uses three replicas by default. In addition to storing messages, some space needs to be reserved for storing logs and metadata.
DRS synchronization task	Synchronization task name	DRS-GaussDBToKafka	Specify a name that is easy to identify.
	Source DB engine	GaussDB Centralized	-
	Destination DB engine	Kafka	In this example, the destination database is a Kafka instance.
	Network type	VPC	When creating a task, select VPN or Direct Connect.

Flowchart

The following figure shows the process of creating a DRS task and synchronizing the incremental data from a centralized GaussDB instance to Kafka.

Creating a VPC

Create a VPC to prepare network resources for creating a GaussDB instance.

Log in to the management console.
Click in the upper left corner and select a region.
Click the service list icon on the left and choose Networking > Virtual Private Cloud.

The VPC console is displayed.
Click Create VPC.
Configure parameters as needed and click Create Now.
Return to the VPC list and check whether the VPC is created.

If the VPC status becomes available, the VPC has been created.

Creating a Security Group

Create a security group for creating a GaussDB instance.

Log in to the management console.
Click in the upper left corner and select a region.
Click the service list icon on the left and choose Networking > Virtual Private Cloud.

The VPC console is displayed.
In the navigation pane, choose Access Control > Security Groups.
Click Create Security Group.
Configure parameters as needed.
Click OK.

Creating a Primary/Standby GaussDB Instance

The following describes how to create a primary/standby GaussDB instance as the destination database.

Log in to the management console.
Click in the upper left corner and select a region.
Click the service list icon on the left and choose Databases > GaussDB.
Click Buy DB Instance.
Configure the instance name and basic information.
Configure instance specifications.

Select small specifications for this test instance. You are advised to configure specifications based on service requirements in actual use.
Select a VPC and security group (created in Creating a VPC and Creating a Security Group) for the instance and configure the database port.
Configure password and other information.
Click Next, confirm the information, and click Submit.
Go to the instance list.

If the instance status becomes available, the instance has been created.

Constructing Test Data for the GaussDB Instance

Log in to the management console.
Click in the upper left corner and select a region.
Click the service list icon on the left and choose Databases > GaussDB.
Select the target GaussDB instance and choose More > Log In.
In the displayed dialog box, enter the password and click Test Connection.
After the connection is successful, click Log In.
Click Create Database to create the db_test database.

Run the following statement in db_test to create table schema_test.table1:

create table schema_test.table1(c1 int primary key,c2 varchar(10),c3 TIMESTAMP(6));

Creating a Kafka Instance

The following describes how to create a Kafka instance.

Log in to the management console.
Click in the upper left corner and select a region.
Click the service list icon on the left and choose Middleware > Distributed Message Service (for Kafka).
Click Buy Instance.
Select the instance region and AZ.
Configure the instance name and specifications.
Select the storage space and capacity threshold policy.
Select a VPC and security group (created in Creating a VPC and Creating a Security Group) for the instance.
Configure the instance password.
Click Buy, confirm the information, and click Submit.
Go to the instance list.

If the status of the Kafka instance becomes Running, the instance has been created.

Creating a Kafka Topic

Click a Kafka instance.
Choose Topics, and click Create Topic.
In the dialog box that is displayed, enter a topic name, specify other parameters, and click OK.

Creating a DRS Synchronization Task

Log in to the management console.
Click in the upper left corner and select a region.
Click the service list icon on the left and choose Databases > Data Replication Service.
Choose Data Synchronization Management and click Create Synchronization Task.
Configure synchronization task parameters.
1. Specify a synchronization task name.
2. Select the source database, destination database, and network information.
  Select the GaussDB instance created in Creating a Primary/Standby GaussDB Instance as the source database.
  
  Figure 3 Synchronization instance details
3. Select specifications and AZ.
Click Create Now.

The synchronization instance is being created. It takes about 5 to 10 minutes.
Configure source and destination database information.
1. Configure source database information.
2. Click Test Connection.
  If a successful test message is returned, the database is connected.
3. Select the VPC and subnet where the destination database is located, and enter the Kafka IP address and port number.
4. Click Test Connection.
  If a successful test message is returned, the database is connected.
Click Next.

Select the synchronization information, policy, message format, and object, and the format of the message sent to the Kafka.

The following table lists the settings.

**Table 2** Synchronization settings
Type	Setting
Topic Synchronization Policy	Deliver the content to a topic named testTopic. Different topic policies correspond to different Kafka partition policies. For details, see Topic and Partition Synchronization Policies.
Synchronize Topic To	Partition 0 Different topic policies correspond to different Kafka partition policies. For details, see Topic and Partition Synchronization Policies.
Data Format in Kafka	You can select the JSON format. For details, see Kafka Message Format.
Synchronization Object	Select schema_test.table1 in the db_test database.

Click Next and wait for the check results.
If the check is complete and the check success rate is 100%, click Next.
After confirming that the synchronization task information is correct, click Next.

Return to the Data Synchronization Management page and check the synchronization task status.

It takes several minutes to complete.

If the status changes to Incremental, the synchronization task has been started.
- In this example, Synchronization Mode is set to Incremental for the task from GaussDB Primary/Standby to Kafka. After the task is started, the status is Incremental.
- If you create a full+incremental synchronization task, a full synchronization is executed first. After the full synchronization is complete, an incremental synchronization starts.
- During the incremental synchronization, data is continuously synchronized, so the task will not automatically stop.

Confirming the Results

In this practice, DRS continuously synchronizes the incremental data generated in the source database to the destination database until you stop the task. The following describes how to verify the synchronization results by inserting data to the source GaussDB database and viewing the data received by Kafka.

Log in to the management console.
Click in the upper left corner and select a region.
Click the service list icon on the left and choose Databases > GaussDB.
Locate the required GaussDB instance and choose More > Log In.
In the displayed dialog box, enter the password and click Test Connection.
After the connection is successful, click Log In.

Run the following statement to insert data to the db_test.schema_test.table1 table.

insert into schema_test.table1 values(1,'testKafka',current_timestamp(6)); 
update schema_test.table1 set c2 ='G2K' where c1 =1;
delete schema_test.table1 where  c1 =1;

On the Kafka client, check the received data in JSON format.

 ./kafka-console-consumer.sh --bootstrap-server ip:port --topic  testTopic   --from-beginning

Stop the synchronization task.
If all data has been synchronized to the destination database, you can stop the current task.
1. Locate the task and click Stop in the Operation column.
2. In the display box, click Yes.

Topic and Partition Synchronization Policies

**Table 3** Topic and partition policies
Topic Policy	Available Partition Policies	Description
A specified topic Select A specified topic if the data volume of the source database is small.	Partitions are differentiated by the hash values of database_name.schema_name.table_name	This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved.
	Partition 0	If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted.
	Partitions are identified by the hash values of the primary key	This mode applies to scenarios where one table corresponds to one topic, preventing table data from being written to the same partition, so that consumers can obtain data from different partitions concurrently.
	Partitions are differentiated by the hash values of database_name.schema_name	This mode applies to scenarios where one database corresponds to one topic, preventing data of multiple schemas from being written to the same partition, so that consumers can obtain data from different partitions concurrently.
	Partitions are differentiated by the hash values of non-primary-key columns	This mode is for when one table corresponds to one topic, preventing table data from being written to the same partition. Users can customize message keys based on the hash values of non-primary-key columns, and consumers can obtain data from different partitions concurrently.
Automatically generated using the database_name-schema_name-table_name format If each table contains a lot of data, select Automatically generated using the database_name-schema_name-table_name format.	Partition 0	If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted.
	Partitions are identified by the hash values of the primary key	This mode applies to scenarios where one table corresponds to one topic, preventing table data from being written to the same partition, so that consumers can obtain data from different partitions concurrently.
	Partitions are differentiated by the hash values of non-primary-key columns	This mode is for when one table corresponds to one topic, preventing table data from being written to the same partition. Users can customize message keys based on the hash values of non-primary-key columns, and consumers can obtain data from different partitions concurrently.
Automatically generated based on the database name If the source database does not contain a lot of data, select Automatically generated based on the database name.	Partitions are differentiated by the hash values of database_name.schema_name.table_name	This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved.
	Partition 0	If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted.
	Partitions are differentiated by the hash values of database_name.schema_name	This mode applies to scenarios where one database corresponds to one topic, preventing data of multiple schemas from being written to the same partition, so that consumers can obtain data from different partitions concurrently.
Automatically generated using the database_name-schema_name format If each schema contains a lot of data, select Automatically generated using the database_name-schema_name format.	Partitions are differentiated by the hash values of database_name.schema_name.table_name	This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved.
	Partition 0	If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted.