From GaussDB Primary/Standby to Kafka
Description
This practice describes how to create a DRS synchronization task to synchronize incremental data from GaussDB Primary/Standby to Kafka.
Prerequisites
- You have registered with Huawei Cloud.
- Your account balance is greater than or equal to $0 USD.
- You have logged in to the DRS console.
Service List
- Virtual Private Cloud (VPC)
- GaussDB Primary/Standby
- DMS for Kafka
- Data Replication Service (DRS)
Architecture of a Primary/Standby GaussDB Instance
GaussDB Primary/Standby consists of Operation Manager (OM), Cluster Manager (CM), and Data Node (DN). An application sends a task directly to the DN, and the DN returns the result to the application after processing the task.
Primary/Standby instances are suitable for scenarios with small and stable volumes of data, where data reliability and service availability are extremely important.
DRS synchronization network diagram
In this example, the source is a primary/standby GaussDB instance, and the destination is a DMS for Kafka instance. Incremental data of the source database is synchronized to the destination database through a VPC. The following figure shows the deployment architecture.
Notes on Usage
- The resource planning in this best practice is for demonstration only. Adjust it as needed.
- The test data is for reference only. For more information about DRS, click here.
Resource Planning
Category |
Subcategory |
Planned Value |
Description |
---|---|---|---|
VPC |
VPC name |
vpc-DRStest |
Specify a name that is easy to identify. |
Subnet CIDR block |
10.0.0.0/24 |
Select a subnet with sufficient network resources. |
|
Region |
AP-Singapore |
To achieve lower network latency, select the region nearest to you. |
|
Subnet name |
subnet-drs01 |
Specify a name that is easy to identify. |
|
GaussDB (source database) |
Instance name |
drs-gaussdb-src-1 |
Specify a name that is easy to identify. |
DB engine version |
GaussDB 8.1.0 |
- |
|
DB instance type |
Primary/standby |
Select a proper instance type based on its description. |
|
Storage type |
Ultra-high I/O |
GaussDB supports ultra-high I/O storage with a maximum throughput of 350 MB/s. |
|
Specifications |
General-purpose 4 vCPUs | 16 GB |
Select the specifications as needed. |
|
Kafka (destination database) |
Kafka instance name |
kafka-drs |
Specify a name that is easy to identify. |
Version |
2.3.0 |
- |
|
AZ |
AZ 3 |
You can select one, three, or more AZs. You are advised to create the instance across different AZs to improve service reliability. |
|
Specifications |
c6.2u4g.cluster |
- |
|
Brokers |
3 |
- |
|
Storage space |
High I/O, 200 GB |
The storage space is used to store messages (including replicas). Kafka uses three replicas by default. In addition to storing messages, some space needs to be reserved for storing logs and metadata. |
|
DRS synchronization task |
Synchronization task name |
DRS-GaussDBToKafka |
Specify a name that is easy to identify. |
Source DB engine |
GaussDB Primary/Standby |
- |
|
Destination DB engine |
Kafka |
In this example, the destination database is a Kafka instance. |
|
Network type |
VPC |
When creating a task, select VPN or Direct Connect. |
Flowchart
The following figure shows the process of creating a DRS task and synchronizing the incremental data from a primary/standby GaussDB instance to Kafka.
Creating a VPC
Create a VPC to prepare network resources for creating a GaussDB instance.
- Log in to the management console.
- Click in the upper left corner and select a region.
- Click the service list icon on the left and choose Networking > Virtual Private Cloud.
The VPC console is displayed.
- Click Create VPC.
- Configure parameters as needed and click Create Now.
- Return to the VPC list and check whether the VPC is created.
If the VPC status becomes available, the VPC has been created.
Creating a Security Group
Create a security group for creating a GaussDB instance.
- Log in to the management console.
- Click in the upper left corner and select a region.
- Click the service list icon on the left and choose Networking > Virtual Private Cloud.
The VPC console is displayed.
- In the navigation pane, choose Access Control > Security Groups.
- Click Create Security Group.
- Configure parameters as needed.
- Click OK.
Creating a Primary/Standby GaussDB Instance
The following describes how to create a primary/standby GaussDB instance as the destination database.
- Log in to the management console.
- Click in the upper left corner and select a region.
- Click the service list icon on the left and choose Databases > GaussDB.
- Click Buy DB Instance.
- Configure the instance name and basic information.
- Configure instance specifications.
Select small specifications for this test instance. You are advised to configure specifications based on service requirements in actual use.
- Select a VPC and security group (created in Creating a VPC and Creating a Security Group) for the instance and configure the database port.
- Configure password and other information.
- Click Next, confirm the information, and click Submit.
- Go to the instance list.
If the instance status becomes available, the instance has been created.
Constructing Test Data for the GaussDB Instance
- Log in to the management console.
- Click in the upper left corner and select a region.
- Click the service list icon on the left and choose Databases > GaussDB.
- Select the target GaussDB instance and choose More > Log In.
- In the displayed dialog box, enter the password and click Test Connection.
- After the connection is successful, click Log In.
- Click Create Database to create the db_test database.
- Run the following statement in db_test to create table schema_test.table1:
create table schema_test.table1(c1 int primary key,c2 varchar(10),c3 TIMESTAMP(6));
Creating a Kafka Instance
The following describes how to create a Kafka instance.
- Log in to the management console.
- Click in the upper left corner and select a region.
- Click the service list icon on the left and choose Middleware > Distributed Message Service (for Kafka).
- Click Buy Instance.
- Select the instance region and AZ.
- Configure the instance name and specifications.
- Select the storage space and capacity threshold policy.
- Select a VPC and security group (created in Creating a VPC and Creating a Security Group) for the instance.
- Configure the instance password.
- Click Buy, confirm the information, and click Submit.
- Go to the instance list.
If the status of the Kafka instance becomes Running, the instance has been created.
Creating a Kafka Topic
- Click a Kafka instance.
- Choose Topics, and click Create Topic.
- In the dialog box that is displayed, enter a topic name, specify other parameters, and click OK.
Creating a DRS Synchronization Task
- Log in to the management console.
- Click in the upper left corner and select a region.
- Click the service list icon on the left and choose Databases > Data Replication Service.
- Choose Data Synchronization Management and click Create Synchronization Task.
- Configure synchronization task parameters.
- Specify a synchronization task name.
- Select the source database, destination database, and network information.
Select the GaussDB instance created in Creating a Primary/Standby GaussDB Instance as the source database.
- Select specifications and AZ.
- Click Create Now.
The synchronization instance is being created. It takes about 5 to 10 minutes.
- Configure source and destination database information.
- Configure source database information.
- Click Test Connection.
If a successful test message is returned, the database is connected.
- Select the VPC and subnet where the destination database is located, and enter the Kafka IP address and port number.
- Click Test Connection.
If a successful test message is returned, the database is connected.
- Click Next.
- Select the synchronization information, policy, message format, and object, and the format of the message sent to the Kafka.
The following table lists the settings.
Table 2 Synchronization settings Type
Setting
Topic Synchronization Policy
Deliver the content to a topic named testTopic.
Different topic policies correspond to different Kafka partition policies. For details, see Topic and Partition Synchronization Policies.
Synchronize Topic To
Partition 0
Different topic policies correspond to different Kafka partition policies. For details, see Topic and Partition Synchronization Policies.
Data Format in Kafka
You can select the JSON format. For details, see Kafka Message Format.
Synchronization Object
Select schema_test.table1 in the db_test database.
- Click Next and wait for the check results.
- If the check is complete and the check success rate is 100%, click Next.
- After confirming that the synchronization task information is correct, click Next.
Return to the Data Synchronization Management page and check the synchronization task status.
It takes several minutes to complete.
If the status changes to Incremental, the synchronization task has been started.
- In this example, Synchronization Mode is set to Incremental for the task from GaussDB Primary/Standby to Kafka. After the task is started, the status is Incremental.
- If you create a full+incremental synchronization task, a full synchronization is executed first. After the full synchronization is complete, an incremental synchronization starts.
- During the incremental synchronization, data is continuously synchronized, so the task will not automatically stop.
Confirming the Results
In this practice, DRS continuously synchronizes the incremental data generated in the source database to the destination database until you stop the task. The following describes how to verify the synchronization results by inserting data to the source GaussDB database and viewing the data received by Kafka.
- Log in to the management console.
- Click in the upper left corner and select a region.
- Click the service list icon on the left and choose Databases > GaussDB.
- Locate the required GaussDB instance and choose More > Log In.
- In the displayed dialog box, enter the password and click Test Connection.
- After the connection is successful, click Log In.
- Run the following statement to insert data to the db_test.schema_test.table1 table.
insert into schema_test.table1 values(1,'testKafka',current_timestamp(6)); update schema_test.table1 set c2 ='G2K' where c1 =1; delete schema_test.table1 where c1 =1;
- On the Kafka client, check the received data in JSON format.
./kafka-console-consumer.sh --bootstrap-server ip:port --topic testTopic --from-beginning
- Stop the synchronization task.
If all data has been synchronized to the destination database, you can stop the current task.
- Locate the task and click Stop in the Operation column.
- In the display box, click Yes.
Topic and Partition Synchronization Policies
Topic Policy |
Available Partition Policies |
Description |
---|---|---|
A specified topic Select A specified topic if the data volume of the source database is small. |
Partitions are differentiated by the hash values of database_name.schema_name.table_name |
This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved. |
Partition 0 |
If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted. |
|
Partitions are identified by the hash values of the primary key |
This mode applies to scenarios where one table corresponds to one topic, preventing table data from being written to the same partition, so that consumers can obtain data from different partitions concurrently. |
|
Partitions are differentiated by the hash values of database_name.schema_name |
This mode applies to scenarios where one database corresponds to one topic, preventing data of multiple schemas from being written to the same partition, so that consumers can obtain data from different partitions concurrently. |
|
Partitions are differentiated by the hash values of non-primary-key columns |
This mode is for when one table corresponds to one topic, preventing table data from being written to the same partition. Users can customize message keys based on the hash values of non-primary-key columns, and consumers can obtain data from different partitions concurrently. |
|
Automatically generated using the database_name-schema_name-table_name format If each table contains a lot of data, select Automatically generated using the database_name-schema_name-table_name format. |
Partition 0 |
If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted. |
Partitions are identified by the hash values of the primary key |
This mode applies to scenarios where one table corresponds to one topic, preventing table data from being written to the same partition, so that consumers can obtain data from different partitions concurrently. |
|
Partitions are differentiated by the hash values of non-primary-key columns |
This mode is for when one table corresponds to one topic, preventing table data from being written to the same partition. Users can customize message keys based on the hash values of non-primary-key columns, and consumers can obtain data from different partitions concurrently. |
|
Automatically generated based on the database name If the source database does not contain a lot of data, select Automatically generated based on the database name. |
Partitions are differentiated by the hash values of database_name.schema_name.table_name |
This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved. |
Partition 0 |
If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted. |
|
Partitions are differentiated by the hash values of database_name.schema_name |
This mode applies to scenarios where one database corresponds to one topic, preventing data of multiple schemas from being written to the same partition, so that consumers can obtain data from different partitions concurrently. |
|
Automatically generated using the database_name-schema_name format If each schema contains a lot of data, select Automatically generated using the database_name-schema_name format. |
Partitions are differentiated by the hash values of database_name.schema_name.table_name |
This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved. |
Partition 0 |
If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted. |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot