Updated on 2024-07-15 GMT+08:00

From GaussDB Primary/Standby to Kafka

Description

This practice describes how to create a DRS synchronization task to synchronize incremental data from GaussDB Primary/Standby to Kafka.

Prerequisites

  • You have registered with Huawei Cloud.
  • Your account balance is greater than or equal to $0 USD.
  • You have logged in to the DRS console.

Service List

  • Virtual Private Cloud (VPC)
  • GaussDB Primary/Standby
  • DMS for Kafka
  • Data Replication Service (DRS)

Architecture of a Primary/Standby GaussDB Instance

Figure 1 Logical architecture of a primary/standby GaussDB instance

GaussDB Primary/Standby consists of Operation Manager (OM), Cluster Manager (CM), and Data Node (DN). An application sends a task directly to the DN, and the DN returns the result to the application after processing the task.

Primary/Standby instances are suitable for scenarios with small and stable volumes of data, where data reliability and service availability are extremely important.

DRS synchronization network diagram

In this example, the source is a primary/standby GaussDB instance, and the destination is a DMS for Kafka instance. Incremental data of the source database is synchronized to the destination database through a VPC. The following figure shows the deployment architecture.

Figure 2 Data synchronization from GaussDB Primary/Standby to Kafka through a VPC

Notes on Usage

  • The resource planning in this best practice is for demonstration only. Adjust it as needed.
  • The test data is for reference only. For more information about DRS, click here.

Resource Planning

Table 1 Resource planning

Category

Subcategory

Planned Value

Description

VPC

VPC name

vpc-DRStest

Specify a name that is easy to identify.

Subnet CIDR block

10.0.0.0/24

Select a subnet with sufficient network resources.

Region

AP-Singapore

To achieve lower network latency, select the region nearest to you.

Subnet name

subnet-drs01

Specify a name that is easy to identify.

GaussDB (source database)

Instance name

drs-gaussdb-src-1

Specify a name that is easy to identify.

DB engine version

GaussDB 8.1.0

-

DB instance type

Primary/standby

Select a proper instance type based on its description.

Storage type

Ultra-high I/O

GaussDB supports ultra-high I/O storage with a maximum throughput of 350 MB/s.

Specifications

General-purpose

4 vCPUs | 16 GB

Select the specifications as needed.

Kafka (destination database)

Kafka instance name

kafka-drs

Specify a name that is easy to identify.

Version

2.3.0

-

AZ

AZ 3

You can select one, three, or more AZs. You are advised to create the instance across different AZs to improve service reliability.

Specifications

c6.2u4g.cluster

-

Brokers

3

-

Storage space

High I/O, 200 GB

The storage space is used to store messages (including replicas). Kafka uses three replicas by default. In addition to storing messages, some space needs to be reserved for storing logs and metadata.

DRS synchronization task

Synchronization task name

DRS-GaussDBToKafka

Specify a name that is easy to identify.

Source DB engine

GaussDB Primary/Standby

-

Destination DB engine

Kafka

In this example, the destination database is a Kafka instance.

Network type

VPC

When creating a task, select VPN or Direct Connect.

Flowchart

The following figure shows the process of creating a DRS task and synchronizing the incremental data from a primary/standby GaussDB instance to Kafka.

Creating a VPC

Create a VPC to prepare network resources for creating a GaussDB instance.

  1. Log in to the management console.
  2. Click in the upper left corner and select a region.
  3. Click the service list icon on the left and choose Networking > Virtual Private Cloud.

    The VPC console is displayed.

  4. Click Create VPC.

  5. Configure parameters as needed and click Create Now.
  6. Return to the VPC list and check whether the VPC is created.

    If the VPC status becomes available, the VPC has been created.

Creating a Security Group

Create a security group for creating a GaussDB instance.

  1. Log in to the management console.
  2. Click in the upper left corner and select a region.
  3. Click the service list icon on the left and choose Networking > Virtual Private Cloud.

    The VPC console is displayed.

  4. In the navigation pane, choose Access Control > Security Groups.
  5. Click Create Security Group.
  6. Configure parameters as needed.

  7. Click OK.

Creating a Primary/Standby GaussDB Instance

The following describes how to create a primary/standby GaussDB instance as the destination database.

  1. Log in to the management console.
  2. Click in the upper left corner and select a region.
  3. Click the service list icon on the left and choose Databases > GaussDB.
  4. Click Buy DB Instance.
  5. Configure the instance name and basic information.

  6. Configure instance specifications.

    Select small specifications for this test instance. You are advised to configure specifications based on service requirements in actual use.

  7. Select a VPC and security group (created in Creating a VPC and Creating a Security Group) for the instance and configure the database port.

  8. Configure password and other information.

  9. Click Next, confirm the information, and click Submit.
  10. Go to the instance list.

    If the instance status becomes available, the instance has been created.

Constructing Test Data for the GaussDB Instance

  1. Log in to the management console.
  2. Click in the upper left corner and select a region.
  3. Click the service list icon on the left and choose Databases > GaussDB.
  4. Select the target GaussDB instance and choose More > Log In.
  5. In the displayed dialog box, enter the password and click Test Connection.
  6. After the connection is successful, click Log In.
  7. Click Create Database to create the db_test database.

  8. Run the following statement in db_test to create table schema_test.table1:

    create table schema_test.table1(c1 int primary key,c2 varchar(10),c3 TIMESTAMP(6));

Creating a Kafka Instance

The following describes how to create a Kafka instance.

  1. Log in to the management console.
  2. Click in the upper left corner and select a region.
  3. Click the service list icon on the left and choose Middleware > Distributed Message Service (for Kafka).
  4. Click Buy Instance.
  5. Select the instance region and AZ.

  6. Configure the instance name and specifications.

  7. Select the storage space and capacity threshold policy.

  8. Select a VPC and security group (created in Creating a VPC and Creating a Security Group) for the instance.

  9. Configure the instance password.

  10. Click Buy, confirm the information, and click Submit.
  11. Go to the instance list.

    If the status of the Kafka instance becomes Running, the instance has been created.

Creating a Kafka Topic

  1. Click a Kafka instance.
  2. Choose Topics, and click Create Topic.
  3. In the dialog box that is displayed, enter a topic name, specify other parameters, and click OK.

Creating a DRS Synchronization Task

  1. Log in to the management console.
  2. Click in the upper left corner and select a region.
  3. Click the service list icon on the left and choose Databases > Data Replication Service.
  4. Choose Data Synchronization Management and click Create Synchronization Task.
  5. Configure synchronization task parameters.

    1. Specify a synchronization task name.

    2. Select the source database, destination database, and network information.

      Select the GaussDB instance created in Creating a Primary/Standby GaussDB Instance as the source database.

    3. Select specifications and AZ.

  6. Click Create Now.

    The synchronization instance is being created. It takes about 5 to 10 minutes.

  7. Configure source and destination database information.

    1. Configure source database information.
    2. Click Test Connection.

      If a successful test message is returned, the database is connected.

    3. Select the VPC and subnet where the destination database is located, and enter the Kafka IP address and port number.
    4. Click Test Connection.

      If a successful test message is returned, the database is connected.

  8. Click Next.
  9. Select the synchronization information, policy, message format, and object, and the format of the message sent to the Kafka.

    The following table lists the settings.

    Table 2 Synchronization settings

    Type

    Setting

    Topic Synchronization Policy

    Deliver the content to a topic named testTopic.

    Different topic policies correspond to different Kafka partition policies. For details, see Topic and Partition Synchronization Policies.

    Synchronize Topic To

    Partition 0

    Different topic policies correspond to different Kafka partition policies. For details, see Topic and Partition Synchronization Policies.

    Data Format in Kafka

    You can select the JSON format. For details, see Kafka Message Format.

    Synchronization Object

    Select schema_test.table1 in the db_test database.

  10. Click Next and wait for the check results.
  11. If the check is complete and the check success rate is 100%, click Next.
  12. After confirming that the synchronization task information is correct, click Next.

    Return to the Data Synchronization Management page and check the synchronization task status.

    It takes several minutes to complete.

    If the status changes to Incremental, the synchronization task has been started.

    • In this example, Synchronization Mode is set to Incremental for the task from GaussDB Primary/Standby to Kafka. After the task is started, the status is Incremental.
    • If you create a full+incremental synchronization task, a full synchronization is executed first. After the full synchronization is complete, an incremental synchronization starts.
    • During the incremental synchronization, data is continuously synchronized, so the task will not automatically stop.

Confirming the Results

In this practice, DRS continuously synchronizes the incremental data generated in the source database to the destination database until you stop the task. The following describes how to verify the synchronization results by inserting data to the source GaussDB database and viewing the data received by Kafka.

  1. Log in to the management console.
  2. Click in the upper left corner and select a region.
  3. Click the service list icon on the left and choose Databases > GaussDB.
  4. Locate the required GaussDB instance and choose More > Log In.
  5. In the displayed dialog box, enter the password and click Test Connection.
  6. After the connection is successful, click Log In.
  7. Run the following statement to insert data to the db_test.schema_test.table1 table.

    insert into schema_test.table1 values(1,'testKafka',current_timestamp(6)); 
    update schema_test.table1 set c2 ='G2K' where c1 =1;
    delete schema_test.table1 where  c1 =1;

  8. On the Kafka client, check the received data in JSON format.

     ./kafka-console-consumer.sh --bootstrap-server ip:port --topic  testTopic   --from-beginning

  9. Stop the synchronization task.

    If all data has been synchronized to the destination database, you can stop the current task.
    1. Locate the task and click Stop in the Operation column.
    2. In the display box, click Yes.

Topic and Partition Synchronization Policies

Table 3 Topic and partition policies

Topic Policy

Available Partition Policies

Description

A specified topic

Select A specified topic if the data volume of the source database is small.

Partitions are differentiated by the hash values of database_name.schema_name.table_name

This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved.

Partition 0

If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted.

Partitions are identified by the hash values of the primary key

This mode applies to scenarios where one table corresponds to one topic, preventing table data from being written to the same partition, so that consumers can obtain data from different partitions concurrently.

Partitions are differentiated by the hash values of database_name.schema_name

This mode applies to scenarios where one database corresponds to one topic, preventing data of multiple schemas from being written to the same partition, so that consumers can obtain data from different partitions concurrently.

Partitions are differentiated by the hash values of non-primary-key columns

This mode is for when one table corresponds to one topic, preventing table data from being written to the same partition. Users can customize message keys based on the hash values of non-primary-key columns, and consumers can obtain data from different partitions concurrently.

Automatically generated using the database_name-schema_name-table_name format

If each table contains a lot of data, select Automatically generated using the database_name-schema_name-table_name format.

Partition 0

If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted.

Partitions are identified by the hash values of the primary key

This mode applies to scenarios where one table corresponds to one topic, preventing table data from being written to the same partition, so that consumers can obtain data from different partitions concurrently.

Partitions are differentiated by the hash values of non-primary-key columns

This mode is for when one table corresponds to one topic, preventing table data from being written to the same partition. Users can customize message keys based on the hash values of non-primary-key columns, and consumers can obtain data from different partitions concurrently.

Automatically generated based on the database name

If the source database does not contain a lot of data, select Automatically generated based on the database name.

Partitions are differentiated by the hash values of database_name.schema_name.table_name

This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved.

Partition 0

If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted.

Partitions are differentiated by the hash values of database_name.schema_name

This mode applies to scenarios where one database corresponds to one topic, preventing data of multiple schemas from being written to the same partition, so that consumers can obtain data from different partitions concurrently.

Automatically generated using the database_name-schema_name format

If each schema contains a lot of data, select Automatically generated using the database_name-schema_name format.

Partitions are differentiated by the hash values of database_name.schema_name.table_name

This mode is recommended in single-table query scenarios where the read and write performance on the single table can be improved.

Partition 0

If topics are synchronized to partition 0, strong consistency can be obtained but write performance is impacted.