Synchronizing Open-source Debezium JSON Data

Debezium is an open-source distributed platform for change data capture. It records the row-level changes of each table as event streams. Debezium is built on top of Kafka and provides a group of connectors compatible with Kafka Connect. Each connector captures the change events of a specific database and sends the event streams to Kafka topics. CDL can process JSON-format create (c), update (u), and delete (d) event messages captured by version 1.4.0 of the Debezium connectors for MySQL, PostgreSQL, and Oracle databases.
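
The three operation codes can be illustrated with a minimal Python sketch that reads a Debezium JSON event and picks the row image to apply. The envelope fields `before`, `after`, `op`, and the optional `payload` wrapper are standard in Debezium JSON; the sample event and the helper name `parse_change_event` are hypothetical and are not part of CDL.

```python
import json

# Hypothetical sample event, trimmed to the envelope fields used below.
SAMPLE_EVENT = json.dumps({
    "payload": {
        "before": None,
        "after": {"id": 1, "name": "alice"},
        "source": {"connector": "mysql", "db": "demo", "table": "users"},
        "op": "c",
        "ts_ms": 1700000000000,
    }
})

# Operation codes that CDL is documented to process.
OPERATIONS = {"c": "create", "u": "update", "d": "delete"}


def parse_change_event(raw):
    """Return (operation name, row image) for one Debezium JSON event."""
    message = json.loads(raw)
    # When the JSON converter includes schemas, the envelope sits under "payload".
    envelope = message.get("payload", message)
    op = envelope.get("op")
    if op not in OPERATIONS:
        raise ValueError(f"unsupported operation code: {op!r}")
    # Deletes carry the row in "before"; creates and updates carry it in "after".
    row = envelope["before"] if op == "d" else envelope["after"]
    return OPERATIONS[op], row


operation, row = parse_change_event(SAMPLE_EVENT)
```

Any other operation code, such as the snapshot read code `r`, is rejected here because the text above lists only c, u, and d.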

Database Data Types and Spark (Hudi) Data Types

To write data to Hudi by consuming a database's change event messages in Debezium JSON format, see 2.6.3.12-Synchronizing Debezium JSON Data from ThirdKafka to Hudi. The following tables list the supported database data types and their mapping to Spark data types.

**Table 1** Mapping between PostgreSQL and Spark (Hudi) data types
PostgreSQL	Spark (Hudi)
int2	int
int4	int
int8	bigint
numeric[p,s]	decimal[p,s] if decimal.handling.mode of the Debezium connector is precise (default value); string if decimal.handling.mode is string; double if decimal.handling.mode is double
bool	boolean
char	string
varchar	string
text	string
timestamptz	timestamp
timestamp	timestamp
date	date
json, jsonb	string
float4	float
float8	double
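
As a rough illustration, Table 1 can be expressed as a Python lookup in which only `numeric` depends on the connector's `decimal.handling.mode` setting. The function name `spark_type_for_postgres` and the `decimal(p,s)` output spelling are assumptions made for this sketch; they are not a CDL API.

```python
# Fixed rows of Table 1: PostgreSQL type name -> Spark (Hudi) type name.
POSTGRES_TO_SPARK = {
    "int2": "int", "int4": "int", "int8": "bigint",
    "bool": "boolean",
    "char": "string", "varchar": "string", "text": "string",
    "timestamptz": "timestamp", "timestamp": "timestamp",
    "date": "date",
    "json": "string", "jsonb": "string",
    "float4": "float", "float8": "double",
}


def spark_type_for_postgres(pg_type, precision=None, scale=None,
                            decimal_handling_mode="precise"):
    """Look up the Spark (Hudi) type for a PostgreSQL column type."""
    if pg_type == "numeric":
        # numeric is the only row whose result depends on decimal.handling.mode.
        if decimal_handling_mode == "precise":
            if precision is None or scale is None:
                raise ValueError("precise mode needs precision and scale")
            return f"decimal({precision},{scale})"
        if decimal_handling_mode in ("string", "double"):
            return decimal_handling_mode
        raise ValueError(f"unknown decimal.handling.mode: {decimal_handling_mode!r}")
    if pg_type not in POSTGRES_TO_SPARK:
        raise ValueError(f"type not listed in Table 1: {pg_type!r}")
    return POSTGRES_TO_SPARK[pg_type]
```

Types that are absent from the table raise an error rather than falling back to a default, since the table lists only the supported types.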

**Table 2** Mapping between MySQL and Spark (Hudi) data types
MySQL	Spark (Hudi)
int	int
integer	int
bigint	bigint
double	double
decimal[p,s]	decimal[p,s] if decimal.handling.mode of the Debezium connector is precise (default value); string if decimal.handling.mode is string; double if decimal.handling.mode is double
varchar	string
char	string
text	string
timestamp	timestamp
datetime	timestamp
date	date
json	string
float	double

**Table 3** Mapping between Oracle and Spark (Hudi) data types
Oracle	Spark (Hudi)
NUMBER(1,0)	boolean
NUMBER(P, 0), P in [2, 9]	int
NUMBER(P, 0), P in [10, 18]	bigint
NUMBER(P, 0) with P >= 19; NUMBER(P, S > 0); NUMBER[(P)]	decimal if decimal.handling.mode of the Debezium connector is precise (default value); string if decimal.handling.mode is string; double if decimal.handling.mode is double
FLOAT	decimal
BINARY_DOUBLE	double
CHAR	string
VARCHAR	string
TIMESTAMP	timestamp
TIMESTAMP WITH TIME ZONE	timestamp
DATE	timestamp
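
The NUMBER rows of Table 3 depend on precision and scale, which a short Python sketch can make concrete. The function name `spark_type_for_oracle_number` is hypothetical. Treating a NUMBER declared without a scale as the `NUMBER[(P)]` row is one reading of the table and is an assumption, as is rejecting negative scales, which the table does not cover.

```python
# Result for the mode-dependent NUMBER row, keyed by decimal.handling.mode.
DECIMAL_MODE_TO_SPARK = {"precise": "decimal", "string": "string", "double": "double"}


def spark_type_for_oracle_number(precision=None, scale=None,
                                 decimal_handling_mode="precise"):
    """Pick the Spark (Hudi) type for an Oracle NUMBER column per Table 3."""
    if scale is not None and scale < 0:
        # Assumption: negative scales are not listed in the table, so reject them.
        raise ValueError("negative scale is not covered by Table 3")
    if precision is not None and scale == 0:
        if precision == 1:
            return "boolean"
        if 2 <= precision <= 9:
            return "int"
        if 10 <= precision <= 18:
            return "bigint"
    # Remaining cases: NUMBER(P, 0) with P >= 19, NUMBER(P, S > 0),
    # and NUMBER declared without a scale (assumed to be the NUMBER[(P)] row).
    if decimal_handling_mode not in DECIMAL_MODE_TO_SPARK:
        raise ValueError(f"unknown decimal.handling.mode: {decimal_handling_mode!r}")
    return DECIMAL_MODE_TO_SPARK[decimal_handling_mode]
```

For example, `spark_type_for_oracle_number(18, 0)` selects the bigint row, while raising the precision to 19 moves the column into the mode-dependent row.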