Updated on 2024-05-29 GMT+08:00

DDL Operations

DDL operations include creating databases or tables, changing table field types and field names, and adding or deleting table columns. Currently, MRS Change Data Loader (CDL) supports DDL operations only for jobs that synchronize data from PostgreSQL to Hudi. The sequence of DDL operations is as follows:

  1. Stop the CDL job.
  2. Perform the DDL operation on the Hudi side.
  3. Perform the DDL operation on the source database.
  4. Restart the CDL job.

This section applies to MRS 3.3.0 and later versions. It provides guidance on DDL operations, such as adding fields, changing field types, changing field names, and deleting fields.

Adding a Field

  1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
  2. Log in to the node where the client is installed as the client installation user and run the following commands:

    cd Client installation directory

    source bigdata_env

    source Hudi/component_env

    kinit Component service user (Skip this step if Kerberos authentication is disabled for the cluster.)

  3. Run the following commands to log in to the Spark SQL CLI:

    cd Spark/spark/bin

    ./spark-sql

  4. Run the following command to enable schema evolution:

    set hoodie.schema.evolution.enable=true;

  5. Run the following command to add a field to the table:

    ALTER TABLE tableName ADD COLUMNS(columnName columnType);
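
    For example, a statement like the following adds an INT field (the table and field names here are illustrative):

    ALTER TABLE ddltest ADD COLUMNS(age int);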

  6. In the source database, add a column with the same name and data type as the column added in Hudi, as shown in the example below.
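
    For a PostgreSQL source, a statement like the following adds the matching column (the table and column names are illustrative):

    ALTER TABLE ddltest ADD COLUMN age int;
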
  7. On the CDL web UI, start the CDL job stopped in step 1.

Modifying a Field

When converting a field type, ensure that values of the source type can be correctly converted to the target type. If the types are incompatible, the conversion may fail, which in turn causes the task to fail.
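
For example, a string value that is not a valid number cannot be converted to a numeric type. In PostgreSQL, a cast like the following fails with an "invalid input syntax" error (the value is illustrative):

    SELECT 'abc'::numeric;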

  • Change the data type from VARCHAR to NUMBER.
    1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
    2. Log in to the node where the client is installed as the client installation user and run the following commands:

      cd Client installation directory

      source bigdata_env

      source Hudi/component_env

      kinit Component service user (Skip this step if Kerberos authentication is disabled for the cluster.)

    3. Run the following commands to log in to the Spark SQL CLI:

      cd Spark/spark/bin

      ./spark-sql

    4. Run the following statement to change the data type from string to decimal on Hudi:

      ALTER TABLE ddltest ALTER COLUMN string TYPE decimal(20,10);

    5. Insert data into the source database and verify that the data is properly written to Hudi.
    6. In the source database, change the data type from VARCHAR to NUMBER.
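
      For a PostgreSQL source, this conversion needs a USING clause because PostgreSQL has no implicit cast from character types to numeric types (the table and column names are illustrative):

      ALTER TABLE ddltest ALTER COLUMN val TYPE numeric(20,10) USING val::numeric(20,10);
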
    7. On the CDL web UI, start the job stopped in step 1, and then update data in the source database.
  • Change the data type from NUMBER to VARCHAR.
    1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
    2. Log in to the node where the client is installed as the client installation user and run the following commands:

      cd Client installation directory

      source bigdata_env

      source Hudi/component_env

      kinit Component service user (Skip this step if Kerberos authentication is disabled for the cluster.)

    3. Run the following commands to log in to the Spark SQL CLI:

      cd Spark/spark/bin

      ./spark-sql

    4. Run the following statement to change the data type from decimal to string on Hudi:

      ALTER TABLE ddltest ALTER COLUMN decimal TYPE string;

    5. Insert data into the source database and verify that the data is properly written to Hudi.
    6. In the source database, change the data type from NUMBER to VARCHAR.
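
      For a PostgreSQL source, this conversion also needs a USING clause because there is no implicit cast from numeric to character types (the table and column names are illustrative):

      ALTER TABLE ddltest ALTER COLUMN val TYPE varchar(40) USING val::varchar;
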
    7. On the CDL web UI, start the job stopped in step 1, and then update data in the source database.
  • Change the data type from DATE to VARCHAR.
    1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
    2. Log in to the node where the client is installed as the client installation user and run the following commands:

      cd Client installation directory

      source bigdata_env

      source Hudi/component_env

      kinit Component service user (Skip this step if Kerberos authentication is disabled for the cluster.)

    3. Run the following commands to log in to the Spark SQL CLI:

      cd Spark/spark/bin

      ./spark-sql

    4. Run the following statement to change the data type from date to string on Hudi:

      ALTER TABLE ddltest2 ALTER COLUMN date TYPE string;

    5. Insert data into the source database and verify that the data is properly written to Hudi.
    6. In the source database, change the data type from DATE to VARCHAR.
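
      For a PostgreSQL source, a statement like the following performs the conversion; the USING clause casts the existing values (the table and column names are illustrative):

      ALTER TABLE ddltest2 ALTER COLUMN dt TYPE varchar(30) USING dt::varchar;
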
    7. On the CDL web UI, start the job stopped in step 1, and then update data in the source database.
  • Change the TIMESTAMP type without a time zone to the TIMESTAMP type with a time zone.
    1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
    2. In the source database, change the data type from timestamp to timestamptz.
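
      For a PostgreSQL source, a statement like the following makes the change; existing values are interpreted in the session time zone during the conversion (the table and column names are illustrative):

      ALTER TABLE ddltest ALTER COLUMN ts TYPE timestamptz;
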
    3. Insert data into the source database and verify that the data is properly written to Hudi.
    4. On the CDL web UI, start the job stopped in step 1, and then update data in the source database.
  • Extend the string length.
    1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
    2. Extend the string length in the source database.
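
      For a PostgreSQL source, increasing the declared length of a VARCHAR column can be done with a statement like the following (the table and column names are illustrative):

      ALTER TABLE ddltest ALTER COLUMN val TYPE varchar(100);
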
    3. Insert data into the source database and verify that the data is properly written to Hudi.
    4. On the CDL web UI, start the job stopped in step 1, and then update data in the source database.
  • Increase the decimal precision.
    1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
    2. Log in to the node where the client is installed as the client installation user and run the following commands:

      cd Client installation directory

      source bigdata_env

      source Hudi/component_env

      kinit Component service user (Skip this step if Kerberos authentication is disabled for the cluster.)

    3. Run the following commands to log in to the Spark SQL CLI:

      cd Spark/spark/bin

      ./spark-sql

    4. Run the following statement to increase the decimal precision on Hudi:

      ALTER TABLE ddltest2 ALTER COLUMN decimal TYPE decimal(30,15);

    5. Increase the precision of the decimal type in the source database.
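
      For a PostgreSQL source, a statement like the following increases the precision to match the Hudi change (the table and column names are illustrative):

      ALTER TABLE ddltest2 ALTER COLUMN val TYPE numeric(30,15);
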
    6. Insert data into the source database and verify that the data is properly written to Hudi.
    7. On the CDL web UI, start the job stopped in step 1, and then update data in the source database.

Changing a Field Name

  1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
  2. Log in to the node where the client is installed as the client installation user and run the following commands:

    cd Client installation directory

    source bigdata_env

    source Hudi/component_env

    kinit Component service user (Skip this step if Kerberos authentication is disabled for the cluster.)

  3. Run the following commands to log in to the Spark SQL CLI:

    cd Spark/spark/bin

    ./spark-sql

  4. Run the following statement on Hudi to change a field name:

    ALTER TABLE ddltest RENAME COLUMN columnName TO newColumnName;

  5. Change the field name in the source database.
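
    For a PostgreSQL source, the rename syntax is the same as on Hudi (the table and column names are illustrative):

    ALTER TABLE ddltest RENAME COLUMN columnName TO newColumnName;
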
  6. On the CDL web UI, start the job stopped in step 1, and then update data in the source database.

Deleting a Field

  1. Log in to FusionInsight Manager and choose Cluster > Services > CDL. On the displayed page, click the hyperlink on the right of CDLService UI to go to the CDLService web UI. On the data synchronization job list page, locate the row that contains the target DDL change job and choose More > Stop to stop the job.
  2. Log in to the node where the client is installed as the client installation user and run the following commands:

    cd Client installation directory

    source bigdata_env

    source Hudi/component_env

    kinit Component service user (Skip this step if Kerberos authentication is disabled for the cluster.)

  3. Run the following commands to log in to the Spark SQL CLI:

    cd Spark/spark/bin

    ./spark-sql

  4. Run the following statement on Hudi to delete a field:

    ALTER TABLE ddltest DROP COLUMN columnName;
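
    Following the overall sequence above (DDL on Hudi first, then on the source), if the field is also to be removed from the source database, a PostgreSQL statement like the following can be used (the table and column names are illustrative):

    ALTER TABLE ddltest DROP COLUMN columnName;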

  5. On the CDL web UI, start the job stopped in step 1, and then update data in the source database.