Creating a Composite Task

Overview

You can create a composite task when you need to continuously synchronize data in real time. A composite task enables FDI to synchronize multiple data tables from the source to the destination in real time and incrementally, which improves data integration and synchronization efficiency.

The composite task supports flexible mappings of fields between data tables. For example, multiple fields in one data table at the source can be mapped to different data tables at the destination, or fields in multiple data tables at the source can be mapped to one data table at the destination.

Prerequisites

You have connected the source and destination data sources. For details, see Data Source Management.
In the data source configuration at the source, the value of Database must be the same as the actual database name (case-sensitive). Otherwise, data synchronization will fail.
The CDC function has been enabled at the source. The CDC implementation mode varies depending on the data source type. For details, see the CDC configuration topic for your source data source type, for example, Configuring MySQL CDC.
CDC archive logs at the source must be retained long enough that the logs at the position the integration task is currently parsing are still available. Otherwise, the task cannot find the archive logs it needs and incremental synchronization fails. Therefore, do not stop a data integration task for a long time, and retain archive logs for at least two days.
Do not perform Data Definition Language (DDL) operations on the source database during the first data synchronization.
Each composite task consumes resources on the database server and in the FDI plug-in process. Therefore, you are advised not to create too many composite tasks for one database.
You can configure multiple tables from multiple schemas in a single CDC task to collect full or incremental data in a unified manner.
While a composite task is running, you can add a table to it. Full or incremental collection is performed on the new table after the task is restarted.
The following cannot be synchronized from the source:
- Fields of the large text type or binary type
- Data tables whose names contain lowercase letters
- Data tables without primary keys
  If such a table contains only a small amount of data, you are advised to collect its full data once a day. Currently, data in a PostgreSQL table can be cleared before new data is written to it. If an Oracle source table has no primary key, you can use the internal RowId of the Oracle database as the primary key. A RowId is an 18-character string consisting of digits and letters.
- Data tables or data fields whose names are reserved words in the database
- Data deleted in truncate mode or in entire-table mode
For a MySQL source data source:
If the MySQL database uses MySQL Group Replication (MGR) cluster mode, the source data source must connect directly to the active node, not to the routing node.

If the MySQL database contains a large amount of data, the connection to the database may time out during the first synchronization. To avoid this, increase the values of the interactive_timeout and wait_timeout parameters of the MySQL database.
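The restrictions above can be checked before a task is created. The following Python sketch is a hypothetical pre-check, not part of ROMA Connect: the helper names, the table metadata you pass in, and the small illustrative sets of large-text/binary types and reserved words are all assumptions to adapt to your database. It flags tables that cannot be synchronized, compares the archive log retention with the two-day recommendation and with the task's last parsed log time, and builds `SET GLOBAL` statements for the two MySQL timeout parameters.

```python
from datetime import datetime, timedelta

# Illustrative subsets only (assumptions): extend them for your database engine.
LARGE_OR_BINARY_TYPES = {"TEXT", "MEDIUMTEXT", "LONGTEXT", "BLOB", "LONGBLOB",
                         "CLOB", "NCLOB", "BINARY", "VARBINARY"}
RESERVED_WORDS = {"ORDER", "GROUP", "SELECT", "TABLE", "USER"}

# Recommendation from this section: retain archive logs for at least two days.
MIN_RETENTION = timedelta(days=2)


def table_issues(name: str, columns: dict[str, str], primary_key: list[str]) -> list[str]:
    """Return the reasons (if any) why a source table cannot be synchronized."""
    issues = []
    if any(ch.islower() for ch in name):
        issues.append("table name contains lowercase letters")
    if not primary_key:
        issues.append("table has no primary key")
    if name.upper() in RESERVED_WORDS:
        issues.append("table name is a reserved word")
    for col, col_type in columns.items():
        if col.upper() in RESERVED_WORDS:
            issues.append(f"field {col} is a reserved word")
        if col_type.upper() in LARGE_OR_BINARY_TYPES:
            issues.append(f"field {col} has unsupported type {col_type}")
    return issues


def retention_issues(last_parsed: datetime, now: datetime, retention: timedelta) -> list[str]:
    """Compare archive log retention with the task's last parsed log time."""
    issues = []
    if retention < MIN_RETENTION:
        issues.append("archive log retention is shorter than two days")
    if now - last_parsed >= retention:
        issues.append("logs at the last parsed position may already be purged")
    return issues


def mysql_timeout_statements(seconds: int) -> list[str]:
    """Build SQL that raises the MySQL idle timeouts (values are in seconds)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return [f"SET GLOBAL interactive_timeout = {seconds};",
            f"SET GLOBAL wait_timeout = {seconds};"]


print(table_issues("Orders", {"ID": "INT", "NOTE": "LONGTEXT"}, []))
print(mysql_timeout_statements(86400))
```

Both MySQL parameters default to 28800 seconds (8 hours). `SET GLOBAL` requires administrative privileges and affects only new connections; on MySQL 8.0, `SET PERSIST` can be used instead so that the values survive a server restart.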

Procedure

Log in to the ROMA Connect console. On the Instances page, click View Console next to a specific instance.
In the navigation pane on the left, choose Fast Data Integration > Task Management. On the page displayed, click Create Composite Task.

On the Create Composite Task page, configure basic task information.

**Table 1** Basic task information
Parameter	Description
Task Name	The task name cannot be modified after the task is created. Enter a name that follows your naming rules so that the task is easy to find.
Description	Add a description based on the actual task usage to differentiate tasks. The description can be modified after the task is created.
Tag	Add a tag to classify tasks for quick search. You can associate an existing tag, or click Add Tag to create one if none is available. A new tag is saved when the task is saved and can then be searched for when you create a task.
Operation Types	Select the operation types of database logs to obtain: insert, delete, and update. For example, if you select Insert and Update, only logs related to data inserts and updates are obtained.
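To illustrate the effect of Operation Types, the sketch below filters change events by operation type. The event dictionaries and the `filter_by_operation` function are hypothetical; they model the behavior described in Table 1, not FDI's internal log format.

```python
from collections.abc import Iterable

# Operation types offered by the Operation Types parameter.
VALID_OPS = {"insert", "update", "delete"}


def filter_by_operation(events: Iterable[dict], selected: set[str]) -> list[dict]:
    """Keep only change events whose operation type was selected."""
    unknown = selected - VALID_OPS
    if unknown:
        raise ValueError(f"unsupported operation types: {sorted(unknown)}")
    return [event for event in events if event["op"] in selected]


# Hypothetical change events captured from the source database log.
events = [
    {"op": "insert", "table": "ORDERS", "row": {"ID": 1}},
    {"op": "update", "table": "ORDERS", "row": {"ID": 1}},
    {"op": "delete", "table": "ORDERS", "row": {"ID": 1}},
]

# Selecting Insert and Update means the delete event is not obtained.
kept = filter_by_operation(events, {"insert", "update"})
print([event["op"] for event in kept])  # ['insert', 'update']
```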

Configure a mapping between data sources at the source and destination.

**Table 2** Source and destination configuration information
Parameter		Description
Source	Source Instance	Select the ROMA Connect instance that is being used.
	Integration Application	Select the integration application to which the data source at the source belongs.
	Data Source Type	Select a data source type at the source. Only MySQL, TaurusDB, Oracle, and SQL Server are supported.
	Data Source	Select a data source at the source. The data source must have been created in advance.
	Server ID	This parameter is mandatory only if MySQL is selected as the data source type. The value must be an integer greater than 1 and must differ both from the server-id value set in Configuring MySQL CDC and from the Server ID values of other composite tasks.
Destination	Destination Instance	Select the ROMA Connect instance that is being used. After the source instance is configured, the destination instance is automatically associated and does not need to be configured.
	Integration Application	Select the integration application to which the data source at the destination belongs.
	Data Source Type	Select a data source type at the destination. Only HANA, Kafka, MySQL, TaurusDB, Oracle, PostgreSQL, and SQL Server are supported.
	Data Source	Select a data source at the destination. The data source must have been created in advance.
	Topic Name	This parameter is mandatory only if Kafka is selected as the data source type. Select the topic of the destination Kafka to which data is to be integrated. The message data is stored in this topic.
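The Server ID rules in Table 2 can be validated before the task is saved. In this hypothetical helper, `ids_in_use` is a set you collect yourself, for example from `SELECT @@server_id;` on the source MySQL database and from the Server IDs of your other composite tasks. The upper bound is MySQL's own limit for server IDs (4294967295) and is an addition not stated in this document.

```python
MAX_MYSQL_SERVER_ID = 4294967295  # MySQL's upper limit for server_id


def validate_server_id(value: str, ids_in_use: set[int]) -> int:
    """Validate the Server ID of a composite task with a MySQL source.

    Documented rules: an integer greater than 1 that differs from the
    server-id set in Configuring MySQL CDC and from other composite tasks.
    """
    try:
        server_id = int(value)
    except ValueError:
        raise ValueError(f"Server ID must be an integer, got {value!r}") from None
    if server_id <= 1:
        raise ValueError("Server ID must be greater than 1")
    if server_id > MAX_MYSQL_SERVER_ID:
        raise ValueError("Server ID exceeds the MySQL server_id range")
    if server_id in ids_in_use:
        raise ValueError(f"Server ID {server_id} is already in use")
    return server_id


# Hypothetical IDs: 1 is the MySQL server-id, 5400 belongs to another task.
print(validate_server_id("5401", {1, 5400}))  # 5401
```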

Configure data table mappings between the source and destination in manual or automatic mode.
- The length of a data field at the destination must be greater than or equal to that of the data field at the source. Otherwise, the synchronized data will be lost.
- A maximum of 1000 data tables can be synchronized in a task.
- If the destination data source type is Kafka, the destination table displayed is a virtual table. You only need to edit the field mappings in the table.
- Automatic mapping
  1. Click Automatic mapping. The mappings between data tables are automatically generated.
  2. Click Edit to modify a mapping between data tables as required.
  3. Click Map. In the dialog box displayed, you can modify the mappings between fields in the data tables as required or add new mappings.
    The length of a data field at the destination must be greater than or equal to that of the data field at the source. Otherwise, the synchronized data will be lost.
- Manually adding a mapping
  1. Click Add to manually add a mapping between data tables.
    Figure 1 Manually adding a mapping
  2. Select data table names for Destination Table Name and Source Table Name.
    If there are a large number of data tables in the database, you can add filter criteria to filter required data tables at the source and destination.
    
    Click the filter criteria text box, select Destination data table or Source data table, enter the data table name, and click the search icon.
    - For the Oracle database, enter a filter criterion in the format of Schema name.Data table name. For other relational databases, enter a filter criterion in the format of Database name.Data table name.
    - % indicates any character string. For example, roma% indicates all data tables whose names start with roma.
    - The entered filter criteria are case sensitive.
    - You can add a filter criterion for both the destination and source tables.
  3. Click Map. In the field mapping dialog box that is displayed, click Edit to modify existing fields and field mappings as required.
    You can also click Add Mapping to add the fields to be synchronized and the field mappings. The mapping configuration items are described as follows:
    - Destination Field: Select the field name in the destination table, for example, ID.
    - Source Field/Constant: Select a field name in the source table or a constant, for example, CODE.
    - Prefix: Enter a prefix to be added to the synchronized field value.
    - Suffix: Enter a suffix to be added to the synchronized field value.
    For example, if the field value is test, the prefix is tab1, and the suffix is 1, the synchronized value is tab1test1.
    
    Figure 2 Configuring field mappings
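The filter syntax and field mapping options in this step can be sketched as follows. `filter_to_regex`, `matching_tables`, and `map_field` are hypothetical names. The length check on the final value (including prefix and suffix) is an assumption layered on the documented rule that a destination field must be at least as long as the source field; FDI's actual matching and mapping logic may differ.

```python
import re
from typing import Optional


def filter_to_regex(criterion: str) -> re.Pattern:
    """Translate a filter criterion such as "HR.ROMA%" into a regex.

    Only % is a wildcard (any character string), as described above; every
    other character, including "." and "_", is matched literally. Matching is
    case-sensitive.
    """
    parts = [re.escape(part) for part in criterion.split("%")]
    return re.compile(".*".join(parts))


def matching_tables(criterion: str, qualified_names: list[str]) -> list[str]:
    """Return names in "Schema.Table" or "Database.Table" form that match."""
    pattern = filter_to_regex(criterion)
    return [name for name in qualified_names if pattern.fullmatch(name)]


def map_field(row: dict, source_field: Optional[str] = None, constant: Optional[str] = None,
              prefix: str = "", suffix: str = "", dest_length: Optional[int] = None) -> str:
    """Build a destination value from a source field or a constant plus prefix/suffix."""
    if (source_field is None) == (constant is None):
        raise ValueError("specify exactly one of source_field or constant")
    base = str(row[source_field]) if source_field is not None else constant
    value = f"{prefix}{base}{suffix}"
    # A destination field shorter than the final value would lose data.
    if dest_length is not None and len(value) > dest_length:
        raise ValueError(f"value {value!r} exceeds destination length {dest_length}")
    return value


print(matching_tables("HR.ROMA%", ["HR.ROMA_USERS", "HR.ORDERS", "HR.roma_tmp"]))  # ['HR.ROMA_USERS']
print(map_field({"CODE": "test"}, source_field="CODE", prefix="tab1", suffix="1"))  # tab1test1
```

The second call reproduces the documented tab1test1 example. Because a prefix and suffix lengthen the value, checking the final value rather than only the declared source length is a conservative design choice of this sketch.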

Configure abnormal data storage.

This step is available only when the destination data source type is MySQL, TaurusDB, Oracle, PostgreSQL, or SQL Server. Before configuring abnormal data storage, connect an OBS data source. For details, see Connecting to an OBS Data Source.

During each task execution, if some source data meets the integration conditions but cannot be integrated into the destination because of network jitter or other exceptions, ROMA Connect stores that data in the OBS bucket as text files.

**Table 3** Abnormal data storage information
Parameter	Description
Source Data Type	This parameter can only be set to OBS.
Integration Application	Select the required integration application.
Data Source Name	Select the OBS data source that you configured.
Path	Enter the object name in the OBS data source under which abnormal data is to be stored. The value cannot end with a slash (/).
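A minimal, hypothetical check for the Path rule in Table 3 is shown below. Rejecting an empty value is an extra assumption of this sketch; the document states only that Path cannot end with a slash.

```python
def validate_obs_path(path: str) -> str:
    """Validate the Path used for abnormal data storage (hypothetical helper)."""
    # Assumption: an empty Path is treated as invalid.
    if not path:
        raise ValueError("Path must not be empty")
    # Documented rule: Path cannot end with a slash (/).
    if path.endswith("/"):
        suggestion = path.rstrip("/")
        raise ValueError(f"Path must not end with '/'; try {suggestion!r}")
    return path


print(validate_obs_path("fdi/abnormal-data"))  # fdi/abnormal-data
```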

Click Save.
If any of the following cases occurs after a composite task is started, choose More > Reset Task in the Operation column of the task list and select a reset date and time as required. After the reset, the task synchronizes data again and then checks and synchronizes incremental data in real time.
- Data tables or data fields newly added at the source need to be synchronized by the composite task.
- The CDC archive logs at the source have been cleared, causing the composite task to fail.
- The MySQL database does not use GTID mode and an active/standby switchover has occurred, causing the composite task to fail.
You can reset the task only when Task Status is Stopped.