Configuring Source Information

Overview

This topic describes how to configure source information for a data integration task. The source information, which includes the data source type, data format, and data range, determines how ROMA Connect integrates data. The configuration varies depending on the data source type.

Integration Task Type	Supported Data Source Types
Scheduled	API, DB2, DWS, FTP, FI HDFS, FI Hive, GaussDB 100, GaussDB 200, HANA, Hive, LDAP, MySQL, MongoDB, MRS Hive, MRS HDFS, MRS HBase, OBS, Oracle, PostgreSQL, SAP, SNMP, SQL Server, TaurusDB, Custom
Real-time	ActiveMQ, ArtemisMQ, DIS, FI Kafka, HL7, IBM MQ, Kafka, MRS Kafka, RabbitMQ, WebSocket, Custom

API

If Integration Mode is set to Scheduled, you can select API as the data source type at the source.

On the Create Task page, configure source information.

**Table 1** API information at the source
Parameter	Description
Instance	Select the ROMA Connect instance that is being used.
Integration Application	Select the integration application to which the API data source belongs. Ensure that the integration application has been configured in Connecting to Data Sources.
Data Source Type	Select API.
Data Source Name	Select the API data source that you configured in Connecting to Data Sources.
Paging	This parameter specifies whether data is returned on multiple pages when ROMA Connect sends a request to the API data source to obtain data. Multiple data records can be returned for one API request. If Paging is enabled, all data that meets the conditions is displayed on multiple pages based on the fixed number of records on each page. Each time an integration task is executed, ROMA Connect sends multiple API requests to obtain all data. That is, each API request is sent to obtain data on one page. If Paging is disabled, ROMA Connect obtains all data that meets the conditions through one API request.
Page Number Field	This parameter is mandatory only if Paging is enabled. Enter a page number field defined in the API data source, for example, pageNo. This parameter is carried when ROMA Connect sends an API request to the source to specify the number of the page from which data is to be obtained. Value indicates whether the page number starts from 0 or 1. Set Value based on the original definition of the API. The page number field must be configured in Params or Body of Request Parameters.
Page Size Field	This parameter is mandatory only if Paging is enabled. Enter a page size field defined in the API data source, for example, pageSize. This parameter is carried when ROMA Connect sends an API request to the source to specify the maximum number of records on each page. Set the number of records on each page based on the original definition of the API.
Maximum Number of Pages	This parameter is mandatory only if Paging is enabled. This parameter specifies the maximum number of pages that can be queried in each scheduled task, for example, 10. If the number of pages exceeds the specified value, the task is stopped. The value 0 indicates that no restriction applies.
Pagination End	This parameter is mandatory only if Paging is enabled. Select the method to stop obtaining source data in pagination mode. Empty page list: If no data record is returned, ROMA Connect stops obtaining source data. Number of records: ROMA Connect compares the product of the number of requested pages and the page size with the total number of records to determine whether to stop obtaining source data.
Pagination End Field Path	This parameter is mandatory only if Paging is enabled. Enter the path of the field in an API response, which is used to determine the end of pagination. In the API response, elements in different layers are separated by periods (.). For example, if element c in the {"a":{"b":{"c":"xxx"}}} response is the pagination end field, the pagination end field path is set to a.b.c. If Pagination End is set to Empty page list, set this parameter to the root path of the list field. If Pagination End is set to Number of records, set this parameter to the path of the total number of records.
Incremental Migration	This parameter specifies whether only data generated in a specific period is integrated. For the first scheduling, the data between the initial timestamp and the current scheduling time is collected. For subsequent scheduling, the data between the last successful collection time and the current time is collected.
Start Time Field	This parameter is mandatory only if Incremental Migration is enabled. Enter the start time field originally defined in the API data source, for example, startTime. This parameter is carried when ROMA Connect sends an API request to the source, indicating that data after the specified value will be obtained. The start time field and end time field must both be configured in Params or Body of Request Parameters.
End Time Field	This parameter is mandatory only if Incremental Migration is enabled. Enter the end time field originally defined in the API data source, for example, endTime. This parameter is carried when ROMA Connect sends an API request to the source, indicating that data before the specified value will be obtained.
Time Zone	This parameter is mandatory only if Incremental Migration is enabled. Select the time zone used by the API data source so that ROMA Connect can identify the data timestamp.
Timestamp Initial Value	This parameter is mandatory only if Incremental Migration is enabled. This parameter specifies the time from which data is migrated for the first time. That is, only the data generated after this time point is migrated. Assume that Start Time Field is startTime, End Time Field is endTime, Timestamp Initial Value is 2020-11-01 12:00:00, Compensation Period is 0, and Period Settings is Default. If the first scheduling time of the task is 2020-11-01 13:00:00, the first collection obtains data for which startTime is greater than or equal to 2020-11-01 12:00:00 and endTime is less than or equal to 2020-11-01 13:00:00. Each subsequent collection obtains data for which startTime is greater than or equal to the time of the last successful task execution and endTime is less than or equal to the execution time of the current task.
Compensation Period (ms)	This parameter is mandatory only if Incremental Migration is enabled. This parameter specifies the period of time (in milliseconds) which will be used to compensate for any delay in data generation at the source when ROMA Connect queries incremental data. The end time for obtaining data is the current system time minus the value you specify here. For example, if the end time of the previous incremental migration task is 15:05, the current scheduled task is triggered at 17:00, and Compensation Period (ms) is set to 100, the time range of data to be integrated in the current incremental migration task is 15:05 to (17:00 - 100 ms).
Time Format	This parameter is mandatory only if Incremental Migration is enabled. Select a timestamp format, for example, yyyy-MM-dd.
Period Settings	This parameter is mandatory only if Incremental Migration is enabled. This parameter specifies the mode used for setting the time range for subsequent data integration after an incremental migration task is executed for the first time. Default: Data generated between the previous scheduling and current scheduling is integrated. When ROMA Connect obtains data from the source, it uses the triggering time of the two tasks as the start time and end time, respectively. Custom: The start time and end time are determined based on the configured period rules. This mode applies to common periodic tasks, for example, tasks executed once a day, a week, or a month.
Start Time Offset (Days)	This parameter is mandatory only if Period Settings is set to Default. Set the number of days before the start time of data collection. If data generated at the source changes in real time, such as alarm data, you can collect the data by setting this parameter. Start time of data collection = Data source system time – Start time offset
Time Interval	This parameter is mandatory only if Period Settings is set to Custom. Select the time granularity. The value must be the same as the unit configured in the task schedule so that the new data can be overwritten. For example, if Unit is set to Day in a task schedule, set this parameter to Day, indicating that data is obtained once a day.
Period	This parameter is mandatory only if Period Settings is set to Custom. Select the time period for obtaining source data. For example, if the task is executed once a day, Time Interval is set to Day, and Period is set to Previous period, data of the previous day is incrementally integrated once. If Period is set to Current period, data of the current day is incrementally integrated once.
Right Periodic Boundary	This parameter is mandatory only if Period Settings is set to Custom. This parameter specifies whether the end time is included in the time range for obtaining source data. Closed interval: The end time is included. Open interval: The end time is not included.
Request Parameters	Construct the parameter definition of the API request based on the definition of the API data source. For example, the page number and page size fields must be carried in Params or Body.
Parse	If Paging is enabled, Parse is set to Yes by default and cannot be changed. This parameter specifies whether ROMA Connect parses the obtained source data. If you select Yes, ROMA Connect parses the obtained source data based on the configured parsing rules and then integrates the data to the destination. If you select No, ROMA Connect transparently transmits the obtained source data and integrates the data to the destination.
Response Type	This parameter is mandatory only if Parse is set to Yes. Select the format that will be used for the response of an API request. The value can be JSON or XML. Ensure that the format is the same as the actual response format of the API.
Data Root Field	This parameter is mandatory only if Parse is set to Yes. This parameter specifies the path of the upper-layer common fields among all metadata in the data obtained from the source in JSON or XML format. Data Root Field and Parsing Path in Metadata form a complete metadata path. For details, see Description on Metadata Parsing Path Configuration.
Metadata	This parameter is mandatory only if Parse is set to Yes. This parameter specifies each underlying key-value data element that is obtained from the source in JSON or XML format and needs to be integrated to the destination. Alias: user-defined metadata name. Type: data type of metadata. The value must be the same as the data type of the corresponding parameter in the response. Parsing Path: path of the metadata, which does not contain the data root field. For details, see Description on Metadata Parsing Path Configuration.
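
The time range described for Incremental Migration, Timestamp Initial Value, and Compensation Period (ms) can be illustrated with a minimal sketch. It assumes Period Settings is Default and is not ROMA Connect code. The function name incremental_window and its parameters are hypothetical and only mirror the rules in Table 1.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def incremental_window(
    initial_timestamp: datetime,
    previous_end: Optional[datetime],
    now: datetime,
    compensation_ms: int = 0,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) time range for one incremental run.

    Hypothetical helper: the first run starts at the initial timestamp,
    later runs start where the previous successful run ended. The end is
    the current time minus the compensation period.
    """
    start = initial_timestamp if previous_end is None else previous_end
    end = now - timedelta(milliseconds=compensation_ms)
    return start, end


# First run from the Timestamp Initial Value example (compensation 0).
first = incremental_window(
    datetime(2020, 11, 1, 12, 0, 0), None, datetime(2020, 11, 1, 13, 0, 0)
)
print(first[0], "->", first[1])

# Later run from the Compensation Period example: 15:05 to (17:00 - 100 ms).
later = incremental_window(
    datetime(2020, 11, 1, 12, 0, 0),
    datetime(2020, 11, 1, 15, 5),
    datetime(2020, 11, 1, 17, 0),
    compensation_ms=100,
)
print(later[0], "->", later[1])
```

The two values of each range correspond to what would be sent in the start time field and end time field of the request, formatted according to Time Format and Time Zone.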

Description on Metadata Parsing Path Configuration

Data in JSON or XML format does not contain arrays:
For example, in the following JSON data (similar to XML data), the complete path of element a is a, the complete path of element b is a.b, the complete path of element c is a.b.c, and the complete path of element d is a.b.d. Elements c and d are underlying data elements, that is, the data to be integrated to the destination.
```
{
   "a": {
      "b": {
         "c": "xx",
         "d": "xx"
      }
   }
}
```
In this scenario, three configuration solutions are available for Data Root Field and Parsing Path:
- Data Root Field is not specified.
  Parsing Path of metadata c must be set to a.b.c, and Parsing Path of element d must be set to a.b.d.
- Data Root Field is set to a.
  Parsing Path starts from the underlying path of element a. Parsing Path of metadata c must be set to b.c, and Parsing Path of element d must be set to b.d.
- Data Root Field is set to a.b.
  Parsing Path starts from the underlying path of element b. Parsing Path of metadata c must be set to c, and Parsing Path of element d must be set to d.
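
The three solutions resolve to the same elements. The following minimal sketch shows why, by joining Data Root Field and Parsing Path into one lookup. The helper names resolve and read_metadata are hypothetical and do not come from ROMA Connect, and the value of d is changed to "yy" only so that the two elements can be told apart.

```python
from typing import Any


def resolve(data: Any, path: str) -> Any:
    """Walk a period-separated path such as "a.b.c" through nested dicts.

    An empty path returns the data unchanged.
    """
    current = data
    for key in filter(None, path.split(".")):
        current = current[key]
    return current


def read_metadata(response: dict, data_root: str, parsing_path: str) -> Any:
    """Resolve the data root first, then the parsing path relative to it."""
    return resolve(resolve(response, data_root), parsing_path)


response = {"a": {"b": {"c": "xx", "d": "yy"}}}

# Data Root Field not specified, set to a, and set to a.b, respectively.
for root, path in [("", "a.b.c"), ("a", "b.c"), ("a.b", "c")]:
    print(repr(root), path, "->", read_metadata(response, root, path))
```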
Data in JSON or XML format contains arrays:
For example, in the following JSON data (similar to XML data), the complete path of element a is a, the complete path of element b is a.b, the complete path of element c is a.b[i].c, and the complete path of element d is a.b[i].d. Elements c and d are underlying data elements, that is, the data to be integrated to the destination.
```
{
   "a": {
      "b": [{
         "c": "xx",
         "d": "xx"
      },
      {
         "c": "yy",
         "d": "yy"
      }
      ]
   }
}
```
In this scenario, three configuration solutions are available for Data Root Field and Parsing Path:
- Data Root Field is not specified.
  Parsing Path of metadata c must be set to a.b[i].c, and Parsing Path of element d must be set to a.b[i].d.
- Data Root Field is set to a.
  Parsing Path starts from the underlying path of element a. Parsing Path of metadata c must be set to b[i].c, and Parsing Path of element d must be set to b[i].d.
- Data Root Field is set to a.b.
  Parsing Path starts from the underlying path of element b. Parsing Path of metadata c must be set to [i].c, and Parsing Path of element d must be set to [i].d.
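
The array case can be sketched in the same way. This example assumes that [i] means "every element of the array", which is an interpretation of the notation above and not a documented ROMA Connect behavior. The helper names resolve_all and read_metadata are hypothetical.

```python
import re
from typing import Any, List

# A path token is an optional key followed by an optional "[i]" marker.
_TOKEN = re.compile(r"^(?P<key>[^\[\]]*)(?P<each>\[i\])?$")


def resolve_all(data: Any, path: str) -> List[Any]:
    """Return every value matched by a path such as "a.b[i].c"."""
    nodes = [data]
    for token in filter(None, path.split(".")):
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unsupported path token: {token!r}")
        if match.group("key"):
            # Descend into the named key of every current node.
            nodes = [node[match.group("key")] for node in nodes]
        if match.group("each"):
            # "[i]": expand each array into its individual elements.
            nodes = [item for node in nodes for item in node]
    return nodes


def read_metadata(response: dict, data_root: str, parsing_path: str) -> List[Any]:
    """Resolve the data root first, then the parsing path relative to it."""
    values: List[Any] = []
    for root in resolve_all(response, data_root):
        values.extend(resolve_all(root, parsing_path))
    return values


response = {"a": {"b": [{"c": "xx", "d": "xx"}, {"c": "yy", "d": "yy"}]}}

# Data Root Field not specified, set to a, and set to a.b, respectively.
for root, path in [("", "a.b[i].c"), ("a", "b[i].c"), ("a.b", "[i].c")]:
    print(repr(root), path, "->", read_metadata(response, root, path))
```

Under this assumption, each of the three configurations yields one value of c per array element.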

The preceding data that contains arrays is used as an example. The following describes the configuration when the source is API:

In the example of pagination configuration, pageNo and pageSize are the pagination parameters of the API and need to be added to Request Parameters.
Figure 1 API pagination configuration example
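
The paging behavior with Pagination End set to Empty page list can be approximated by the following minimal sketch. It is an illustration under stated assumptions, not the ROMA Connect implementation. The function fetch_all_pages and the caller-supplied fetch_page callable are hypothetical. pageNo and pageSize are the example field names used above. A Maximum Number of Pages of 0 is treated as "no restriction", and reaching the limit simply ends the loop here.

```python
from typing import Any, Callable, Dict, List


def fetch_all_pages(
    fetch_page: Callable[[Dict[str, int]], Dict[str, Any]],
    list_path: str,
    page_size: int,
    first_page: int = 1,
    max_pages: int = 0,
    page_no_field: str = "pageNo",
    page_size_field: str = "pageSize",
) -> List[Any]:
    """Request pages one by one until the list at list_path comes back empty.

    fetch_page stands in for one API request: it receives the paging
    parameters and returns the parsed response body.
    """
    records: List[Any] = []
    page = first_page
    requested = 0
    while max_pages == 0 or requested < max_pages:
        response = fetch_page({page_no_field: page, page_size_field: page_size})
        requested += 1
        # Follow the Pagination End Field Path (for example "a.b") to the list.
        items: Any = response
        for key in list_path.split("."):
            items = items[key]
        if not items:
            break  # Empty page list: stop obtaining source data.
        records.extend(items)
        page += 1
    return records
```

For example, with a stand-in fetch_page that serves five records two at a time under the path a.b, the loop sends four requests: three that return data and a fourth that returns an empty list and ends pagination.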
In the example of incremental migration configuration, startTime and endTime are the time parameters of the API and need to be added to the Request Parameters.
Figure 2 API incremental migration configuration example
In the example of metadata configuration, Data Root Field is set to a.
Figure 3 API metadata configuration example

After configuring the source information, proceed with Configuring Destination Information.
