Updated on 2022-08-17 GMT+08:00

From Hive

If the source link of a job is a Hive link, configure the source job parameters as described in Table 1.

Table 1 Parameter description

Parameter

Description

Example Value

Database Name

Database name. Click the icon next to the text box. The dialog box for selecting the database is displayed.

default

Table Name

Hive table name. Click the icon next to the text box. The dialog box for selecting the table is displayed.

The table name can contain date and time macro variables, and a single name can contain more than one macro variable. When date and time macro variables are combined with a scheduled job, incremental data can be synchronized periodically.

TBL_E

Read Mode

Two read modes are available: HDFS and JDBC. By default, the HDFS mode is used. If you do not need to use the WHERE condition to filter data or add new fields on the field mapping page, select the HDFS mode.

  • The HDFS mode shows good performance, but in this mode, you cannot use the WHERE condition to filter data or add new fields on the field mapping page.
  • The JDBC mode allows you to use the WHERE condition to filter data or add new fields on the field mapping page.

HDFS

Partition Filter Criteria

This parameter is displayed when you select the HDFS read mode and click Show Advanced Attributes.

You can configure multiple values (separated by spaces) or a field value range. The time macro function is supported.

  • Single/Multi-value filtering:

    "${dateformat(yyyyMMdd, -1, DAY)} ${dateformat(yyyyMMdd)}"

  • Filter by range:

    "${value} >= ${dateformat(yyyyMMdd, -7, DAY)} && ${value} < ${dateformat(yyyyMMdd)}"
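The two filter forms above can be illustrated with a small Python sketch. It is not CDM's implementation: the helper names `expand_dateformat` and `in_range` are hypothetical, the macro is assumed to shift the current date by the given number of days and format it, and only the `yyyy`, `MM`, and `dd` pattern tokens are handled.

```python
from datetime import datetime, timedelta

# Hypothetical helper: mimics what ${dateformat(pattern, offset, DAY)}
# is assumed to expand to. Only yyyy, MM and dd are supported here.
def expand_dateformat(pattern, offset=0, now=None):
    now = now or datetime.now()
    shifted = now + timedelta(days=offset)
    fmt = pattern.replace("yyyy", "%Y").replace("MM", "%m").replace("dd", "%d")
    return shifted.strftime(fmt)

# Hypothetical helper: the range filter from the example,
# ${value} >= (today - 7 days) && ${value} < today.
# yyyyMMdd strings sort chronologically, so string comparison is enough.
def in_range(value, now=None):
    lower = expand_dateformat("yyyyMMdd", -7, now)
    upper = expand_dateformat("yyyyMMdd", 0, now)
    return lower <= value < upper

# A fixed "current date" keeps the sketch reproducible.
now = datetime(2022, 8, 17)

# Multi-value filter: yesterday and today, as two space-separated values.
multi_value = [expand_dateformat("yyyyMMdd", -1, now),
               expand_dateformat("yyyyMMdd", 0, now)]

# Range filter applied to some made-up partition values.
partitions = ["20220809", "20220810", "20220816", "20220817"]
selected = [p for p in partitions if in_range(p, now)]

print(multi_value)
print(selected)
```

Under these assumptions, the range form keeps the seven days before the run date and excludes the run date itself, which suits a daily scheduled job that loads completed days only.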

WHERE Clause

This parameter is displayed when you select the JDBC read mode and click Show Advanced Attributes.

This parameter specifies the WHERE clause used to filter the data to be extracted. If this parameter is not set, the entire table is extracted. If the table to be migrated does not contain the fields referenced in the WHERE clause, the migration will fail.

You can set a date macro variable to extract data generated on a specific date.

age > 18 and age <= 60
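As a rough illustration of a WHERE clause that uses a date macro, the Python sketch below expands a macro inside a clause string. Several things here are assumptions rather than documented behavior: that the WHERE clause accepts the same `${dateformat(...)}` syntax shown for the partition filter, that the `yyyy-MM-dd` pattern is accepted, and the column name `create_date`, which is made up. The `expand` function is a hypothetical stand-in for the substitution CDM is assumed to perform.

```python
import re
from datetime import datetime, timedelta

# Matches ${dateformat(pattern)} and ${dateformat(pattern, N, DAY)}.
MACRO = re.compile(r"\$\{dateformat\(([^,)]+)(?:,\s*(-?\d+),\s*DAY)?\)\}")

# Hypothetical stand-in for the macro substitution; supports only
# the yyyy, MM and dd tokens and day offsets.
def expand(text, now):
    def repl(match):
        pattern = match.group(1).strip()
        offset = int(match.group(2) or 0)
        fmt = pattern.replace("yyyy", "%Y").replace("MM", "%m").replace("dd", "%d")
        return (now + timedelta(days=offset)).strftime(fmt)
    return MACRO.sub(repl, text)

# create_date is a made-up column name used only for illustration.
where = "create_date = '${dateformat(yyyy-MM-dd, -1, DAY)}'"
expanded = expand(where, datetime(2022, 8, 17))
print(expanded)
```

With a scheduled daily job, a clause of this shape would select the rows for the day before each run, provided the referenced column exists in the table.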

If the data source is Hive, CDM will automatically partition data using the Hive data partitioning file.