Updated on 2024-10-23 GMT+08:00

From Hive

Data can be exported from Hive either directly from HDFS or through the JDBC API, depending on the selected read mode.

If the data source is Hive, CDM will automatically partition data using the Hive data partitioning file.

Table 1 Parameter description

The parameters are grouped into basic parameters and advanced attributes. Each parameter is listed with its description and an example value.

Basic parameters

readMode

Two read modes are available: HDFS and JDBC. By default, the HDFS mode is used. If you do not need to use the WHERE condition to filter data or add new fields on the field mapping page, select the HDFS mode.

  • The HDFS mode shows good performance, but in this mode, you cannot use the WHERE condition to filter data or add new fields on the field mapping page.
  • The JDBC mode allows you to use the WHERE condition to filter data or add new fields on the field mapping page.

Example value: HDFS

Database

Database name. Click the icon next to the text box to open the dialog box for selecting a database.

Example value: default

Table Name

Hive table name. Click the icon next to the text box to open the dialog box for selecting a table.

This parameter can be set to a macro variable of date and time, and the name can contain multiple macro variables. When a macro variable of date and time is used together with a scheduled job, incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.

NOTE:

If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job + Offset) rather than (Actual start time of the CDM job + Offset).

Example value: TBL_E
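
For example, with a hypothetical set of daily tables whose names carry a date suffix, the table name could be set to:

TBL_E_${dateformat(yyyyMMdd, -1, DAY)}

If the planned start time of the scheduling job is 2024-10-23, the macro resolves to the previous day and the name expands to TBL_E_20241022. As described in the note above, the planned start time of the data development job, not the actual start time of the CDM job, is used for the substitution.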

Use SQL Statement

This parameter is displayed when readMode is set to JDBC.

Whether to use SQL statements to export data.

Example value: No

SQL Statement

This parameter is displayed when Use SQL Statement is set to Yes. CDM exports data based on the SQL statement you enter.

NOTE:
  • SQL statements can only be used to query data. Join and nesting are supported, but multiple query statements are not allowed, for example, select * from table a; select * from table b.
  • WITH statements are not supported.
  • Comments, such as -- and /*, are not supported.
  • Addition, deletion, and modification operations are not supported, including but not limited to the following:
    • load data
    • delete from
    • alter table
    • create table
    • drop table
    • into outfile

Example value: select id,name from sqoop.user;
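
As a further illustration, a single read-only statement that uses a join and nesting complies with the preceding restrictions (sqoop.orders and its user_id and amount columns are hypothetical and used only for illustration):

select u.id, u.name, o.total_amount from sqoop.user u join (select user_id, sum(amount) as total_amount from sqoop.orders group by user_id) o on u.id = o.user_id;

By contrast, a statement such as with t as (select id, name from sqoop.user) select * from t would be rejected because WITH statements are not supported.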

Transmission Mode

The value can be Record migration (default) or File migration. File migration is supported only when the source is Hive 2.x with data stored in HDFS and the destination is Hive 3.x with data stored in OBS.

If you select File migration, ensure that the table format and attributes of the source and destination are the same.

Example values:
  • Record migration
  • File migration

Partition Values

This parameter is displayed when readMode is set to HDFS.

This parameter specifies the partition values to extract. The attribute name is the partition name. You can configure multiple values (separated by spaces) or a value range for the field. The time macro function is supported. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.

NOTE:

If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job + Offset) rather than (Actual start time of the CDM job + Offset).

Example values:

  • Attribute value in the single-value or multi-value filtering scenario:

    ${dateformat(yyyyMMdd, -1, DAY)} ${dateformat(yyyyMMdd)}

  • Attribute value in the range filtering scenario:

    ${value} >= ${dateformat(yyyyMMdd, -7, DAY)} && ${value} < ${dateformat(yyyyMMdd)}
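
For example, if the planned start time of the scheduling job is 2024-10-23 and the macros resolve against that time (see the note above), ${dateformat(yyyyMMdd, -7, DAY)} expands to 20241016 and ${dateformat(yyyyMMdd)} expands to 20241023, so the range filter extracts the partitions whose values are greater than or equal to 20241016 and less than 20241023.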

Advanced attributes

Where Clause

This parameter is displayed when readMode is set to JDBC and Use SQL Statement is set to No.

This parameter specifies the WHERE clause used to extract data. If it is not set, the entire table is extracted. If the table to be migrated does not contain the fields specified in the WHERE clause, the migration will fail.

You can set a date macro variable to extract data generated on a specific date. For details, see Incremental Migration of Relational Databases.

NOTE:

If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job + Offset) rather than (Actual start time of the CDM job + Offset).

Example value: age > 18 and age <= 60
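
As a further illustration, assuming the table has a date column named ds that stores dates as yyyyMMdd strings (a hypothetical column used only for illustration), the following clause extracts only the data generated on the previous day:

ds = '${dateformat(yyyyMMdd, -1, DAY)}'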