Updated on 2025-02-27 GMT+08:00

To Hive

If the destination link of a job is a Hive link, configure the destination job parameters based on Table 1.

Table 1 Parameter description

Parameter

Description

Example Value

Database Name

Database name. Click the icon next to the text box to open the dialog box for selecting a database.

default

Table Name

Destination table name. Click the icon next to the text box to open the dialog box for selecting a table.

This parameter can be set using macro variables of date and time, and a name can contain multiple macro variables. When the macro variables of date and time work with a scheduled job, incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.

NOTE:

If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job + Offset) rather than (Actual start time of the CDM job + Offset).

TBL_X
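For example, a destination table name can embed a date macro. The table prefix and the yyyyMMdd pattern below are illustrative assumptions; the macro syntax follows the dateformat example shown under Partitions info on this page:

```
TBL_${dateformat(yyyyMMdd, -1, DAY)}
```

When a scheduled job runs, such a macro resolves to the previous day's date, so each run writes to a date-specific table.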

Auto Table Creation

This parameter is displayed only when the source is a relational database. The options are as follows:
  • Non-auto creation: CDM will not automatically create a table.
  • Auto creation: If the destination database does not contain the table specified by Table Name, CDM will automatically create the table. If the table specified by Table Name already exists, no table is created and data is written to the existing table.
  • Deletion before creation: CDM deletes the table specified by Table Name, and then creates the table again.
NOTE:
  • Only column comments are synchronized during automatic table creation. Table comments are not synchronized.
  • Primary keys cannot be synchronized during automatic table creation.

Non-auto creation

Source side null value conversion value

The value to which null values from the source are converted. The options are as follows:

  • TO_NULL
  • TO_EMPTY_STRING
  • TO_NULL_STRING

TO_NULL

Clear Data Before Import

Whether the data in the destination table is cleared before data import. The options are as follows:
  • Yes: The data is cleared.
  • No: The data is not cleared. Instead, it will be added to the existing table.

Yes

Processing mode of newline characters

Policy for processing newline characters in data written to Hive TEXTFILE tables

  • Delete
  • Replace with another string
  • Ignore

Delete

Hive Table Partition Field

This parameter is unavailable when Auto Table Creation is set to Non-auto creation.

Partition fields for creating a Hive table. Use commas (,) to separate multiple fields.

A,B

Table Path

This parameter is unavailable when Auto Table Creation is set to Non-auto creation.

It specifies the table path.

-

Storage Format

This parameter is unavailable when Auto Table Creation is set to Non-auto creation.

It specifies the storage format.

  • Row-based storage format: TEXTFILE
  • Column-based storage formats: ORC, RCFILE, and PARQUET

TEXTFILE data is stored in plaintext. If data contains special characters, data may be written incorrectly. Exercise caution when using this format. The ORC format is recommended.

ORC

ClearDataMode

This parameter is available when Clear Data Before Import is set to Yes.

It specifies the mode for clearing data in the Hive table.

  • LOAD_OVERWRITE: CDM generates a temporary data file directory and loads it into the Hive table using Hive's LOAD ... OVERWRITE syntax.
  • TRUNCATE: Data files in partitions are deleted, but partitions are not deleted.
    NOTE:

    If the destination is a partitioned table, you are advised to select LOAD_OVERWRITE. Otherwise, the cluster memory or disks may be overloaded.

TRUNCATE
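As a rough sketch of what the two modes correspond to in Hive SQL (the table name, staging path, and partition values are illustrative; the actual statements CDM issues may differ):

```sql
-- LOAD_OVERWRITE: data is first written to a temporary directory, then
-- loaded into the table, replacing existing data in the target partition.
LOAD DATA INPATH '/tmp/cdm_staging/tbl_x' OVERWRITE INTO TABLE tbl_x
PARTITION (year=2020, location='sun');

-- TRUNCATE: data files in the partition are deleted,
-- but the partition definition itself is kept.
TRUNCATE TABLE tbl_x PARTITION (year=2020, location='sun');
```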

Partitions info

This parameter is available when Clear Data Before Import is set to Yes. If the destination is a partitioned table, you must specify partitions.

  • If you select the TRUNCATE mode, only the data files in the partitions are deleted.
  • If you select the LOAD_OVERWRITE mode, data is written to a specified partition and overwrites the existing data.

Single partition: year=2020,location=sun

Multiple partitions: ['year=2020,location=sun', 'year=2021,location=earth']

Partitions of the previous day:

day='${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)}'
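As a hedged illustration: per the note under Table Name, when the job is scheduled through DataArts Factory the offset is applied to the planned start time. If the planned start time is 2025-02-27 00:00:00, the previous-day partition above would resolve roughly as follows:

```
day='${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)}'  →  day='2025-02-26 00:00:00'
```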

Executing Analyze Statements

After all data is written, the ANALYZE TABLE statement is executed asynchronously to accelerate queries on Hive tables.

Run the following SQL statements:

  • Non-partitioned table: ANALYZE TABLE tablename COMPUTE STATISTICS
  • Partitioned table: ANALYZE TABLE tablename PARTITION(partcol1[=val1], partcol2[=val2], ...) COMPUTE STATISTICS
NOTE:

The Executing Analyze Statements parameter applies only to the migration of a single table.

Running the ANALYZE statements may exert pressure on Hive.

Yes
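For example, using the illustrative table and partition values that appear elsewhere on this page, the generated statements would look like:

```sql
-- Non-partitioned table: compute table-level statistics
ANALYZE TABLE TBL_X COMPUTE STATISTICS;

-- Partitioned table: compute statistics for a specific partition
ANALYZE TABLE TBL_X PARTITION(year=2020, location='sun') COMPUTE STATISTICS;
```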

Maximum memory size of the internal write queue

If memory is insufficient, adjust the value of this parameter as needed. If the value is too small, the migration speed will be reduced.

The value ranges from 1 MB to 128 MB. The default value is empty, indicating no limit. Values outside this range are also treated as no limit.

16

Maximum memory size of the internal conversion queue

If memory is insufficient, adjust the value of this parameter as needed. If the value is too small, the migration speed will be reduced.

The value ranges from 1 MB to 128 MB. The default value is empty, indicating no limit. Values outside this range are also treated as no limit.

16

  • If the source Hive data contains both the array and map types, the destination table format can only be the ORC or Parquet complex type. If the destination table format is RC or TEXT, the source data will be processed before it can be written successfully.
  • As the map type is an unordered data structure, the order of the data may change after a migration.
  • If Hive serves as the migration destination and the storage format is Textfile, delimiters must be explicitly specified in the statement for creating Hive tables. The following is an example:
    CREATE TABLE csv_tbl(
    smallint_value smallint,
    tinyint_value tinyint,
    int_value int,
    bigint_value bigint,
    float_value float,
    double_value double,
    decimal_value decimal(9, 7),
    timestamp_value timestamp,
    date_value date,
    varchar_value varchar(100),
    string_value string,
    char_value char(20),
    boolean_value boolean,
    binary_value binary,
    varchar_null varchar(100),
    string_null string,
    char_null char(20),
    int_null int
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
    "separatorChar" = "\t",
    "quoteChar"     = "'",
    "escapeChar"    = "\\"
    )
    STORED AS TEXTFILE;