Updated on 2023-01-11 GMT+08:00

Table Input

Overview

Table Input operator converts specified columns in a relational database table into input fields of the same quantity.

Input and Output

  • Input: table columns
  • Output: fields

Parameter Description

Table 1 Operator parameters description

Parameter

Description

Type

Mandatory

Default Value

Input fields

Information about relational database input fields:

  • position: position of input fields
  • field name: input field name
  • type: field type
  • length: Field value length. If the actual field value is excessively long, the value is cut based on the configured length. When type is set to CHAR, spaces are added to the field value for supplement if the actual field value length is less than the configured length. When type is set to VARCHAR, no space is added to the field value for supplement if the actual field value length is less than the configured length.

map

Yes

None

Data Processing Rule

  • Fields are generated in a specified order. Table columns to be converted are specified by From in step 2 of job configuration. If Table column names is set, the value is the table columns to be converted; if Table column names is not set, the table columns to be converted are all table columns in the table by default or the columns specified by the query conditions set by Table SQL statement.
  • The number of input fields cannot be greater than number of specified columns; otherwise, all data becomes dirty data.
  • If the field value does not match the actual type, the data in the line will become dirty data.

Example

Use SQL Server 2014 as an example. Run the following command to create a test table:

create table test (id int, name text, value text);

Insert three data lines to the test table:

insert into test values (1,'zhangshan','zhang');

insert into test values (2,'lisi','li');

insert into test values (3,'wangwu','wang');

Query the table:

Configure the Table Input operator to generate the following fields:

After the data connector is set, click Automatic Identification. The system will automatically read fields in the database and select required fields for adding. You only need to optimize or modify the fields manually based on service scenarios.

  • This operation will overwrite existing data in the table.
  • After you click autoRecognition, manually check the field types automatically identified by the system to ensure that they are consistent with the actual ones in the table.

    For example, the system automatically identifies the date type in the Oracle database as the timestamp type. If you do not manually change the type, an error will be reported when data is queried in the Hive table.

Configure the output operator to output data to HDFS or OBS. The result is as follows: