Updated on 2022-09-15 GMT+08:00

Operator Data Processing Rules

In Loader data import and export tasks, each operator defines different processing rules for null values and empty strings in raw data. Dirty data cannot be imported or exported.

The following table describes the operator data processing rules for each conversion procedure.
Table 1 Data processing rules

Procedure

Description

CSV file input

  • If a delimiter appears twice consecutively in the original data, an empty string field is generated.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If a type conversion error occurs, the current data is saved as dirty data.
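
The first and third CSV rules can be illustrated with a minimal Python sketch. This is not Loader code; the function name `parse_csv_line` and the `converters` parameter are hypothetical, and the column-count check is left out for brevity.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def parse_csv_line(
    line: str,
    converters: Sequence[Callable[[str], object]],
    delimiter: str = ",",
) -> Tuple[Optional[List[object]], bool]:
    """Return (fields, is_dirty) for one raw line (illustrative only)."""
    # Two consecutive delimiters yield an empty string field:
    # "a,,3" is split into ["a", "", "3"].
    parts = line.split(delimiter)
    fields: List[object] = []
    for convert, raw in zip(converters, parts):
        try:
            fields.append(convert(raw))
        except ValueError:
            # A type conversion error marks the current line as dirty.
            return None, True
    return fields, False
```

For instance, `parse_csv_line("a,,3", [str, str, int])` keeps the empty middle field, while `parse_csv_line("a,b,x", [str, str, int])` is flagged as dirty because `"x"` cannot be converted to an integer.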

Fixed file input

  • If the original data includes null values, no conversion is performed.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If the configured field conversion type is different from the actual type of the original data, all data becomes dirty data, for example, when the string type is converted to the numeric type.
  • If the configured field split length is greater than the length of the original field value, the data split fails and the current line becomes dirty data.
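
The split-length rule can be sketched as follows. This is an assumption-based illustration, not Loader's implementation; `split_fixed` and `lengths` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def split_fixed(line: str, lengths: Sequence[int]) -> Tuple[Optional[List[str]], bool]:
    """Split a fixed-width line; return (fields, is_dirty)."""
    # If the configured lengths need more characters than the line has,
    # the split fails and the line is treated as dirty data.
    if sum(lengths) > len(line):
        return None, True
    fields: List[str] = []
    pos = 0
    for n in lengths:
        fields.append(line[pos:pos + n])
        pos += n
    return fields, False
```

With lengths `[5, 2]`, the line `"abcde12"` splits cleanly into two fields, whereas `"abc"` is too short and is reported as dirty.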

Table input

  • If the original data includes null values, no conversion is performed.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If the configured field conversion type is different from the actual type of the original data, all data becomes dirty data, for example, when the string type is converted to the numeric type.

HBase input

  • If the original data includes null values, no conversion is performed.
  • If the HBase table name is incorrect, all data becomes dirty data.
  • If the primary key column is not configured in Is rowkey, all data becomes dirty data.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If the configured field conversion type is different from the actual type of the original data, all data becomes dirty data, for example, when the string type is converted to the numeric type.

Long integer time conversion

  • If the original data includes null values, no conversion is performed.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If a type conversion error occurs, the current data is saved as dirty data.
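
A rough sketch of the null and conversion-error rules for this operator is shown below. It assumes the long integer is a Unix timestamp in seconds and that the output is formatted in UTC; both are assumptions for illustration, and `long_to_time` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def long_to_time(value, fmt: str = "%Y-%m-%d %H:%M:%S") -> Tuple[Optional[str], bool]:
    """Return (formatted_time, is_dirty). Assumes seconds since the epoch."""
    if value is None:
        # Null values are passed through without conversion.
        return None, False
    try:
        seconds = int(value)
        text = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)
    except (TypeError, ValueError, OverflowError, OSError):
        # A type conversion error marks the current data as dirty.
        return None, True
    return text, False
```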

Null value conversion

  • If the original data contains null values, the null values are converted to a specified value.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.

Random value conversion

Null values and empty strings are not processed, and no dirty data is generated.

Constant field addition

Null values and empty strings are not processed, and no dirty data is generated.

Concat fields

  • If the original data contains null values, the null values are converted to empty strings.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
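
The null-handling rule for concatenation could look like the sketch below. The function name `concat_fields` and the `separator` parameter are hypothetical and only serve the illustration.

```python
from typing import Iterable, Optional


def concat_fields(values: Iterable[Optional[object]], separator: str = "") -> str:
    """Join field values, treating None (null) as an empty string."""
    return separator.join("" if v is None else str(v) for v in values)
```

Under this sketch, `concat_fields(["a", None, "c"], "-")` produces `"a--c"`: the null field contributes an empty string rather than failing the line.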

Extract fields

  • If the original data contains null values, the current line becomes dirty data.
  • If the number of field columns after separation is greater than the actual number allowed by the original data, the current line becomes dirty data.

Modulo integer

  • If the original data contains null values, the current line becomes dirty data.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If data type conversion fails, the current line becomes dirty data.
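
Unlike most operators in this table, the modulo operator treats a null input as dirty data. A minimal sketch of that behavior, with the hypothetical helper `modulo_field`:

```python
from typing import Optional, Tuple


def modulo_field(value, modulus: int) -> Tuple[Optional[int], bool]:
    """Return (value % modulus, is_dirty) for a single field."""
    if value is None:
        # Null input makes the current line dirty.
        return None, True
    try:
        return int(value) % modulus, False
    except (TypeError, ValueError):
        # Failed conversion to an integer also makes the line dirty.
        return None, True
```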

String cut

  • If the input data is null, no conversion is performed.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If the start or end position of the string to be truncated is greater than the length of the input field, the current line becomes dirty data.
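
The null and out-of-range rules for string cutting can be sketched as follows. The indexing convention (zero-based start, exclusive end) is an assumption made for this example, and `cut_string` is a hypothetical name.

```python
from typing import Optional, Tuple


def cut_string(value: Optional[str], start: int, end: int) -> Tuple[Optional[str], bool]:
    """Return (substring, is_dirty). Assumes 0-based start and exclusive end."""
    if value is None:
        # Null input is passed through without conversion.
        return None, False
    if start > len(value) or end > len(value):
        # A position beyond the field length makes the line dirty.
        return None, True
    return value[start:end], False
```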

EL operation

  • If the input data is null, no conversion is performed.
  • The operator takes the values of one or more fields as input and outputs the calculation result.
  • If the input type is incompatible with the operator, the current line becomes dirty data.

String case conversion

  • If the input data is null, no conversion is performed.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.

String reverse

  • If the input data is null, no conversion is performed.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.

String trim

  • If the input data is null, no conversion is performed.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.

Filter rows

  • When the condition logic is AND, if no filter condition is added, all data becomes dirty data; if the original data meets all the added filter conditions, the current line becomes dirty data.
  • When the condition logic is OR, if no filter condition is added, all data becomes dirty data; if the original data meets any of the added filter conditions, the current line becomes dirty data.
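
The filter logic can be sketched as below. The sketch assumes the usual reading of the two modes: AND flags a line only when every condition matches, and OR flags it when at least one matches. The function name `is_dirty` and the representation of conditions as Python callables are hypothetical.

```python
from typing import Callable, Dict, Sequence

Row = Dict[str, object]


def is_dirty(row: Row, conditions: Sequence[Callable[[Row], bool]], logic: str = "AND") -> bool:
    """Decide whether a row is filtered out as dirty data."""
    if not conditions:
        # With no filter condition configured, all data becomes dirty data.
        return True
    results = [cond(row) for cond in conditions]
    # Assumption: AND requires every condition, OR requires at least one.
    return all(results) if logic == "AND" else any(results)
```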

File output

  • If the input data is null, no conversion is performed.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If data type conversion fails, the current line becomes dirty data.

Table output

HBase output

  • If the original data contains null values and Store null column is set to true, the null values are converted to empty strings and saved. If Store null column is set to false, the null values are not saved.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If data type conversion fails, the current line becomes dirty data.
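
The effect of Store null column can be illustrated with a small sketch. It models a row as a Python dictionary and does not call any HBase API; `prepare_hbase_cells` and `store_null_column` are hypothetical names mirroring the setting described above.

```python
from typing import Dict, Optional


def prepare_hbase_cells(row: Dict[str, Optional[str]], store_null_column: bool) -> Dict[str, str]:
    """Return the cells that would be written for one row."""
    cells: Dict[str, str] = {}
    for column, value in row.items():
        if value is None:
            if store_null_column:
                # Null is converted to an empty string and saved.
                cells[column] = ""
            # Otherwise the null column is skipped entirely.
            continue
        cells[column] = value
    return cells
```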

Hive output

  • If one or more columns are designated as partition columns, the Partition Handlers feature is displayed on the To page. Partition Handlers specifies the number of handlers for processing data partitioning.
  • If no column is designated as a partition column, input data does not need to be partitioned, and Partition Handlers is hidden by default.
  • You can configure the operator so that all data becomes dirty data when the number of input field columns is greater than the number of field columns actually included in the original data.
  • If data type conversion fails, the current line becomes dirty data.