Updated on 2023-06-21 GMT+08:00

Dumping Data to DWS

Source Data Type: JSON and CSV

Table 1 Dump parameters

Parameter

Description

Value

Task Name

Name of the dump task. The names of dump tasks created for the same stream must be unique. A dump task name is 1 to 64 characters long. Only letters, digits, hyphens (-), and underscores (_) are allowed.

-
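The naming rules above translate directly into a validation check. A minimal sketch (the regular expression and function name are illustrative, not part of the DIS API):

```python
import re

# Pattern derived from the stated rules: 1 to 64 characters,
# letters, digits, hyphens (-), and underscores (_) only.
TASK_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def is_valid_task_name(name: str) -> bool:
    """Return True if the dump task name satisfies the documented rules."""
    return TASK_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_task_name("dws-dump_01"))  # a valid name
print(is_valid_task_name("bad name!"))    # space and '!' are not allowed
print(is_valid_task_name("x" * 65))       # longer than 64 characters
```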

DWS Cluster

Name of the DWS cluster that stores the data in the stream.

Click Select. In the Select DWS Cluster dialog box, select a DWS cluster.

The value can only be selected; it cannot be entered manually.

-

DWS Database

Name of the DWS database that stores the data in the stream.

The value must be manually entered and cannot be left blank.

-

Database Schema

A database contains one or more named schemas, and schemas contain tables. Schemas also contain other named objects, including data types, functions, and operators. The same object name can be used in different schemas without causing conflicts.

-

DWS Table

Name of the DWS table that stores the data in the stream.

-

Delimiter

Delimiter used to separate the data into the columns of the DWS table.

This parameter cannot be left blank.

-
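Because the configured delimiter is what splits each dumped row into the columns of the DWS table, a delimiter that also appears inside field values will misalign columns. A minimal plain-Python sketch of that behavior (not DIS code):

```python
def split_row(row: str, delimiter: str) -> list[str]:
    """Split one dumped row into column values by the configured delimiter."""
    if not delimiter:
        raise ValueError("Delimiter cannot be left blank.")
    return row.split(delimiter)

# A row that matches a three-column table:
print(split_row("1001|Alice|2023-06-21", "|"))

# A field value containing the delimiter shifts the column boundaries:
print(split_row("1001|Ali|ce|2023-06-21", "|"))  # four values instead of three
```

Choosing a delimiter that cannot occur in the data, or escaping it upstream, avoids this misalignment.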

Offset

  • Latest: Maximum offset, indicating that the latest data will be read.
  • Earliest: Minimum offset, indicating that the earliest data will be read.

Latest

Dump Interval (s)

Interval at which data from the DIS stream is imported into the dump destination, such as OBS, MRS, DLI, or DWS. If no data is pushed to the DIS stream during the specified interval, no dump file is generated.

Value range: 30s to 900s

Unit: second

Default value: 300s

-

Username

Username for logging in to the DWS cluster.

-

Password

Password for logging in to the DWS cluster.

-

KMS Key

Database encryption key of the cluster.

-

Temporary Bucket

OBS bucket in which a directory is created for temporarily storing user data. The data in the directory is deleted after being dumped to a specific destination.

-

Temporary Directory

Directory in the chosen Temporary Bucket for temporarily storing data. The data in the directory is deleted after being dumped to a specific destination.

If this field is left blank, the data is stored directly to the Temporary Bucket.

-

Fault Processing

You can click the toggle to enable or disable Fault Processing.
  • fill_missing_fields

    Specifies whether to generate an error message when the last field in one row of data in the source file is missing during data import.

    Value: true/on or false/off. Default value: false/off.

    • true/on: The value of the last field is set to NULL and no error message is reported.
    • false/off: An error message is reported.
  • ignore_extra_data

    Specifies whether to ignore the extra columns when the number of fields in the data source file is greater than the number of columns defined in the external table. This parameter is used only during data import.

    Value: true/on or false/off. Default value: false/off.

    • true/on: The extra columns at the end of a row are ignored.
    • false/off: An error message is reported.
      NOTE:

      If the line feed at the end of a row is missing, that row and the next row are merged into one. When this parameter is set to true, the data from the second row is ignored.

  • compatible_illegal_chars

    Specifies whether to tolerate invalid characters during data import. This parameter is valid only for READ ONLY foreign tables.

    Value: true/on or false/off. Default value: false/off.

    • true/on: Invalid characters are converted into valid ones before being imported to the database. No error message is reported and data import is not interrupted.
    • false/off: Data import is interrupted.
      NOTICE:

      In the Windows operating system, if OBS reads data files in text format, 0x1A is treated as an EOF symbol and ends data reading, which causes a parsing error. This is a restriction of the Windows platform; OBS can read binary data files only on Linux.

    NOTE:
    • The rules for converting invalid characters are as follows:

      1. \0 is converted to a space.

      2. Other invalid characters are converted into question marks.

      3. When compatible_illegal_chars is set to true/on, invalid characters such as NULL, DELIMITER, QUOTE, and ESCAPE are converted to spaces or question marks. The error message "illegal chars conversion may confuse COPY escape 0x20" is then displayed to remind you that the conversion may cause parameter confusion.

  • PER NODE REJECT LIMIT 'value'

    Specifies the maximum number of data format errors allowed on each DN instance during data import. If the number of errors on any DN instance exceeds this value, the import fails and an error message is displayed.

    Value range: an integer, or unlimited. Default value: 0, indicating that an error is returned immediately.
    NOTE:

    This parameter specifies the error tolerance of a single node.

    Data format errors refer to missing or redundant field values, data type errors, or coding errors. If a non-data format error occurs, the entire data import fails.

-
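The fault-processing options above can be illustrated by simulating how one import row is handled. This is a plain-Python sketch of the documented semantics; the function names, error strings, and the set of characters treated as invalid are assumptions for illustration, not DWS internals:

```python
def load_row(fields, expected_cols, fill_missing_fields=False, ignore_extra_data=False):
    """Apply the documented fill_missing_fields / ignore_extra_data behavior."""
    if len(fields) < expected_cols:
        if not fill_missing_fields:
            raise ValueError("missing field value")  # false/off: error reported
        # true/on: the missing trailing field(s) are set to NULL (None here)
        fields = fields + [None] * (expected_cols - len(fields))
    elif len(fields) > expected_cols:
        if not ignore_extra_data:
            raise ValueError("extra data after last column")  # false/off: error reported
        # true/on: extra columns at the end of the row are ignored
        fields = fields[:expected_cols]
    return fields

def convert_illegal_chars(value):
    """Documented conversion rules for compatible_illegal_chars=true/on:
    \\0 becomes a space, other invalid characters become question marks.
    (Which characters count as invalid is an assumption here.)"""
    return "".join(
        " " if ch == "\x00" else ("?" if not ch.isprintable() else ch)
        for ch in value
    )

def within_reject_limit(errors_on_node, per_node_reject_limit=0):
    """PER NODE REJECT LIMIT: the import fails once the error count on a
    single DN exceeds the limit (the default 0 fails on the first error)."""
    return errors_on_node <= per_node_reject_limit

print(load_row(["1", "a"], 3, fill_missing_fields=True))          # ['1', 'a', None]
print(load_row(["1", "a", "b", "x"], 3, ignore_extra_data=True))  # ['1', 'a', 'b']
print(convert_illegal_chars("a\x00b\x01c"))                       # 'a b?c'
print(within_reject_limit(1))                                     # False
```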