LOAD DATA (for gs_loader)

Description

Control file syntax of gs_loader. For details about how to use gs_loader, see "Client Tools > gs_loader" in Tool Reference.

Syntax

LOAD [ DATA ]
[CHARACTERSET char_set_name]
[INFILE [directory_path] [filename ] ]
[BADFILE [directory_path] [filename ] ]
[OPTIONS(name=value)]
[{ INSERT | APPEND | REPLACE | TRUNCATE }] 
INTO TABLE table_name
[{ INSERT | APPEND | REPLACE | TRUNCATE }] 
[FIELDS CSV]
[TERMINATED [BY] { 'string' }]
[OPTIONALLY ENCLOSED BY { 'string' }]
[TRAILING NULLCOLS]
[ WHEN { (start:end) | column_name } {= | !=} 'string' ]
[(
col_name [ [ POSITION ({ start:end }) ]  ["sql_string"] ] | [ FILLER [column_type [external] ] ] | [ CONSTANT "string" ] | [ SEQUENCE ( { COUNT | MAX | integer } [, incr] ) ]|[NULLIF (COL=BLANKS)]
[, ...]
)]

Parameters

CHARACTERSET
Specifies a character set.

Value range: a string. Currently, the value can be 'AL32UTF8', 'zhs16gbk', or 'zhs32gb18030'.

Note: The character set specified by CHARACTERSET in the control file must be the same as the encoding format of the file. Otherwise, an error is reported or garbled characters are displayed in the imported data.
INFILE
The current keyword is invalid and needs to occupy a separate line in the control file. The keyword is ignored during running. You need to specify the corresponding data file in the gs_loader command line parameters.
BADFILE
The current keyword is invalid and will be ignored during running. If no .bad file is specified in the gs_loader command, a .bad file will be generated based on the name of the corresponding control file.
OPTIONS
Only the skip and rows parameters take effect. skip=n indicates that the first n records are skipped during import, and rows=n indicates the number of rows to be imported before a commit. If both the command line and control file are specified, the command line has a higher priority.
INSERT | APPEND | REPLACE | TRUNCATE
Specifies the import mode.

INSERT: If the table contains data, an error is reported.

APPEND: Data is inserted directly.

REPLACE: If the table contains data, all data is deleted and then inserted.

TRUNCATE: If the table contains data, all data is deleted and then inserted.

When writing a control file (.ctl), you can specify the import mode (INSERT | APPEND | REPLACE | TRUNCATE) before and after the INTO TABLE table_name statement. The priority is as follows: The import mode specified after the statement takes precedence over and overwrites that specified before the statement.
FIELDS CSV
Specifies that the CSV mode of COPY is used. In CSV mode, the default separator is a comma (,), and the default quotation mark is a double quotation mark (").

In the current CSV mode, quoted line feeds are considered as part of the column data.
table_name
Specifies the name (possibly schema-qualified) of an existing table.

Value range: an existing table name
TERMINATED [BY] { 'string' }
Specifies the character string that separates columns in a file, which contains a maximum of 10 bytes.

Value range: The value cannot include any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789 The nul character cannot be set as a separator.

Value range: The default value is a tab character in text format or a comma in CSV format.

After enabling nul character compatibility (compatible_nul set to true), if the specified separator is a space character (0x20), note that the separator is the space character that exists in the data file instead of the space character converted from the nul character.
OPTIONALLY ENCLOSED BY { 'string' }
Specifies a quoted character string for a CSV file.

The default value is double quotation marks (") only in CSV mode that is explicitly specified by the FIELDS CSV parameter.

In other modes, there is no default value.
- When you set OPTIONALLY ENCLOSED BY { 'string' }, there should be no quotation mark on the left of the data; otherwise, the number of quotation marks on either left or right must be an odd number but they do not have to be equal.
- Currently, OPTIONALLY ENCLOSED BY { 'string' } is supported only in CSV mode. If OPTIONALLY ENCLOSED BY { 'string' } is specified, the system enters the CSV mode by default.
TRAILING NULLCOLS
Specifies how to handle the problem that multiple columns of a row in a source data file are lost during data import.

If one or more columns at the end of a row are null, the columns are imported to the table as null values. If this parameter is not set, an error message is displayed, indicating that the error column is empty. In this case, the data in this row is processed as an error.
WHEN { (start:end) | column_name } {= | !=}
Filters rows by character string between start and end or by column name.

Value range: a string.
- When the GUC parameter enable_copy_when_filler is set to on (default value), data can be filtered based on the FILLER column. When the GUC parameter enable_copy_when_filler is set to off, this usage is not supported.
- The WHEN condition cannot be followed by special characters such as '\0' and '\r'.
POSITION ({ start:end })
Processes columns and obtain the corresponding character strings between start and end.
"sql_string"
Processes columns and calculates column values based on column expressions. For details, see "Column Expressions" in "Client Tools > gs_loader" in Tool Reference.

Value range: a string.
FILLER
Processes columns. If FILLER occurs, this column is skipped.
column_type [external]
Processes the imported data according to different data types. For details, see "Data Type" in "Client Tools > gs_loader" in Tool Reference.
CONSTANT
Processes columns and sets the inserted columns to constants.

Value range: a string.
SEQUENCE ( { COUNT | MAX | integer } [, incr] )
Processes columns to generate the corresponding sequence values.
- COUNT: The count starts based on the number of rows in the table.
- MAX: The count starts from the maximum value of this column in the table.
- integer: The count starts from the specified value.
- incr: indicates the increment each time.
NULLIF
Processes columns. In multi-row import scenarios, if sysdate, constant, position, or column expression is not specified after a column name, the column whose NULLIF keyword is not specified is left empty.

Currently, only the COL POSITION() CHAR NULLIF (COL=BLANKS) syntax is supported. For details, see "NULLIF Usage Cases" in "Client Tools > gs_loader" in Tool Reference.