Help Center/ GaussDB/ Tool Guide(Centralized_V2.0-3.x)/

Updated on 2025-08-21 GMT+08:00

View PDF

gs_loader

Overview

gs_loader is used to import data. gs_loader converts the syntax supported by the control file to the \COPY syntax, uses the existing \COPY function to import data, and records the \COPY result in logs.

Before using gs_loader, ensure that the gs_loader version is consistent with the gsql version and database version.

Installation and Deployment

Install and configure the gs_loader client tool on the server where source data files are stored so that you can use the gs_loader tool to import data.

Create a directory for storing the gs_loader tool package.
```
mkdir -p /opt/bin
```
Upload the gsql package to the created directory.

Upload the gsql tool package GaussDB-Kernel_Database version number_OS version number_64bit_gsql.tar.gz (the EulerOS tool package is used as an example) in the software installation package to the directory created in the previous step.

Go to the new directory and decompress the package.

cd /opt/bin
tar -zxvf GaussDB-Kernel_Database version number_OS version number_64bit_gsql.tar.gz
source gsql_env.sh

Verify the tool location and version information.
```
which gs_loader
```
Verify the client version information.
The gs_loader tool version corresponds to the gsql tool version. You can directly query the gsql client version.
```
gsql -V
```
Verify that the database version is the same as the client tool version.
Use gsql to connect to the database and run the following command:
```
select version();
```

Log Level Configuration

Set the log level for developers to view. After the setting, the tool running information is printed on the console.

export gs_loader_log_level=debug
export gs_loader_log_level=info
export gs_loader_log_level=warning
export gs_loader_log_level=error

Permission

The application scenarios are classified into separation-of-duties and non-separation-of-duties scenarios. You can set enableSeparationOfDuty to on or off to enable or disable the separation of duties function.

The enable_copy_error_log GUC parameter specifies whether to use the error table pgxc_copy_error_log. The default value is off, indicating that the error table is not used and error records are directly recorded in the .bad file of gs_loader. If this parameter is set to on, the error table pgxc_copy_error_log is used and error records are written to both the .bad file and error table.

By default, if enableSeparationOfDuty is set to off, the user can be a common database user or an administrator. If the user is a common user, the administrator needs to grant permissions to the common user. The administrator account can be used directly.
1. Create a user (as an administrator).
```
CREATE USER load_user WITH PASSWORD '************';
```
2. Grant the public schema permission to the user (as an administrator).
```
GRANT ALL ON SCHEMA public TO load_user;
```
3. Create the gs_copy_summary table and grant table permissions to the user (as an administrator).
  
  The gs_copy_summary table cannot contain objects that may cause privilege escalation, such as RULE, TRIGGER, index functions, row-level security, CHECK constraints, GENERATED columns, DEFAULT columns, and ON UPDATE columns; otherwise, the system considers that the user is created by a malicious user, reports an error, and exits.
```
SELECT copy_summary_create() WHERE NOT EXISTS(SELECT * FROM pg_tables WHERE schemaname='public' AND tablename='gs_copy_summary');
GRANT INSERT,SELECT ON  public.gs_copy_summary To load_user;
```
4. (Optional) Create the pgxc_copy_error_log table and grant table permissions to the user (as an administrator).
  - If the GUC parameter enable_copy_error_log is not set (off by default) or is set to off, you do not need to use the error table and do not need to create it. Otherwise, you need to create the error table.
  - The pgxc_copy_error_log table cannot contain objects that may cause privilege escalation, such as RULE, TRIGGER, index functions, row-level security, CHECK constraints, GENERATED columns, DEFAULT columns, and ON UPDATE columns; otherwise, the system considers that the user is created by a malicious user, reports an error, and exits.
```
SELECT copy_error_log_create() WHERE NOT EXISTS(SELECT * FROM pg_tables WHERE schemaname='public' AND tablename='pgxc_copy_error_log');
GRANT INSERT,SELECT,DELETE ON  public.pgxc_copy_error_log To load_user;
```

If enableSeparationOfDuty is set to on, the user can be a common database user or an administrator. Create the pgxc_copy_error_log and gs_copy_summary tables in their respective schemas and add indexes. No permission granting is required.
1. Create a user (as the initial user).
```
CREATE USER load_user WITH PASSWORD '********';
```
2. Switch to the load_user user (as the initial user).
```
\c - load_user
```
3. Create the gs_copy_summary table and add an index (as the created user).
```
CREATE TABLE load_user.gs_copy_summary(relname varchar, begintime timestamptz, endtime timestamptz, id bigint, pid bigint, readrows bigint, skiprows bigint, loadrows bigint, errorrows bigint, whenrows bigint, allnullrows bigint, detail text);
CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);
```
4. (Optional) Create the pgxc_copy_error_log table and add an index (as the created user).
  
  If the GUC parameter enable_copy_error_log is not set (off by default) or is set to off, you do not need to use the error table and do not need to create it. Otherwise, you need to create the error table.
```
CREATE TABLE load_user.pgxc_copy_error_log (relname varchar, begintime timestamptz, filename varchar, lineno int8, rawrecord text, detail text);
CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname);
```

Usage Environment

You need to add the tool path to PATH. gs_loader supports SSL encrypted communication. The method of using gs_loader is the same as that of using gsql.

Adding System Catalogs

The gs_copy_summary table is added to record the COPY execution result summary, including the number of successful rows, number of error rows, number of ignored rows, and number of empty rows.

The copy_summary_create function is added to create the gs_copy_summary table.

The format of the gs_copy_summary table is as follows:

relname     | public.sqlldr_tbl
begintime   | 2021-09-03 16:00:11.7129-04
endtime     | 2021-09-03 16:00:15.259908-04
id          | 21870
pid         | 47582725060352
readrows    | 100000
skiprows    | 0
loadrows    | 111
errorrows   | 0
whenrows    | 99889
allnullrows | 0
detail      | 111 Rows successfully loaded.
            | 0 Rows not loaded due to data errors.
            | 99889 Rows not loaded because all WHEN clauses were failed.
            | 0 Rows not loaded because all fields were null.
            |

Columns in the gs_copy_summary System Catalog

**Table 1** gs_copy_summary columns
Column	Description
relname	Name of the target table to be imported.
begintime	Start time of an import task.
endtime	End time of an import task.
id	ID of the transaction to be imported.
pid	ID of the worker thread for the current import.
readrows	Total number of data rows read by the import task.
skiprows	Total number of data rows skipped in the import task.
loadrows	Number of data rows successfully imported in the current import task.
errorrows	Number of error data rows in the current import task.
whenrows	Number of data rows that violate the WHEN filter criterion in the current import task.
allnullrows	Number of data rows where all columns are empty.
detail	Summary of the import task, including the number of successfully imported rows, number of error data rows, number of rows that violate the WHEN condition, and number of blank rows.

Usage Guidelines

(If the separation of duties function is disabled) For common users only:
1. Create a user (as an administrator).
```
CREATE USER load_user WITH PASSWORD '************';
```
2. Grant the public schema permission to the user (as an administrator).
```
GRANT ALL ON SCHEMA public TO load_user;
```
3. Create the gs_copy_summary table and grant table permissions to the user (as an administrator).
```
SELECT copy_summary_create() WHERE NOT EXISTS(SELECT * FROM pg_tables WHERE schemaname='public' AND tablename='gs_copy_summary');
GRANT INSERT,SELECT PRIVILEGES ON  public.gs_copy_summary To load_user;
```
4. (Optional) Create the pgxc_copy_error_log table and grant table permissions to the user (as an administrator).
  
  If the GUC parameter enable_copy_error_log is not set (off by default) or is set to off, you do not need to use the error table and do not need to create it. Otherwise, you need to create the error table.
```
SELECT copy_error_log_create() WHERE NOT EXISTS(SELECT * FROM pg_tables WHERE schemaname='public' AND tablename='pgxc_copy_error_log');
GRANT INSERT,SELECT,DELETE ON  public.pgxc_copy_error_log To load_user;
```
5. Switch to another user (as an administrator).
```
\c - load_user
```

(If the separation of duties function is enabled) For common users and administrators:

Create a user (as the initial user).

CREATE USER load_user WITH PASSWORD '************';

Switch to the load_user user (as the initial user).
```
\c - load_user
```

Create the gs_copy_summary table and add an index (as the created user).

CREATE TABLE load_user.gs_copy_summary(relname varchar, begintime timestamptz, endtime timestamptz, id bigint, pid bigint, readrows bigint, skiprows bigint, loadrows bigint, errorrows bigint, whenrows bigint, allnullrows bigint, detail text);
CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);

(Optional) Create the pgxc_copy_error_log table and add an index (as the created user).

If the GUC parameter enable_copy_error_log is not set (off by default) or is set to off, you do not need to use the error table and do not need to create it. Otherwise, you need to create the error table.
```
CREATE TABLE load_user.pgxc_copy_error_log (relname varchar, begintime timestamptz, filename varchar, lineno int8, rawrecord text, detail text);
CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname);
```

Create a table and a control file, and prepare a data file.

Create the loader_tbl table.

      
           CREATE TABLE  loader_tbl
(
    ID   NUMBER,
    NAME VARCHAR2(20),
    CON  VARCHAR2(20),
    DT   DATE
);

(On the gs_loader client) Create the control file loader.ctl.

      
           LOAD DATA
truncate into table loader_tbl
WHEN (2:2) = ',' 
fields terminated by ','
trailing nullcols
(
    id integer external,
    name char(32),
    con ":id || '-' || :name",
    dt date
)

(On the gs_loader client) Create the GUC parameter file guc.txt.

set a_format_copy_version='s1';

(On the gs_loader client) Create the data file data.csv.

      
           1,OK,,2007-07-8
2,OK,,2008-07-8
3,OK,,2009-07-8
4,OK,,2007-07-8
43,DISCARD,,2007-07-8
,,,
32,DISCARD,,2007-07-8
a,ERROR int,,2007-07-8
8,ERROR date,,2007-37-8
,,,,
 ,
8,ERROR fields,,2007-37-8
,,,
5,OK,,2021-07-30

Import the data.

Before importing data, ensure that the gs_loader tool has the execute permission. Ensure that the current path has the write permission on files. (The gs_loader generates some temporary files during the processing and automatically deletes them after the import is complete.)

      
           gs_loader control=loader.ctl data=data.csv db=testdb bad=loader.bad errors=5 port=8000 passwd=************ user=load_user

Execution result:

      
           gs_loader: version 0.1

 5 Rows successfully loaded. 

log file is: 
 loader.log

gs_copy_summary records the called COPY syntax and details. The [badfile]_bad.log file records error data and details. To prevent the error data and details recorded during the last import from being overwritten, you are advised to use different bad parameters for each import. The logging function using the pgxc_copy_error_log table is disabled by default. To use the error table pgxc_copy_error_log to record error data and details, enable the GUC parameter enable_copy_error_log. To delete data from a table, perform the TRUNCATE or DELETE operation on the table.

Parameters

**Table 2** gs_loader parameter description
Parameter	Description	Parameter Type: Value Range
help	Displays help information.	-
user	Database connection user (equivalent to -U).	String
-U	Database connection user (equivalent to user).	String
passwd	User password (equivalent to -W).	String
-W	User password (equivalent to passwd).	String
db	Database name. This parameter is required and is equivalent to -d.	String
-d	Database name. This parameter is required and is equivalent to db.	String
host	Specifies the host name of the running server, the UDS path, or the domain name. You can specify multiple host addresses by using character strings separated by commas (,). This parameter is equivalent to -h. If multiple host addresses are specified, the primary node is connected by default.	See the gsql --host parameter.
-h	Specifies the host name of the running server, the UDS path, or the domain name. You can specify multiple host addresses by using character strings separated by commas (,). This parameter is equivalent to host. If multiple host addresses are specified, the primary node is connected by default.	See the gsql --host parameter.
port	Specifies the port number of the database server. One or more port numbers can be configured. When one port number is configured, all IP addresses use the same port for connection. When multiple port numbers are configured, the sequence is the same as the IP address sequence, and the number of port numbers must be the same as the number of IP addresses. If they are different, an error is reported. This parameter is equivalent to -p.	See the gsql --port parameter.
-p	Specifies the port number of the database server. One or more port numbers can be configured. When one port number is configured, all IP addresses use the same port for connection. When multiple port numbers are configured, the sequence is the same as the IP address sequence, and the number of port numbers must be the same as the number of IP addresses. If they are different, an error is reported. This parameter is equivalent to port.	See the gsql --port parameter.
create	Determines whether to create the pgxc_copy_error_log and gs_copy_summary tables.	The value can be true or false. The default value is true.
clean	Specifies whether to clear the error record.	The value can be true or false. The default value is false.
data	(Required) Data file. You can specify multiple data files or use wildcards (*) and question marks (?) to represent multiple data files.	String
control	(Required) Name of a control file.	String
log	Name of a log file.	String
bad	Name of the file that records the error lines and details. You can also specify a directory. If you do not specify a directory, the file is generated based on the data file name.	String
discard	Name of the file recording the lines that fail to be matched by WHEN. You can also specify a directory to generate the file name based on the data file name.	String
errors	Maximum number of error lines in a data file.	Integer Default value: 0
skip	Number of first lines that can be skipped in a data file.	Integer Default value: 0
bindsize	Only syntax compatibility is implemented, but functions are not implemented.	-
rows	Specifies the number of rows of data to be imported before a commit.	The value is an integer in the range [1,2147483647].

All parameters are in lowercase and are compatible with the gsql login mode, including -p port number, -h host, -d database, -U username, and -W password.
When the rows parameter is specified, the number of commit times cannot exceed 1000. Otherwise, the performance is affected. The number of commit times is approximately equal to the number of data rows in the data file divided by the value of rows. If the rows parameter is not specified, there is no default value for rows. In this case, the transaction is committed only once after all data is imported to the table. This parameter can be used together with the skip parameter to skip the imported data.
When gs_loader sets the GUC parameter a_format_load_with_constraints_violation to support non-rollback upon constraint conflicts, if a table has a BEFORE/AFTER ROW INSERT trigger, a maximum of 10,000,000 rows can be committed at a time.
gs_loader does not support statement-level triggers when the GUC parameter a_format_load_with_constraints_violation is set to support non-rollback upon constraint conflicts.
During the import using gs_loader, if transcoding is not required, the size of a single row of data (including tuple metadata) is less than 1 GB minus 1 byte; if transcoding is required, the size of a single row of data is less than 256 MB minus 1 byte. In the following transcoding scenarios, special processing is required: The size for transcoding from UTF-8 to GB18030/GB18030_2022 must be less than 512 MB minus 1 byte, and that for transcoding from UTF-8 to GBK must be less than 1 GB minus 1 byte. If the data volume in a single row is too large and the value of max_process_memory is too small, an error message is displayed, indicating that the memory is insufficient. In this case, you need to adjust the value of max_process_memory and try again.
It is recommended that the size of a single file to be imported be less than or equal to 1 GB. gs_loader has no limit on the size of a single file to be imported. However, importing a large file is time-consuming. Therefore, you are advised to split a large data file, start multiple gs_loader processes to append data to the table. (If the truncate operation is required, run the truncate command separately instead of writing the truncate operation to the control file.) When the CPU resources are sufficient, this method can effectively improve the import speed.
If a field in a data file contains a newline character (usually in the CSV mode of gs_loader), you are advised not to split the file in split mode. Otherwise, a logical record may be split into two physical records, causing an import error.
In a basic data migration scenario, you are advised to delete the index on the table and disable the trigger on the table. After the data migration is complete, rebuild the index and restore the trigger on the table. This is helpful to improve the import performance.

Control Files

Syntax

      
       
         
         
           LOAD [ DATA ]
[CHARACTERSET char_set_name]
[INFILE [directory_path] [filename ] ]
[BADFILE [directory_path] [filename ] ]
[OPTIONS(name=value)]
[{ INSERT | APPEND | REPLACE | TRUNCATE }] 
INTO TABLE table_name
[{ INSERT | APPEND | REPLACE | TRUNCATE }] 
[FIELDS CSV]
[TERMINATED [BY] { 'string' }]
[OPTIONALLY ENCLOSED BY { 'string' }]
[TRAILING NULLCOLS]
[ WHEN { (start:end) | column_name } {= | !=} 'string' ]
[(
col_name [ [ POSITION ({ start:end }) ]  ["sql_string"] ] | [ FILLER [column_type [external] ] ] | [ CONSTANT "string" ] | [ SEQUENCE ( { COUNT | MAX | integer } [, incr] ) ]|[NULLIF (COL=BLANKS)]
[, ...]
)]

          

        

      
     

Parameters:
- CHARACTERSET
  Character set.
  
  Value range: a string.
  
  The character set specified by CHARACTERSET in the control file must be the same as the encoding format of the file. Otherwise, an error is reported or garbled characters are displayed in the imported data.
- INFILE
  The current keyword is invalid and needs to occupy a separate line in the control file. The keyword is ignored during running. You need to specify the corresponding data file in the gs_loader command line parameters.
- BADFILE
  The current keyword is invalid and will be ignored during running. If no .bad file is specified in the gs_loader command, a .bad file will be generated based on the name of the corresponding control file.
- OPTIONS
  Only the skip and rows parameters take effect. skip=n indicates that the first n records are skipped during import, and rows=n indicates the number of rows to be imported before a commit. If both the command line and control file are specified, the command line has a higher priority.
- INSERT | APPEND | REPLACE | TRUNCATE
  Import mode.
  
  INSERT: If the table contains data, an error is reported.
  
  APPEND: Data is inserted directly.
  
  REPLACE: If the table contains data, all data is deleted and then inserted.
  
  TRUNCATE: If the table contains data, all data is deleted and then inserted.
  - When writing a control file (.ctl), you can specify the import mode (INSERT | APPEND | REPLACE | TRUNCATE) before and after the INTO TABLE table_name statement. The priority is as follows: The import mode specified after the statement takes precedence over and overwrites that specified before the statement.
  - When multiple gs_loader sessions are started to concurrently import data to the same table, you are advised to use the APPEND mode. If you use the INSERT, REPLACE, or TRUNCATE mode, an import error may occur or the imported data may be incomplete.
- FIELDS CSV
  Specifies that the CSV mode of COPY is used. In CSV mode, the default separator is a comma (,), and the default quotation mark is a double quotation mark (").
  
  In the current CSV mode, quoted line feeds are considered as part of the column data.
- table_name
  Specifies the name (possibly schema-qualified) of an existing table.
  
  Value range: an existing table name
- TERMINATED [BY] { 'string' }
  The string that separates columns within each row (line) of the file, and it cannot be larger than 10 bytes.
  
  Value range: The value cannot include any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789
  
  Value range: The default value is a tab character in text format and a comma in CSV format.
- OPTIONALLY ENCLOSED BY { 'string' }
  Specifies a quoted character string for a CSV file.
  
  The default value is double quotation marks (") only in CSV mode that is explicitly specified by the FIELDS CSV parameter.
  
  In other modes, there is no default value.
  - When you set OPTIONALLY ENCLOSED BY { 'string' }, either there is no quotation mark on the left of the data, or the quotation marks on the left and right must be an odd number but do not have to be equal.
  - Currently, OPTIONALLY ENCLOSED BY { 'string' } is supported only in CSV mode. If OPTIONALLY ENCLOSED BY { 'string' } is specified, the system enters the CSV mode by default.
- TRAILING NULLCOLS
  Specifies how to handle the problem that multiple columns of a row in a source data file are lost during data import.
  
  If one or more columns at the end of a row are null, the columns are imported to the table as null values. If this parameter is not set, an error message is displayed, indicating that the error column is null. In this case, the data in this row is processed as an error.
- WHEN { (start:end) | column_name } {= | !=}
  Filters rows by character string between start and end or by column name.
  
  Value range: a string.
- POSITION ({ start:end })
  Processes columns and obtain the corresponding character strings between start and end.
- "sql_string"
  Processes columns and calculates column values based on column expressions. For details, see • Column expression.
  
  Value range: a string.
- FILLER
  Processes columns. If FILLER occurs, this column is skipped.
- column_type [external]
  Processes the imported data according to different data types. For details, see • Data types.
- CONSTANT
  Processes columns and sets the inserted columns to constants.
  
  Value range: a string.
- SEQUENCE ( { COUNT | MAX | integer } [, incr] )
  Processes columns to generate the corresponding sequence values.
  - COUNT: The count starts based on the number of rows in the table.
  - MAX: The count starts from the maximum value of this column in the table.
  - integer: The count starts from the specified value.
  - incr: indicates the increment each time.
- NULLIF
  Processes columns. In multi-row import scenarios, if sysdate, constant, position, or column expression is not specified after a column name, the column whose NULLIF keyword is not specified is left empty.
  
  Currently, only the COL POSITION() CHAR NULLIF (COL=BLANKS) syntax is supported. For details, see • NULLIF use cases.
- OPTIONS, INFILE, and BADFILE are not supported. Syntax errors are not reported only in specific scenarios.
- gs_loader uses a .bad file to record errors coming from the rawrecord column in an error table. The error table does not record rawrecord if an error cannot be read by certain code. In this case, a blank line is recorded in the .bad file.
- If a large number of constraint conflicts exist in the data file to be imported at a time, for example, the memory of the database server is small (for example, 32 GB) and the number of constraint conflicts exceeds 2 million rows, or the memory is greater than 128 GB and the number of constraint conflicts exceeds 10 million rows, a large amount of cache may be occupied. As a result, "ERROR: memory is temporarily unavailable" is reported and the import fails. Therefore, you are advised not to use the feature of non-rollback upon constraint conflicts.

If the data in the .bad file is empty, refer to the source file and row number in the error table. (The code sequence is not identified, the .bad file content is not written, and only blank rows are recorded.)

loader=# select * from pgxc_copy_error_log;
       relname        |           begintime           | filename | lineno | rawrecord |                     detail                      
----------------------+-------------------------------+----------+--------+-----------+-------------------------------------------------
 public.test_gsloader | 2023-02-09 09:20:33.646843-05 | STDIN    |      1 |           | invalid byte sequence for encoding "UTF8": 0xb4
(1 row)
// In the preceding example, for the file corresponding to the loader, search for the first row of the data text to find the source data.

NULLIF use cases

// Create a table.
CREATE TABLE gsloader_test_nullif(
col1   varchar2(100) not null enable,
col2   number(5,0) not null enable,
col3   varchar2(200) not null enable,
col4   varchar2(34) not null enable,
col5   varchar2(750),
col6   number(20,0),
col7   varchar2(4000),
col8   varchar2(200)
);
// Data file test.csv
6007 17060072021-09-0360070001102010000000230          1        600700010000218               0        1        1        229465        3
6007 17060072021-09-0360070001102010000000299          1        600700010000282               0        1        1        230467        3
6007 17060072021-09-0360070001102010000000242          1        600700010000255               0        1        1        226400        3
6007 17060072021-09-0360070001102010000000202          1        600700010000288               0        1        1        219107        3
6007 17060072021-09-0360070001102010000000294          1        600700010000243               0        1        1        204404        3
6007 17060072021-09-0360070001102010000000217          1        600700010000270               0        1        1        226644        3
// Control file test.ctl
LOAD DATA  
CHARACTERSET UTF8
TRUNCATE
INTO TABLE gsloader_test_nullif
TRAILING NULLCOLS
(COL1 POSITION(1:10) CHAR NULLIF (COL1 = BLANKS),
COL2  POSITION(11:14) CHAR NULLIF (COL2 = BLANKS),
COL3  POSITION(21:30) CHAR NULLIF (COL3 = BLANKS),
COL4  POSITION(31:40) CHAR NULLIF (COL4 = BLANKS),
COL5  sysdate,
COL6,
COL7, 
COL8 POSITION(71:80) CHAR NULLIF (COL8 = BLANKS))
// Import data.
GS_LOADER -p xxx host=xxx control=test.ctl  data=test.csv -d testdb -W xxx 
// Result: Imported.
loader=# SELECT * FROM gsloader_test_nullif;
    col1    | col2 |    col3    |    col4    |        col5         | col6 | col7 |   col8
------------+------+------------+------------+---------------------+------+------+-----------
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000218
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000282
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000255
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000288
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000243
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000270
(6 rows)

According to the data in the imported table, after the NULLIF keyword is used, the imported columns are normal except for the columns with the specified NULLIF and sysdate calculations. The imported columns without specified calculations are empty.

Column expression

gs_loader supports expression conversion and scenario extension for specified columns.

({ column_name [ data_type ] [ AS transform_expr ] } [, ...])

data_type specifies the data type of the column in the expression parameter. transform_expr specifies the target expression and returns the result value whose data type is the same as that of the target column in the table.

Example:

The column type is not specified in the .ctl file, and the source data does not meet the column restrictions (data type and length restrictions) in the table.

// Create a table.
create table t_test(id int, text varchar(5));
// Data file test.csv
addf2,bbbbaaa,20220907,
// Control file test.ctl
Load Data
TRUNCATE INTO TABLE t_test
fields terminated by ','
TRAILING NULLCOLS(
id "length(trim(:id))",
text "replace(trim(:text),'bbbb','aa')"
)
// guc_param file
set a_format_copy_version='s1';
// Import data.
gs_loader -p xxx host=xxx control=test.ctl  data=test.csv -d testdb -W xxx guc_param=test_guc.txt
// Result: Imported.
select * from t_test;
 id | text  
----+-------
  5 | aaaaa
(1 row)

The column type is not specified in the .ctl file, and the implicit type conversion is performed. (You are advised to add compatibility parameters because the implicit type conversion is involved.)

// Create a table.
create table test(mes int, mes1 text, mes2 float8, mes3 timestamp with time zone, mes4 INTEGER);
// Data file
cat load_support_transform.data
1,mmoo,12.6789,Thu Jan 01 15:04:28 1970 PST,32767
2,yyds,180.883,Thu Jun 21 19:00:00 2012 PDT,32768
// Control file
cat load_support_transform.ctl 
Load Data
TRUNCATE INTO TABLE test
fields terminated by ','
TRAILING NULLCOLS(
mes,
mes1 "mes1 || mes2",
mes2 "mes2 + 1",
mes3 "date_trunc('year', mes3)",
mes4
)
// guc_param file
cat test_guc.txt
set a_format_copy_version='s1';
set a_format_dev_version='s2';
set a_format_version='10c';
// Import data.
gs_loader -p xxx host=xxx control=load_support_transform.ctl data=load_support_transform.data -d testdb -W xxx guc_param=test_guc.txt 
// Result: Imported.
select * from test;
 mes |    mes1     |  mes2   |          mes3          | mes4  
-----+-------------+---------+------------------------+-------
   1 | mmoo12.6789 | 13.6789 | 1970-01-01 00:00:00+08 | 32767
   2 | yyds180.883 | 181.883 | 2012-01-01 00:00:00+08 | 32768

Data types
Correspond to column_type [external] in the control file. During data loading, data is processed based on the data type. gs_loader classifies data types into common and special data types.
- Common data types
  - CHAR [(length)]:
    Reads data based on a column separator and converts the value to the CHAR type. length indicates the maximum length of a single piece of data, in bytes. Generally, one character occupies one byte. The value can be left blank. The scenarios are as follows:
    - If a length is not declared, the value inherits the maximum length value declared by POSITION.
    - If a length is declared, it overwrites the maximum length declared by POSITION.
    - If neither length nor POSITION declares a length, the value is set based on the length between separators.
    - The priority of the length declaration is as follows: length > POSITION > separator.
    - If none of length, POSITION, and separator declares a length, the default length is 1.
    - If the actual data length exceeds the maximum value declared by length, an error is reported.
  - INTEGER external [(length)]:
    Reads data based on a column separator and converts the value to the INTEGER type. The rules for using length are the same as those described in "CHAR [(length)]."
  - FLOAT external [(length)]:
    Reads data based on a column separator and converts the value to the FLOAT type. The rules for using length are the same as those described in "CHAR [(length)]."
  - DECIMAL external (length):
    Reads data based on a column separator and converts the value to the DECIMAL type. The rules for using length are the same as those described in "CHAR [(length)]."
  - TIMESTAMP:
    Reads data based on a column separator and converts the value to the TIMESTAMP type.
  - DATE:
    Reads data based on a column separator and converts the value to the DATE type.
  - DATE external:
    Reads data based on a column separator and converts the value to the DATE type.
  - SYSDATE:
    Obtains the system time when the corresponding insertion is performed in the database. The value cannot be referenced. The referenced content is the SYSDATE character string.
- Special data types
  - INTEGER:
    Ignores the column separator, reads four-byte characters, saves them based on the little-endian storage logic, parses each character into a hexadecimal ASCII code value, and converts the value into a decimal number.
  - SMALLINT:
    Ignores the column separator, reads two-byte characters, saves them based on the little-endian storage logic, parses each character into a hexadecimal ASCII code value, and converts the value into a decimal number.
    
    Example:
```
// Create a table.
create table t_spec(col1 varchar(10), col2 varchar(10));
// Data file
cat t_spec.txt
1234,5678,
// Control file
cat t_spec.ctl
Load Data
TRUNCATE INTO TABLE t_spec
fields terminated by ','
TRAILING NULLCOLS(
col1 position(2:6) integer,
col2 position(5:8) smallint
)
// guc_param file
cat test_guc.txt
set a_format_copy_version='s1';
set a_format_dev_version='s2';
set a_format_version='10c';
// Import data.
gs_loader -p xxx host=xxx control=t_spec.ctl data=t_spec.txt -d testdb -W xxx guc_param=test_guc.txt
// Result: Imported.
select * from t_spec;
   col1    | col2
-----------+-------
 741618482 | 13612
(1 row)
```
  - RAW:
    Parses each character into an ASCII code value. The backslash (\) is not used as an escape character.
    
    Restriction: Separators cannot be used in RAW data.
    
    Example:
```
// Create a table.
create table t_raw(col raw(50));
// Data file
cat t_raw.txt
12\n\x78!<~?'k^(%s)>/c[$50]
// Control file
cat t_raw.ctl
Load Data
TRUNCATE INTO TABLE t_raw
TRAILING NULLCOLS(
col position(1:50) raw
)
// guc_param file
cat test_guc.txt
set a_format_copy_version='s1';
set a_format_dev_version='s2';
set a_format_version='10c';
// Import data.
gs_loader -p xxx host=xxx control=t_raw.ctl data=t_raw.txt -d testdb -W xxx guc_param=test_guc.txt
// Result: Imported.
select * from t_raw;
                          col
--------------------------------------------------------
 31325C6E5C783738213C7E3F276B5E282573293E2F635B2435305D
(1 row)
```
  - In the multi-column import scenario, if the GUC parameter is not specified, some positions and separators cannot be used at the same time.
  - In the multi-column import scenario, if the GUC parameter is specified, the POSITION operation cannot be used for some columns.
  - In the multi-column import scenario, if common data types and special data types are used together, you need to specify POSITION for all data types.
  - When importing data of a specified data type, you need to use guc_param to set a_format_copy_version for common data types and use guc_param to set a_format_copy_version, a_format_dev_version and a_format_version for special data types.
  - If a column expression involves a system function, you need to use guc_param to set a_format_dev_version and a_format_version based on the corresponding function.
  - If the data type contains length, the value of length must be set to an integer greater than 0. The special data type RAW(length) is used differently from common types. For example, if POSITION is not specified for the common type INTEGER EXTERNAL(length), an error is reported when the value of length is less than the data length of the corresponding column in a text file (such as .csv or .txt). If the value of length is greater than the data length of the corresponding column in a text file (such as .txt), the result of the INTEGER EXTERNAL type is output. If POSITION is not specified for the special data type RAW(length), the first length characters are read.
  - If POSITION(start:end) is specified, the value of start must be set to an integer greater than 0, and the value of end must be greater than or equal to the value of start.
  - During concurrent import, if multiple names of files specified by discard or bad point to files with the same name in the same directory, gs_loader stops importing the next file and reports an error. If a previous file has been imported, the file will be overwritten.
    The following error is reported:
    
    ERROR: An error occurred. Please check logfile.
    
    In the log file:
    
    …lock failed: Resource temporarily unavailable…
  - If the column value in the control file is not empty and the column content is not used, the location of the data file is not occupied.
    For example, the control file is as follows:
    
    Load Data TRUNCATE INTO TABLE gsloader fields terminated by ',' TRAILING NULLCOLS( id "trim(:id)", text "to_char(SYSDATE,'yyyymmdd')", gmt_create "trim(:gmt_create)", create_str "trim(:create_str)" )
    
    The data file is as follows:
    
    11,HELLO,33,
    
    The import result is as follows:
    
    loader=# select * from gsloader; id | text | gmt_create | create_str ----+--------+---------------------+------------ 11 | 2023-02-08 16:00:54 | HELLO | 33

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.

The system is busy. Please try again later.

Which of the following issues have you encountered?

Content is inconsistent with the product UI

Unclear descriptions

Lack of examples or code

Incorrect steps

Can't find what I need

Lack of best practices

Feedback (optional)

0/500

Select at least one type of issue, and enter your comments or suggestions.

Enter a maximum of 500 characters.

Submit Cancel

For any further questions, feel free to contact us through the chatbot.

Chatbot