gs_loader
Overview
- gs_loader is used to import data. gs_loader converts the syntax supported by the control file to the \COPY syntax, uses the existing \COPY function to import data, and records the \COPY result in logs.
- Before using gs_loader, ensure that the gs_loader version is consistent with the gsql version and database version.
Installation and Deployment
Install and configure the gs_loader client tool on the server where source data files are stored so that you can use the gs_loader tool to import data.
- Create a directory for storing the gs_loader tool package.
mkdir -p /opt/bin
- Upload the gsql package to the created directory.
Upload the gsql tool package GaussDB-Kernel_Database version number_OS version number_64bit_gsql.tar.gz (the EulerOS tool package is used as an example) in the software installation package to the directory created in the previous step.
- Go to the new directory and decompress the package.
cd /opt/bin tar -zxvf GaussDB-Kernel_Database version number_OS version number_64bit_gsql.tar.gz source gsql_env.sh
- Verify the tool location and version information.
which gs_loader
- Verify the client version information.
The gs_loader tool version corresponds to the gsql tool version. You can directly query the gsql client version.
gsql -V
- Verify that the database version is the same as the client tool version.
Use gsql to connect to the database and run the following command:
select version();
Log Level Configuration
Set the log level for developers to view. After the setting, the tool running information is printed on the console.
export gs_loader_log_level=debug export gs_loader_log_level=info export gs_loader_log_level=warning export gs_loader_log_level=error
Permission
The application scenarios are classified into separation-of-duties and non-separation-of-duties scenarios. You can set enableSeparationOfDuty to on or off to enable or disable the separation of duties function.
- If enableSeparationOfDuty is set to off:
The user can be a common user or the database administrator. If the user is a common user, the administrator needs to grant permissions to the common user. The administrator account can be used directly.
Create a user.
CREATE USER load_user WITH PASSWORD '************';
Create related tables and grant permissions.
GRANT ALL ON FUNCTION copy_error_log_create() TO load_user; GRANT ALL ON SCHEMA public TO load_user; SELECT copy_error_log_create(); SELECT copy_summary_create(); GRANT ALL PRIVILEGES ON public.pgxc_copy_error_log To load_user; GRANT ALL PRIVILEGES ON public.gs_copy_summary To load_user;
- If enableSeparationOfDuty is set to on:
The user can be a common user or the database administrator. Create the pgxc_copy_error_log and gs_copy_summary tables in their respective schemas and add indexes. No permission granting is required.
Create a user.
CREATE USER load_user WITH PASSWORD '********';
Create related tables and add indexes.
CREATE TABLE load_user.pgxc_copy_error_log (relname varchar, begintime timestamptz, filename varchar, lineno int8, rawrecord text, detail text); CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname); CREATE TABLE load_user.gs_copy_summary(relname varchar, begintime timestamptz, endtime timestamptz, id bigint, pid bigint, readrows bigint, skiprows bigint, loadrows bigint, errorrows bigint, whenrows bigint, allnullrows bigint, detail text); CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);
Usage Environment
You need to add the tool path to PATH. gs_loader supports SSL-based encrypted communication. The method of using gs_loader is the same as that of using gsql.
Adding System Catalogs
The gs_copy_summary table is added to record the COPY execution result summary, including the number of successful rows, number of error rows, number of ignored rows, and number of empty rows.
The copy_summary_create function is added to create the gs_copy_summary table.
The format of the gs_copy_summary table is as follows:
relname | public.sqlldr_tbl begintime | 2021-09-03 16:00:11.7129-04 endtime | 2021-09-03 16:00:15.259908-04 id | 21870 pid | 47582725060352 readrows | 100000 skiprows | 0 loadrows | 111 errorrows | 0 whenrows | 99889 allnullrows | 0 detail | 111 Rows successfully loaded. | 0 Rows not loaded due to data errors. | 99889 Rows not loaded because all WHEN clauses were failed. | 0 Rows not loaded because all fields were null. |
Usage Guidelines
- Create a user and related tables, and add indexes.
- If the separation of duties function is disabled for common users only:
CREATE USER load_user WITH PASSWORD '************';
Create related tables and grant permissions.
The gs_copy_summary and pgxc_copy_error_log tables cannot contain objects that may cause privilege escalation, such as RULE, TRIGGER, index functions, row-level security, CHECK constraints, GENERATED columns, DEFAULT columns, and ON UPDATE columns; otherwise, the system considers that the user is created by a malicious user, reports an error, and exits.
GRANT ALL ON FUNCTION copy_error_log_create() TO load_user; GRANT ALL ON SCHEMA public TO load_user; SELECT copy_error_log_create(); SELECT copy_summary_create(); GRANT ALL PRIVILEGES ON public.pgxc_copy_error_log To load_user; GRANT ALL PRIVILEGES ON public.gs_copy_summary To load_user;
- If the separation of duties function is enabled for common users and administrators:
CREATE USER load_user WITH PASSWORD '********';
Create related tables and add indexes.
CREATE TABLE load_user.pgxc_copy_error_log (relname varchar, begintime timestamptz, filename varchar, lineno int8, rawrecord text, detail text); CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname); CREATE TABLE load_user.gs_copy_summary(relname varchar, begintime timestamptz, endtime timestamptz, id bigint, pid bigint, readrows bigint, skiprows bigint, loadrows bigint, errorrows bigint, whenrows bigint, allnullrows bigint, detail text); CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);
- If the separation of duties function is disabled for common users only:
- Create a table and a control file, and prepare a data file.
Create the loader_tbl table.
1 2 3 4 5 6 7
CREATE TABLE loader_tbl ( ID NUMBER, NAME VARCHAR2(20), CON VARCHAR2(20), DT DATE );
Create the loader.ctl control file.
1 2 3 4 5 6 7 8 9 10 11
LOAD DATA truncate into table loader_tbl WHEN (2:2) = ',' fields terminated by ',' trailing nullcols ( id integer external, name char(32), con ":id || '-' || :name", dt date )
Create the data.csv data file.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1,OK,,2007-07-8 2,OK,,2008-07-8 3,OK,,2009-07-8 4,OK,,2007-07-8 43,DISCARD,,2007-07-8 ,,, 32,DISCARD,,2007-07-8 a,ERROR int,,2007-07-8 8,ERROR date,,2007-37-8 ,,,, , 8,ERROR fields,,2007-37-8 ,,, 5,OK,,2021-07-30
- Import the data.
Before importing data, ensure that the gs_loader tool has the execute permission. Ensure that the current path has the write permission on files. (The gs_loader generates some temporary files during the processing and automatically deletes them after the import is complete.)
1
gs_loader control=loader.ctl data=data.csv db=testdb bad=loader.bad errors=5 port=8000 passwd=************ user=load_user
Execution result:
1 2 3 4 5 6
gs_loader: version 0.1 10 Rows successfully loaded. log file is: loader.log
Parameters
Parameter |
Description |
Parameter Type/Value Range |
---|---|---|
help |
Displays help information. |
- |
user |
Database connection user (equivalent to -U). |
String |
-U |
Database connection user (equivalent to user). |
String |
passwd |
User password (equivalent to -W). |
String |
-W |
User password (equivalent to passwd). |
String |
db |
Database name. This parameter is required and is equivalent to -d. |
String |
-d |
Database name. This parameter is required and is equivalent to db. |
String |
host |
Specifies the host name of the machine on which the server is running or the directory for the UDS (equivalent to -h). |
See the gsql --host parameter. |
-h |
Specifies the host name of the machine on which the server is running or the directory for the UDS (equivalent to host). |
See the gsql --host parameter. |
port |
Port number of the database server (equivalent to -p). |
See the gsql --port parameter. |
-p |
Port number of the database server (equivalent to port). |
See the gsql --port reference. |
create |
Determines whether to create the pgxc_copy_error_log and gs_copy_summary tables. |
The value can be true or false. The default value is true. |
clean |
Specifies whether to clear the error record. |
The value can be true or false. The default value is false. |
data |
(Required) Data file. You can specify multiple data files or use wildcards (*) and question marks (?) to represent multiple data files. |
String |
control |
(Required) Name of a control file. |
String |
log |
Name of a log file. |
String |
bad |
Name of the file recording the error lines. You can also specify a directory to generate the file based on the data file name. |
String |
discard |
Name of the file recording the lines that fail to be matched by WHEN. You can also specify a directory to generate the file name based on the data file name. |
String |
errors |
Maximum number of error lines in a data file. |
Integer Default value: 0 |
skip |
Number of first lines that can be skipped in a data file. |
Integer Default value: 0 |
bindsize |
Only syntax compatibility is implemented, but functions are not implemented. |
- |
rows |
Only syntax compatibility is implemented, but functions are not implemented. |
- |

- All parameters are in lowercase and are compatible with the gsql login mode, including -p port number, -h host, -d database, -U username, and -W password.
- gs_loader uses a .bad file to record errors coming from the rawrecord column in an error table. The error table does not record rawrecord if an error cannot be read by certain code. In this case, a blank line is recorded in the .bad file.
Control Files
- Syntax
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
LOAD [ DATA ] [CHARACTERSET char_set_name] [INFILE [directory_path] [filename ] ] [BADFILE [directory_path] [filename ] ] [{ INSERT | APPEND | REPLACE | TRUNCATE }] INTO TABLE table_name [{ INSERT | APPEND | REPLACE | TRUNCATE }] [FIELDS CSV] [TERMINATED [BY] { 'string' }] [OPTIONALLY ENCLOSED BY { 'string' }] [TRAILING NULLCOLS] [ WHEN { (start:end) | column_name } {= | !=} 'string' ] [( col_name [ [ POSITION ({ start:end }) ] ["sql_string"] ] | [ FILLER [column_type [external] ] ] | [ CONSTANT "string" ] | [ SEQUENCE ( { COUNT | MAX | integer } [, incr] ) ]|[NULLIF (COL=BLANKS)] [, ...] )]
- Parameter description:
- CHARACTERSET
Character set.
Value range: a string.
- INFILE
The current keyword is invalid and needs to occupy a separate line in the control file. The keyword is ignored during running. You need to specify the corresponding data file in the gs_loader command line parameters.
- BADFILE
The current keyword is invalid and will be ignored during running. If badfile is not specified in the gs_loader command, a badfile will be generated based on the name of the corresponding control file.
- INSERT | APPEND | REPLACE | TRUNCATE
Import mode.
INSERT: If the table contains data, an error is reported.
APPEND: Data is inserted directly.
REPLACE: If the table contains data, all data is deleted and then inserted.
TRUNCATE: If the table contains data, all data is deleted and then inserted.
- When writing a control file (.ctl), you can specify the import mode (INSERT | APPEND | REPLACE | TRUNCATE) before and after the INTO TABLE table_name statement. The priority is as follows: The import mode specified after the statement takes precedence over and overwrites that specified before the statement.
- When multiple gs_loader sessions are started to concurrently import data to the same table, you are advised to use the APPEND mode. If you use the INSERT, REPLACE, or TRUNCATE mode, an import error may occur or the imported data may be incomplete.
- table_name
Specifies the name (possibly schema-qualified) of an existing table.
Value range: an existing table name.
- FIELDS csv
Specifies that the CSV mode of COPY is used. In CSV mode, the default separator is a comma (,), and the default quotation mark is a double quotation mark (").
- TERMINATED [BY] { 'string' }
The string that separates columns within each row (line) of the file, and it cannot be larger than 10 bytes.
Value range: The delimiter cannot include any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789
Value range: The default value is a tab character in text format and a comma in CSV format.
- OPTIONALLY ENCLOSED BY { 'string' }
Specifies a quoted character string for a CSV file.
The default value is double quotation marks (") only in CSV mode that is explicitly specified by the FIELDS CSV parameter.
In other modes, there is no default value.
When you set OPTIONALLY ENCLOSED BY { 'string' }, there should be no quotation mark on the left of the data; otherwise, the number of quotation marks on either left or right must be an odd number but they do not have to be equal.
- TRAILING NULLCOLS
Specifies how to handle the problem that multiple columns of a row in a source data file are lost during data import.
- WHEN { (start:end) | column_name } {= | !=}
Filters rows by character string between start and end or by column name.
Value range: a string.
- POSITION ({ start:end })
Processes columns and obtain the corresponding character strings between start and end.
- "sql_string"
Processes columns and calculates column values based on column expressions.
Value range: a string.
- FILLER
Processes columns. If FILLER occurs, this column is skipped.
- CONSTANT
Processes columns and sets the inserted columns to constants.
Value range: a string.
- SEQUENCE ( { COUNT | MAX | integer } [, incr] )
Processes columns to generate the corresponding sequence values.
- COUNT: The count starts based on the number of rows in the table.
- MAX: The count starts from the maximum value of this column in the table.
- integer: The count starts from the specified value.
- incr: indicates the increment each time.
- NULLIF
Leave the field empty. Currently, only the COL POSITION() CHAR NULLIF (COL=BLANKS) syntax is supported.
- OPTIONS, INFILE, and BADFILE are not supported. Syntax errors are not reported only in specific scenarios.
- gs_loader uses a .bad file to record errors from the rawrecord column in an error table. The error table does not record rawrecord if an error cannot be read by certain code. In this case, a blank line is recorded in the .bad file.
- CHARACTERSET
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot