gs_loader
Overview
- gs_loader is used to import data. gs_loader converts the syntax supported by the control file to the \COPY syntax, uses the existing \COPY function to import data, and records the \COPY result in logs.
- Before using gs_loader, ensure that the gs_loader version is consistent with the gsql version and database version.
Log Level Configuration
Set the log level for developers to view. After the setting, the tool running information is printed on the console.
export gs_loader_log_level=debug export gs_loader_log_level=info export gs_loader_log_level=warning export gs_loader_log_level=error
Permission
The application scenarios are classified into separation-of-duties and non-separation-of-duties scenarios. You can set enableSeparationOfDuty to on or off to enable or disable the separation of duties function.
- If enableSeparationOfDuty is set to off:
The user can be a common user or the database administrator. If the user is a common user, the administrator needs to grant permissions to the common user. The administrator account can be used directly.
Create a user.
CREATE USER load_user WITH PASSWORD '************';
Create related tables and grant permissions.
GRANT ALL ON FUNCTION copy_error_log_create() TO load_user; GRANT ALL ON SCHEMA public TO load_user; SELECT copy_error_log_create(); SELECT copy_summary_create(); GRANT ALL PRIVILEGES ON public.pgxc_copy_error_log To load_user; GRANT ALL PRIVILEGES ON public.gs_copy_summary To load_user;
- If enableSeparationOfDuty is set to on:
The user can be a common user or the database administrator. Create the pgxc_copy_error_log and gs_copy_summary tables in their respective schemas and add indexes. No permission granting is required.
Create a user.
CREATE USER load_user WITH PASSWORD '********';
Create related tables and add indexes.
CREATE TABLE load_user.pgxc_copy_error_log (relname varchar, begintime timestamptz, filename varchar, lineno int8, rawrecord text, detail text); CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname); CREATE TABLE load_user.gs_copy_summary(relname varchar, begintime timestamptz, endtime timestamptz, id bigint, pid bigint, readrows bigint, skiprows bigint, loadrows bigint, errorrows bigint, whenrows bigint, allnullrows bigint, detail text); CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);
Usage Environment
You need to add the tool path to PATH. gs_loader supports SSL encrypted communication. The method of using gs_loader is the same as that of using gsql. For details, refer to the Administrator Guide.
Adding System Catalogs
The gs_copy_summary table is added to record the COPY execution result summary, including the number of successful rows, number of error rows, number of ignored rows, and number of empty rows.
The copy_summary_create function is added to create the gs_copy_summary table.
The format of the gs_copy_summary table is as follows:
relname | public.sqlldr_tbl begintime | 2021-09-03 16:00:11.7129-04 endtime | 2021-09-03 16:00:15.259908-04 id | 21870 pid | 47582725060352 readrows | 100000 skiprows | 0 loadrows | 111 errorrows | 0 whenrows | 99889 allnullrows | 0 detail | 111 Rows successfully loaded. | 0 Rows not loaded due to data errors. | 99889 Rows not loaded because all WHEN clauses were failed. | 0 Rows not loaded because all fields were null. |
Usage Guidelines
- Create a user and related tables, and add indexes.
- If the separation of duties function is disabled for common users only:
CREATE USER load_user WITH PASSWORD '************';
Create related tables and grant permissions.
GRANT ALL ON FUNCTION copy_error_log_create() TO load_user; GRANT ALL ON SCHEMA public TO load_user; SELECT copy_error_log_create(); SELECT copy_summary_create(); GRANT ALL PRIVILEGES ON public.pgxc_copy_error_log To load_user; GRANT ALL PRIVILEGES ON public.gs_copy_summary To load_user;
- If the separation of duties function is enabled for common users and administrators:
CREATE USER load_user WITH PASSWORD '********';
Create related tables and add indexes.
CREATE TABLE load_user.pgxc_copy_error_log (relname varchar, begintime timestamptz, filename varchar, lineno int8, rawrecord text, detail text); CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname); CREATE TABLE load_user.gs_copy_summary(relname varchar, begintime timestamptz, endtime timestamptz, id bigint, pid bigint, readrows bigint, skiprows bigint, loadrows bigint, errorrows bigint, whenrows bigint, allnullrows bigint, detail text); CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);
- If the separation of duties function is disabled for common users only:
- Create a table and a control file, and prepare a data file.
Create the loader_tbl table.
1 2 3 4 5 6 7
CREATE TABLE loader_tbl ( ID NUMBER, NAME VARCHAR2(20), CON VARCHAR2(20), DT DATE );
Create the loader.ctl control file.
1 2 3 4 5 6 7 8 9 10 11
LOAD DATA truncate into table loader_tbl WHEN (2:2) = ',' fields terminated by ',' trailing nullcols ( id integer external, name char(32), con ":id || '-' || :name", dt date )
Create the data.csv data file.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1,OK,,2007-07-8 2,OK,,2008-07-8 3,OK,,2009-07-8 4,OK,,2007-07-8 43,DISCARD,,2007-07-8 ,,, 32,DISCARD,,2007-07-8 a,ERROR int,,2007-07-8 8,ERROR date,,2007-37-8 ,,,, , 8,ERROR fields,,2007-37-8 ,,, 5,OK,,2021-07-30
- Import the data.
Before importing data, ensure that the gs_loader tool has the execute permission. Ensure that the current path has the write permission on files. (The gs_loader generates some temporary files during the processing and automatically deletes them after the import is complete.)
1
gs_loader control=loader.ctl data=data.csv db=testdb bad=loader.bad errors=5 port=8000 passwd=************ user=load_user
Execution result:
1 2 3 4 5 6
gs_loader: version 0.1 10 Rows successfully loaded. log file is: loader.log
Parameter Description
Parameter |
Parameter Description |
Parameter Type: Value Range |
---|---|---|
help |
Displays help information. |
- |
user |
Database connection user (equivalent to -U). |
String |
-U |
Database connection user (equivalent to user). |
String |
passwd |
User password (equivalent to -W). |
String |
-W |
User password (equivalent to passwd). |
String |
db |
Database name. This parameter is mandatory and is equivalent to -d. |
String |
-d |
Database name. This parameter is mandatory and is equivalent to db. |
String |
host |
Specifies the host name of the machine on which the server is running or the directory for the Unix-domain socket (equivalent to -h). |
See the gsql --host parameter. |
-h |
Specifies the host name of the machine on which the server is running or the directory for the Unix-domain socket (equivalent to host). |
See the gsql --host parameter. |
port |
Port number of the database server (equivalent to -p). |
See the gsql --port parameter. |
-p |
Port number of the database server (equivalent to port). |
See the gsql --port reference. |
create |
Determines whether to create the pgxc_copy_error_log and gs_copy_summary tables. |
The value can be true or false. The default value is true. |
clean |
Indicates whether to clear the error record. |
The value can be true or false. The default value is false. |
data |
(Mandatory) Data file. You can specify multiple data files or use wildcards (*) and question marks (?) to represent multiple data files. |
String |
control |
(Mandatory) Name of a control file. |
String |
log |
Name of a log file. |
String |
bad |
Name of the file recording the error lines. You can also specify a directory to generate the file based on the data file name. |
String |
discard |
Name of the file recording the lines that fail to be matched by WHEN. You can also specify a directory to generate the file name based on the data file name. |
String |
errors |
Maximum number of error lines in a data file. |
Integer Default value: 0 |
skip |
Number of first lines that can be skipped in a data file. |
Integer Default value: 0 |
bindsize |
Only syntax compatibility is implemented, but functions are not implemented. |
- |
rows |
Only syntax compatibility is implemented, but functions are not implemented. |
- |
- All parameters are in lowercase and are compatible with the gsql login mode, including -p port number, -h host, -d database, -U username, and -W password.
- gs_loader uses a .bad file to record errors from the rawrecord column in an error table. The error table does not record rawrecord if an error cannot be read by certain code. In this case, a blank line is recorded in the .bad file.
Control Files
- Syntax
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
LOAD [ DATA ] [CHARACTERSET char_set_name] [INFILE [directory_path] [filename ] ] [BADFILE [directory_path] [filename ] ] [{ INSERT | APPEND | REPLACE | TRUNCATE }] INTO TABLE table_name [{ INSERT | APPEND | REPLACE | TRUNCATE }] [FIELDS CSV] [TERMINATED [BY] { 'string' }] [OPTIONALLY ENCLOSED BY { 'string' }] [TRAILING NULLCOLS] [ WHEN { (start:end) | column_name } {= | !=} 'string' ] [( col_name [ [ POSITION ({ start:end }) ] ["sql_string"] ] | [ FILLER [column_type [external] ] ] | [ CONSTANT "string" ] | [ SEQUENCE ( { COUNT | MAX | integer } [, incr] ) ]|[NULLIF (COL=BLANKS)] [, ...] )]
- Parameter description:
- CHARACTERSET
Character set.
Value range: a string.
- INFILE
The current keyword is invalid and needs to occupy a separate line in the control file. The keyword is ignored during running. You need to specify the corresponding data file in the gs_loader command line parameters.
- BADFILE
The current keyword is invalid and will be ignored during running. If badfile is not specified in the gs_loader command, a badfile will be generated based on the name of the corresponding control file.
- INSERT | APPEND | REPLACE | TRUNCATE
Import mode.
INSERT: If the table contains data, an error is reported.
APPEND: Data is inserted directly.
REPLACE: If the table contains data, all data is deleted and then inserted.
TRUNCATE: If the table contains data, all data is deleted and then inserted.
- table_name
Specifies the name (possibly schema-qualified) of an existing table.
Value range: an existing table name
- FIELDS csv
Specifies that the CSV mode of COPY is used. In CSV mode, the default separator is a comma (,), and the default quotation mark is a double quotation mark (").
- TERMINATED [BY] { 'string' }
The string that separates columns within each row (line) of the file, and it cannot be larger than 10 bytes.
Value range: The delimiter cannot include any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789
Value range: The default value is a tab character in text format and a comma in CSV format.
- OPTIONALLY ENCLOSED BY { 'string' }
Specifies a quoted character string for a CSV file.
The default value is double quotation marks (") only in CSV mode that is explicitly specified by the FIELDS CSV parameter.
In other modes, there is no default value.
- TRAILING NULLCOLS
Specifies how to handle the problem that multiple columns of a row in a source data file are lost during data import.
- WHEN { (start:end) | column_name } {= | !=}
Filters rows by character string between start and end or by column name.
Value range: a string.
- POSITION ({ start:end })
Processes columns and obtain the corresponding character strings between start and end.
- "sql_string"
Processes columns and calculates column values based on column expressions.
Value range: a string.
- FILLER
Processes columns. If FILLER occurs, this column is skipped.
- CONSTANT
Processes columns and sets the inserted columns to constants.
Value range: a string.
- SEQUENCE ( { COUNT | MAX | integer } [, incr] )
Processes columns to generate the corresponding sequence values.
- COUNT: The count starts based on the number of rows in the table.
- MAX: The count starts from the maximum value of this column in the table.
- integer: The count starts from the specified value.
- incr: indicates the increment each time.
- NULLIF
Leave the field empty. Currently, only the COL POSITION() CHAR NULLIF (COL=BLANKS) syntax is supported.
- OPTIONS, INFILE, and BADFILE are not supported. Syntax errors are not reported only in specific scenarios.
- gs_loader uses a .bad file to record errors from the rawrecord column in an error table. The error table does not record rawrecord if an error cannot be read by certain code. In this case, a blank line is recorded in the .bad file.
- CHARACTERSET
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot