gs_loader
Overview
gs_loader is used to import data. It converts the syntax supported by the control file into \COPY syntax, imports the data using the existing \COPY function, and records the \COPY results in logs.
Before using gs_loader, ensure that the gs_loader version is consistent with the gsql version and database version.
Installation and Deployment
Install and configure the gs_loader client tool on the server where source data files are stored so that you can use the gs_loader tool to import data.
- Create a directory for storing the gs_loader tool package.
mkdir -p /opt/bin
- Upload the gsql package to the created directory.
From the software installation package, upload the gsql tool package GaussDB-Kernel_Database version number_OS version number_64bit_gsql.tar.gz (the EulerOS tool package is used as an example) to the directory created in the previous step.
- Go to the new directory and decompress the package.
cd /opt/bin
tar -zxvf GaussDB-Kernel_Database version number_OS version number_64bit_gsql.tar.gz
source gsql_env.sh
- Verify the tool location and version information.
which gs_loader
- Verify the client version information.
The gs_loader tool version corresponds to the gsql tool version. You can directly query the gsql client version.
gsql -V
- Verify that the database version is the same as the client tool version.
Use gsql to connect to the database and run the following command:
select version();
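For example, the three checks can be run together as follows (connection parameters are illustrative); the versions reported for gs_loader, gsql, and the database must be consistent:

which gs_loader
gsql -V
gsql -d testdb -p 8000 -c "SELECT version();"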
Log Level Configuration
Set the log level so that developers can view the tool's run-time details. After the level is set, the tool's running information is printed on the console. Set one of the following levels:
export gs_loader_log_level=debug
export gs_loader_log_level=info
export gs_loader_log_level=warning
export gs_loader_log_level=error
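For example, to enable debug output for the current shell session before running an import (a minimal sketch; connection parameters are illustrative):

export gs_loader_log_level=debug
gs_loader control=loader.ctl data=data.csv db=testdb port=8000 user=load_user passwd=************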
Permission
The application scenarios are classified into separation-of-duties and non-separation-of-duties scenarios. You can set enableSeparationOfDuty to on or off to enable or disable the separation of duties function.
The enable_copy_error_log GUC parameter specifies whether to use the error table pgxc_copy_error_log. The default value is off, indicating that the error table is not used and error records are directly recorded in the .bad file of gs_loader. If this parameter is set to on, the error table pgxc_copy_error_log is used and error records are written to both the .bad file and error table.
- If enableSeparationOfDuty is set to off (the default), the user can be a common database user or an administrator. If the user is a common user, the administrator needs to grant permissions to the common user first. An administrator account can be used directly.
- Create a user (as an administrator).
CREATE USER load_user WITH PASSWORD '************';
- Grant the public schema permission to the user (as an administrator).
GRANT ALL ON SCHEMA public TO load_user;
- Create the gs_copy_summary table and grant table permissions to the user (as an administrator).
SELECT copy_summary_create() WHERE NOT EXISTS(SELECT * FROM pg_tables WHERE schemaname='public' AND tablename='gs_copy_summary');
GRANT ALL PRIVILEGES ON public.gs_copy_summary TO load_user;
- (Optional) Create the pgxc_copy_error_log table and grant table permissions to the user (as an administrator).
If the GUC parameter enable_copy_error_log is not set (off by default) or is set to off, you do not need to use the error table and do not need to create it. Otherwise, you need to create the error table.
SELECT copy_error_log_create() WHERE NOT EXISTS(SELECT * FROM pg_tables WHERE schemaname='public' AND tablename='pgxc_copy_error_log');
GRANT ALL PRIVILEGES ON public.pgxc_copy_error_log TO load_user;
- If enableSeparationOfDuty is set to on, the user can be a common database user or an administrator. Create the pgxc_copy_error_log and gs_copy_summary tables in their respective schemas and add indexes. No permission granting is required.
- Create a user (as the initial user).
CREATE USER load_user WITH PASSWORD '********';
- Switch to the load_user user (as the initial user).
\c - load_user
- Create the gs_copy_summary table and add an index (as the created user).
CREATE TABLE load_user.gs_copy_summary
(
    relname varchar,
    begintime timestamptz,
    endtime timestamptz,
    id bigint,
    pid bigint,
    readrows bigint,
    skiprows bigint,
    loadrows bigint,
    errorrows bigint,
    whenrows bigint,
    allnullrows bigint,
    detail text
);
CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);
- (Optional) Create the pgxc_copy_error_log table and add an index (as the created user).
If the GUC parameter enable_copy_error_log is not set (off by default) or is set to off, you do not need to use the error table and do not need to create it. Otherwise, you need to create the error table.
CREATE TABLE load_user.pgxc_copy_error_log
(
    relname varchar,
    begintime timestamptz,
    filename varchar,
    lineno int8,
    rawrecord text,
    detail text
);
CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname);
Usage Environment
You need to add the tool path to PATH. gs_loader supports SSL encrypted communication. The method of using gs_loader is the same as that of using gsql.
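For example, a minimal sketch of the PATH setup, assuming the package was extracted under /opt/bin as in "Installation and Deployment" (sourcing gsql_env.sh sets these variables for you; the exact directory layout depends on the package):

export PATH=/opt/bin/bin:$PATH
export LD_LIBRARY_PATH=/opt/bin/lib:$LD_LIBRARY_PATH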
Adding System Catalogs
The gs_copy_summary table is added to record the COPY execution result summary, including the number of successful rows, number of error rows, number of ignored rows, and number of empty rows.
The copy_summary_create function is added to create the gs_copy_summary table.
The format of the gs_copy_summary table is as follows:
relname     | public.sqlldr_tbl
begintime   | 2021-09-03 16:00:11.7129-04
endtime     | 2021-09-03 16:00:15.259908-04
id          | 21870
pid         | 47582725060352
readrows    | 100000
skiprows    | 0
loadrows    | 111
errorrows   | 0
whenrows    | 99889
allnullrows | 0
detail      | 111 Rows successfully loaded.
            | 0 Rows not loaded due to data errors.
            | 99889 Rows not loaded because all WHEN clauses were failed.
            | 0 Rows not loaded because all fields were null.
Columns in the gs_copy_summary System Catalog
| Column | Description |
|---|---|
| relname | Name of the target table to be imported. |
| begintime | Start time of the import task. |
| endtime | End time of the import task. |
| id | ID of the transaction to be imported. |
| pid | ID of the worker thread for the current import. |
| readrows | Total number of data rows read by the import task. |
| skiprows | Total number of data rows skipped in the import task. |
| loadrows | Number of data rows successfully imported in the current import task. |
| errorrows | Number of error data rows in the current import task. |
| whenrows | Number of data rows that violate the WHEN filter criterion in the current import task. |
| allnullrows | Number of data rows in which all columns are empty. |
| detail | Summary of the import task: the number of successfully imported rows, the number of error rows, the number of rows that violate the WHEN condition, and the number of all-null rows. |
Usage Guidelines
- (If the separation of duties function is disabled) The following steps apply only when a common user performs the import:
- Create a user (as an administrator).
CREATE USER load_user WITH PASSWORD '************';
- Grant the public schema permission to the user (as an administrator).
GRANT ALL ON SCHEMA public TO load_user;
- Create the gs_copy_summary table and grant table permissions to the user (as an administrator).
SELECT copy_summary_create() WHERE NOT EXISTS(SELECT * FROM pg_tables WHERE schemaname='public' AND tablename='gs_copy_summary');
GRANT ALL PRIVILEGES ON public.gs_copy_summary TO load_user;
- (Optional) Create the pgxc_copy_error_log table and grant table permissions to the user (as an administrator).
If the GUC parameter enable_copy_error_log is not set (off by default) or is set to off, you do not need to use the error table and do not need to create it. Otherwise, you need to create the error table.
SELECT copy_error_log_create() WHERE NOT EXISTS(SELECT * FROM pg_tables WHERE schemaname='public' AND tablename='pgxc_copy_error_log');
GRANT ALL PRIVILEGES ON public.pgxc_copy_error_log TO load_user;
- Switch to the load_user user (as an administrator).
\c - load_user
- (If the separation of duties function is enabled) For common users and administrators:
- Create a user (as the initial user).
CREATE USER load_user WITH PASSWORD '************';
- Switch to the load_user user (as the initial user).
\c - load_user
- Create the gs_copy_summary table and add an index (as the created user).
CREATE TABLE load_user.gs_copy_summary
(
    relname varchar,
    begintime timestamptz,
    endtime timestamptz,
    id bigint,
    pid bigint,
    readrows bigint,
    skiprows bigint,
    loadrows bigint,
    errorrows bigint,
    whenrows bigint,
    allnullrows bigint,
    detail text
);
CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);
- (Optional) Create the pgxc_copy_error_log table and add an index (as the created user).
If the GUC parameter enable_copy_error_log is not set (off by default) or is set to off, you do not need to use the error table and do not need to create it. Otherwise, you need to create the error table.
CREATE TABLE load_user.pgxc_copy_error_log
(
    relname varchar,
    begintime timestamptz,
    filename varchar,
    lineno int8,
    rawrecord text,
    detail text
);
CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname);
- Create a table and a control file, and prepare a data file.
Create the loader_tbl table.
CREATE TABLE loader_tbl
(
    ID   NUMBER,
    NAME VARCHAR2(20),
    CON  VARCHAR2(20),
    DT   DATE
);
(On the gs_loader client) Create the control file loader.ctl.
LOAD DATA
truncate into table loader_tbl
WHEN (2:2) = ','
fields terminated by ','
trailing nullcols
(
    id integer external,
    name char(32),
    con ":id || '-' || :name",
    dt date
)
(On the gs_loader client) Create the GUC parameter file guc.txt.
set a_format_copy_version='s1';
(On the gs_loader client) Create the data file data.csv.
1,OK,,2007-07-8
2,OK,,2008-07-8
3,OK,,2009-07-8
4,OK,,2007-07-8
43,DISCARD,,2007-07-8
,,,
32,DISCARD,,2007-07-8
a,ERROR int,,2007-07-8
8,ERROR date,,2007-37-8
,,,,
,
8,ERROR fields,,2007-37-8
,,,
5,OK,,2021-07-30
- Import the data.
Before importing data, ensure that the gs_loader tool has the execute permission and that you have the write permission on the current directory. (gs_loader generates temporary files during processing and automatically deletes them after the import is complete.)
gs_loader control=loader.ctl data=data.csv db=testdb bad=loader.bad guc_param=guc.txt errors=5 port=8000 passwd=************ user=load_user
Execution result:
gs_loader: version 0.1

 5 Rows successfully loaded.

log file is:
 loader.log
- gs_copy_summary records the called COPY syntax and details.
- The [badfile]_bad.log file records error data and details. To prevent the error data and details recorded during the last import from being overwritten, you are advised to use a different bad parameter for each import.
- The logging function using the pgxc_copy_error_log table is disabled by default. To use the error table pgxc_copy_error_log to record error data and details, enable the GUC parameter enable_copy_error_log.
- To delete data from a table, perform the TRUNCATE or DELETE operation on the table.
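For example, after an import you can inspect the most recent summary record (a minimal sketch; in the separation-of-duty scenario the table is in the user's own schema rather than public):

SELECT relname, readrows, loadrows, errorrows, whenrows, detail
FROM public.gs_copy_summary
ORDER BY endtime DESC
LIMIT 1;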
Parameters
| Parameter | Description | Type/Value Range |
|---|---|---|
| help | Displays help information. | - |
| user | Database connection user (equivalent to -U). | String |
| -U | Database connection user (equivalent to user). | String |
| passwd | User password (equivalent to -W). | String |
| -W | User password (equivalent to passwd). | String |
| db | (Required) Database name (equivalent to -d). | String |
| -d | (Required) Database name (equivalent to db). | String |
| host | Host name of the running server, the UDS path, or the domain name. You can specify multiple host addresses as a comma-separated string (equivalent to -h). If multiple host addresses are specified, the primary node is connected by default. | See the gsql --host parameter. |
| -h | Host name of the running server, the UDS path, or the domain name. You can specify multiple host addresses as a comma-separated string (equivalent to host). If multiple host addresses are specified, the primary node is connected by default. | See the gsql --host parameter. |
| port | Port number of the database server. One or more port numbers can be configured. When one port number is configured, all IP addresses use the same port. When multiple port numbers are configured, their order must match the IP address order and their count must equal the number of IP addresses; otherwise, an error is reported (equivalent to -p). | See the gsql --port parameter. |
| -p | Port number of the database server. One or more port numbers can be configured. When one port number is configured, all IP addresses use the same port. When multiple port numbers are configured, their order must match the IP address order and their count must equal the number of IP addresses; otherwise, an error is reported (equivalent to port). | See the gsql --port parameter. |
| create | Determines whether to create the pgxc_copy_error_log and gs_copy_summary tables. | true or false. Default: true |
| clean | Specifies whether to clear the error records. | true or false. Default: false |
| data | (Required) Data file. You can specify multiple data files, or use wildcards (*) and question marks (?) to match multiple data files. | String |
| control | (Required) Name of the control file. | String |
| log | Name of the log file. | String |
| bad | Name of the file that records error lines and details. You can also specify a directory; if no file name is specified, the name is generated from the data file name. | String |
| discard | Name of the file that records lines filtered out by WHEN. You can also specify a directory; the file name is generated from the data file name. | String |
| errors | Maximum number of error lines allowed in a data file. | Integer. Default: 0 |
| skip | Number of lines to skip at the beginning of the data file. | Integer. Default: 0 |
| bindsize | Only syntax compatibility is implemented; the function itself is not implemented. | - |
| rows | Number of rows of data to import before each commit. | Integer in the range [1, 2147483647] |
- All parameters are in lowercase and are compatible with the gsql login mode, including -p port number, -h host, -d database, -U username, and -W password.
- When the rows parameter is specified, keep the number of commits under 1000; otherwise, performance is affected. The number of commits is approximately the number of data rows in the data file divided by the value of rows (see the example after this list). If rows is not specified, there is no default value; in that case, the transaction is committed once after all data is imported into the table.
- When gs_loader sets the GUC parameter a_format_load_with_constraints_violation to support non-rollback upon constraint conflicts, if a table has a BEFORE/AFTER ROW INSERT trigger, a maximum of 10,000,000 rows can be committed at a time.
- gs_loader does not support statement-level triggers when the GUC parameter a_format_load_with_constraints_violation is set to support non-rollback upon constraint conflicts.
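For example (illustrative arithmetic): importing a 1,000,000-row data file with rows=5000 produces about 1,000,000 / 5000 = 200 commits, well under the 1000-commit guideline, whereas rows=500 would produce about 2000 commits and should be avoided. A corresponding invocation might look like this (connection parameters are illustrative):

gs_loader control=loader.ctl data=data.csv db=testdb port=8000 user=load_user passwd=************ rows=5000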
Control Files
- Syntax
LOAD [ DATA ]
[CHARACTERSET char_set_name]
[INFILE [directory_path] [filename ] ]
[BADFILE [directory_path] [filename ] ]
[OPTIONS(name=value)]
[{ INSERT | APPEND | REPLACE | TRUNCATE }]
INTO TABLE table_name
[{ INSERT | APPEND | REPLACE | TRUNCATE }]
[FIELDS CSV]
[TERMINATED [BY] { 'string' }]
[OPTIONALLY ENCLOSED BY { 'string' }]
[TRAILING NULLCOLS]
[ WHEN { (start:end) | column_name } {= | !=} 'string' ]
[( col_name [ [ POSITION ({ start:end }) ] ["sql_string"] ] |
   [ FILLER [column_type [external] ] ] |
   [ CONSTANT "string" ] |
   [ SEQUENCE ( { COUNT | MAX | integer } [, incr] ) ] | [NULLIF (COL=BLANKS)]
   [, ...] )]
- Parameter description:
- CHARACTERSET
Character set.
Value range: a string. Currently, the value can be 'AL32UTF8', 'zhs16gbk', or 'zhs32gb18030'.
The character set specified by CHARACTERSET in the control file must be the same as the encoding format of the file. Otherwise, an error is reported or garbled characters are displayed in the imported data.
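For example, you can check the data file's encoding before choosing a character set (a minimal sketch; the file command is a standard Linux utility, and the output shown is illustrative):

file -i data.csv
# data.csv: text/plain; charset=utf-8

Corresponding control file fragment:

LOAD DATA
CHARACTERSET 'AL32UTF8'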
- INFILE
This keyword is currently not functional and must occupy a separate line in the control file; it is ignored at run time. Specify the data file using the gs_loader command-line parameters instead.
- BADFILE
This keyword is currently not functional and is ignored at run time. If no bad file is specified in the gs_loader command, a .bad file is generated based on the name of the corresponding control file.
- OPTIONS
Only the skip and rows parameters take effect: skip=n indicates that the first n records are skipped during the import, and rows=n indicates the number of rows to import before each commit. If a parameter is specified both on the command line and in the control file, the command-line value takes precedence. See the example below.
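Example (a minimal sketch reusing the loader_tbl table from "Usage Guidelines"; OPTIONS here skips the first data line):

LOAD DATA
OPTIONS(skip=1)
truncate into table loader_tbl
fields terminated by ','
trailing nullcols
(
    id integer external,
    name char(32),
    con char(32),
    dt date
)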
- INSERT | APPEND | REPLACE | TRUNCATE
Import mode.
INSERT: If the table contains data, an error is reported.
APPEND: Data is inserted directly.
REPLACE: If the table contains data, all data is deleted and then inserted.
TRUNCATE: If the table contains data, all data is deleted and then inserted.
- When writing a control file (.ctl), you can specify the import mode (INSERT | APPEND | REPLACE | TRUNCATE) both before and after the INTO TABLE table_name statement. The mode specified after the statement takes precedence and overwrites the one specified before it, as the sketch after these notes shows.
- When multiple gs_loader sessions are started to concurrently import data to the same table, you are advised to use the APPEND mode. If you use the INSERT, REPLACE, or TRUNCATE mode, an import error may occur or the imported data may be incomplete.
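Example (a minimal sketch): here TRUNCATE, written after INTO TABLE, overrides the APPEND written before it, so the table is emptied before loading.

LOAD DATA
APPEND
INTO TABLE loader_tbl
TRUNCATE
fields terminated by ','
(
    id integer external,
    name char(32),
    con char(32),
    dt date
)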
- FIELDS CSV
Specifies that the CSV mode of COPY is used. In CSV mode, the default separator is a comma (,), and the default quotation mark is a double quotation mark (").
In the current CSV mode, quoted line feeds are considered as part of the column data.
- table_name
Specifies the name (possibly schema-qualified) of an existing table.
Value range: an existing table name
- TERMINATED [BY] { 'string' }
The string that separates columns within each row (line) of the file. It cannot be larger than 10 bytes.
Value range: the string cannot contain any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789
Default value: a tab character in text format and a comma (,) in CSV format.
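Example (a minimal sketch): the vertical bar is a legal separator because it is not in the excluded character set above.

Control file fragment:
fields terminated by '|'
Matching data line:
1|OK||2007-07-08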
- OPTIONALLY ENCLOSED BY { 'string' }
Specifies a quoted character string for a CSV file.
The default value is double quotation marks (") only in CSV mode that is explicitly specified by the FIELDS CSV parameter.
In other modes, there is no default value.
- When you set OPTIONALLY ENCLOSED BY { 'string' }, either the data has no opening quotation mark, or the numbers of quotation marks on the left and right must each be odd (they do not have to be equal).
- Currently, OPTIONALLY ENCLOSED BY { 'string' } is supported only in CSV mode. If OPTIONALLY ENCLOSED BY { 'string' } is specified, the system enters the CSV mode by default.
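Example (a minimal sketch): with the field definition below, the comma inside the quoted field is loaded as part of the column value instead of being treated as a separator (specifying OPTIONALLY ENCLOSED BY switches to CSV mode, as noted above).

Control file fragment:
fields terminated by ',' OPTIONALLY ENCLOSED BY '"'
Matching data line:
1,"hello, world",2007-07-08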
- TRAILING NULLCOLS
Specifies how to handle rows in which one or more columns at the end are missing from the source data file.
If one or more columns at the end of a row are null, the columns are imported to the table as null values. If this parameter is not set, an error message is displayed, indicating that the error column is null. In this case, the data in this row is processed as an error.
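Example (a minimal sketch reusing the four-column loader_tbl table): with TRAILING NULLCOLS, the data line
1,OK
imports with con and dt set to NULL; without TRAILING NULLCOLS, the row is treated as an error because the trailing columns are missing.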
- WHEN { (start:end) | column_name } {= | !=}
Filters rows by character string between start and end or by column name.
Value range: a string.
- POSITION ({ start:end })
Processes columns and obtains the corresponding character string between start and end.
- "sql_string"
Processes columns and calculates column values based on column expressions. For details, see "Column expression" below.
Value range: a string.
- FILLER
Processes columns. If FILLER occurs, this column is skipped.
- column_type [external]
Processes the imported data according to different data types. For details, see "Data types" below.
- CONSTANT
Processes columns and sets the inserted columns to constants.
Value range: a string.
- SEQUENCE ( { COUNT | MAX | integer } [, incr] )
Processes columns to generate the corresponding sequence values.
- COUNT: The count starts based on the number of rows in the table.
- MAX: The count starts from the maximum value of this column in the table.
- integer: The count starts from the specified value.
- incr: indicates the increment each time.
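Example (a minimal sketch; the column names are illustrative): id receives generated values starting from the current maximum of the column and increasing by 1, instead of being read from the data file.

(
    id SEQUENCE(MAX, 1),
    name char(32)
)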
- NULLIF
Processes columns. In multi-column import scenarios, if sysdate, a constant, POSITION, or a column expression is not specified after a column name, a column for which the NULLIF keyword is not specified is left empty.
Currently, only the COL POSITION() CHAR NULLIF (COL=BLANKS) syntax is supported. For details, see "NULLIF use cases" below.
- OPTIONS, INFILE, and BADFILE are not functional; they are accepted only for syntax compatibility, so no syntax error is reported for them except in specific scenarios.
- gs_loader uses a .bad file to record errors from the rawrecord column of the error table. If an error record cannot be decoded in the expected encoding, the error table does not record rawrecord, and a blank line is recorded in the .bad file.
- If the data file imported in a single run contains a large number of constraint conflicts, a large amount of cache may be occupied, "ERROR: memory is temporarily unavailable" is reported, and the import fails. For example, this can occur when the database server memory is small (for example, 32 GB) and the conflicts exceed 2 million rows, or when the memory is greater than 128 GB and the conflicts exceed 10 million rows. Therefore, you are advised not to use the feature of non-rollback upon constraint conflicts in such cases.
- If the data in the .bad file is empty, refer to the source file and row number recorded in the error table. (When the character encoding cannot be identified, the .bad file content is not written and only a blank row is recorded.)
loader=# select * from pgxc_copy_error_log;
       relname        |           begintime           | filename | lineno | rawrecord |                     detail
----------------------+-------------------------------+----------+--------+-----------+-------------------------------------------------
 public.test_gsloader | 2023-02-09 09:20:33.646843-05 | STDIN    |      1 |           | invalid byte sequence for encoding "UTF8": 0xb4
(1 row)
// In the preceding example, search the imported data file for the row given by lineno to find the source data.
- NULLIF use cases
// Create a table.
CREATE TABLE gsloader_test_nullif(
    col1 varchar2(100) not null enable,
    col2 number(5,0) not null enable,
    col3 varchar2(200) not null enable,
    col4 varchar2(34) not null enable,
    col5 varchar2(750),
    col6 number(20,0),
    col7 varchar2(4000),
    col8 varchar2(200)
);
// Data file test.csv
6007 17060072021-09-0360070001102010000000230 1 600700010000218 0 1 1 229465 3
6007 17060072021-09-0360070001102010000000299 1 600700010000282 0 1 1 230467 3
6007 17060072021-09-0360070001102010000000242 1 600700010000255 0 1 1 226400 3
6007 17060072021-09-0360070001102010000000202 1 600700010000288 0 1 1 219107 3
6007 17060072021-09-0360070001102010000000294 1 600700010000243 0 1 1 204404 3
6007 17060072021-09-0360070001102010000000217 1 600700010000270 0 1 1 226644 3
// Control file test.ctl
LOAD DATA
CHARACTERSET UTF8
TRUNCATE INTO TABLE gsloader_test_nullif
TRAILING NULLCOLS
(COL1 POSITION(1:10) CHAR NULLIF (COL1 = BLANKS),
 COL2 POSITION(11:14) CHAR NULLIF (COL2 = BLANKS),
 COL3 POSITION(21:30) CHAR NULLIF (COL3 = BLANKS),
 COL4 POSITION(31:40) CHAR NULLIF (COL4 = BLANKS),
 COL5 sysdate,
 COL6,
 COL7,
 COL8 POSITION(71:80) CHAR NULLIF (COL8 = BLANKS))
// Import data.
gs_loader -p xxx host=xxx control=test.ctl data=test.csv -d testdb -W xxx
// Result: Imported.
loader=# SELECT * FROM gsloader_test_nullif;
    col1    | col2 |    col3    |    col4    |        col5         | col6 | col7 |   col8
------------+------+------------+------------+---------------------+------+------+-----------
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000218
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000282
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000255
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000288
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000243
 6007 17060 |  720 | 0360070001 | 1020100000 | 2023-05-17 20:32:03 |      |      | 010000270
(6 rows)
According to the data in the imported table, after the NULLIF keyword is used, the columns import as expected: the column with the sysdate calculation (col5) is computed, and the columns without any specified calculation (col6 and col7) are empty.
- Column expression
gs_loader supports expression conversion and scenario extension for specified columns.
({ column_name [ data_type ] [ AS transform_expr ] } [, ...])
data_type specifies the data type of the column in the expression parameter. transform_expr specifies the target expression and returns the result value whose data type is the same as that of the target column in the table.
Example:
- The column type is not specified in the .ctl file, and the source data does not meet the column restrictions (data type and length restrictions) in the table.
// Create a table.
create table t_test(id int, text varchar(5));
// Data file test.csv
addf2,bbbbaaa,20220907,
// Control file test.ctl
Load Data
TRUNCATE INTO TABLE t_test
fields terminated by ','
TRAILING NULLCOLS(
    id "length(trim(:id))",
    text "replace(trim(:text),'bbbb','aa')"
)
// guc_param file
set a_format_copy_version='s1';
// Import data.
gs_loader -p xxx host=xxx control=test.ctl data=test.csv -d testdb -W xxx guc_param=test_guc.txt
// Result: Imported.
select * from t_test;
 id | text
----+-------
  5 | aaaaa
(1 row)
- The column type is not specified in the .ctl file, and the implicit type conversion is performed. (You are advised to add compatibility parameters because the implicit type conversion is involved.)
// Create a table.
create table test(mes int, mes1 text, mes2 float8, mes3 timestamp with time zone, mes4 INTEGER);
// Data file
cat load_support_transform.data
1,mmoo,12.6789,Thu Jan 01 15:04:28 1970 PST,32767
2,yyds,180.883,Thu Jun 21 19:00:00 2012 PDT,32768
// Control file
cat load_support_transform.ctl
Load Data
TRUNCATE INTO TABLE test
fields terminated by ','
TRAILING NULLCOLS(
    mes,
    mes1 "mes1 || mes2",
    mes2 "mes2 + 1",
    mes3 "date_trunc('year', mes3)",
    mes4
)
// guc_param file
cat test_guc.txt
set a_format_copy_version='s1';
set a_format_dev_version='s2';
set a_format_version='10c';
// Import data.
gs_loader -p xxx host=xxx control=load_support_transform.ctl data=load_support_transform.data -d testdb -W xxx guc_param=test_guc.txt
// Result: Imported.
select * from test;
 mes |    mes1     |  mes2   |          mes3          | mes4
-----+-------------+---------+------------------------+-------
   1 | mmoo12.6789 | 13.6789 | 1970-01-01 00:00:00+08 | 32767
   2 | yyds180.883 | 181.883 | 2012-01-01 00:00:00+08 | 32768
- Data types
Corresponds to column_type [external] in the control file. During data loading, data is processed based on its data type. gs_loader classifies data types into common and special data types.
- Common data types
- CHAR [(length)]:
Reads data based on a column separator and converts the value to the CHAR type. length indicates the maximum length of a single piece of data, in bytes. Generally, one character occupies one byte. The value can be left blank. The scenarios are as follows:
- If a length is not declared, the value inherits the maximum length value declared by POSITION.
- If a length is declared, it overwrites the maximum length declared by POSITION.
- If neither length nor POSITION declares a length, the value is set based on the length between separators.
- The priority of the length declaration is as follows: length > POSITION > separator.
- If none of length, POSITION, and separator declares a length, the default length is 1.
- If the actual data length exceeds the maximum value declared by length, an error is reported.
- INTEGER external [(length)]:
Reads data based on a column separator and converts the value to the INTEGER type. The rules for using length are the same as those described in "CHAR [(length)]."
- FLOAT external [(length)]:
Reads data based on a column separator and converts the value to the FLOAT type. The rules for using length are the same as those described in "CHAR [(length)]."
- DECIMAL external (length):
Reads data based on a column separator and converts the value to the DECIMAL type. The rules for using length are the same as those described in "CHAR [(length)]."
- TIMESTAMP:
Reads data based on a column separator and converts the value to the TIMESTAMP type.
- DATE:
Reads data based on a column separator and converts the value to the DATE type.
- DATE external:
Reads data based on a column separator and converts the value to the DATE type.
- SYSDATE:
Obtains the system time when the corresponding insertion is performed in the database. The value cannot be referenced. The referenced content is the SYSDATE character string.
- Special data types
- INTEGER:
Ignores the column separator, reads four-byte characters, saves them based on the little-endian storage logic, parses each character into a hexadecimal ASCII code value, and converts the value into a decimal number.
- SMALLINT:
Ignores the column separator, reads two-byte characters, saves them based on the little-endian storage logic, parses each character into a hexadecimal ASCII code value, and converts the value into a decimal number.
Example:
// Create a table.
create table t_spec(col1 varchar(10), col2 varchar(10));
// Data file
cat t_spec.txt
1234,5678,
// Control file
cat t_spec.ctl
Load Data
TRUNCATE INTO TABLE t_spec
fields terminated by ','
TRAILING NULLCOLS(
    col1 position(2:6) integer,
    col2 position(5:8) smallint
)
// guc_param file
cat test_guc.txt
set a_format_copy_version='s1';
set a_format_dev_version='s2';
set a_format_version='10c';
// Import data.
gs_loader -p xxx host=xxx control=t_spec.ctl data=t_spec.txt -d testdb -W xxx guc_param=test_guc.txt
// Result: Imported.
select * from t_spec;
   col1    | col2
-----------+-------
 741618482 | 13612
(1 row)
- RAW:
Parses each character into an ASCII code value. The backslash (\) is not used as an escape character.
Restriction: Separators cannot be used in RAW data.
Example:
// Create a table.
create table t_raw(col raw(50));
// Data file
cat t_raw.txt
12\n\x78!<~?'k^(%s)>/c[$50]
// Control file
cat t_raw.ctl
Load Data
TRUNCATE INTO TABLE t_raw
TRAILING NULLCOLS(
    col position(1:50) raw
)
// guc_param file
cat test_guc.txt
set a_format_copy_version='s1';
set a_format_dev_version='s2';
set a_format_version='10c';
// Import data.
gs_loader -p xxx host=xxx control=t_raw.ctl data=t_raw.txt -d testdb -W xxx guc_param=test_guc.txt
// Result: Imported.
select * from t_raw;
                          col
--------------------------------------------------------
 31325C6E5C783738213C7E3F276B5E282573293E2F635B2435305D
(1 row)
- In the multi-column import scenario, if the GUC parameter is not specified, some positions and separators cannot be used at the same time.
- In the multi-column import scenario, if the GUC parameter is specified, the POSITION operation cannot be used for some columns.
- In the multi-column import scenario, if common data types and special data types are used together, you need to specify POSITION for all data types.
- When importing data of a specified data type, you need to use guc_param to set a_format_copy_version for common data types and use guc_param to set a_format_copy_version, a_format_dev_version and a_format_version for special data types.
- If a column expression involves a system function, you need to use guc_param to set a_format_dev_version and a_format_version based on the corresponding function.
- If the data type contains length, the value of length must be an integer greater than 0. The special data type RAW(length) behaves differently from common types. For a common type such as INTEGER EXTERNAL(length) without POSITION: if length is less than the data length of the corresponding column in a text file (such as .csv or .txt), an error is reported; if length is greater than the data length, the INTEGER EXTERNAL result is output. For the special data type RAW(length) without POSITION, the first length characters are read.
- If POSITION(start:end) is specified, the value of start must be set to an integer greater than 0, and the value of end must be greater than or equal to the value of start.
- During concurrent import, if multiple names of files specified by discard or bad point to files with the same name in the same directory, gs_loader stops importing the next file and reports an error. If a previous file has been imported, the file will be overwritten.
The following error is reported:
ERROR: An error occurred. Please check logfile.
In the log file:
…lock failed: Resource temporarily unavailable…
- If a column in the control file is assigned a value that does not reference the input data (for example, a constant or a system function such as SYSDATE), that column does not consume a field in the data file; subsequent columns read from the next field in sequence.
For example, the control file is as follows:
Load Data
TRUNCATE INTO TABLE gsloader
fields terminated by ','
TRAILING NULLCOLS(
    id "trim(:id)",
    text "to_char(SYSDATE,'yyyymmdd')",
    gmt_create "trim(:gmt_create)",
    create_str "trim(:create_str)"
)
The data file is as follows:
11,HELLO,33,
The import result is as follows:
loader=# select * from gsloader;
 id |        text         | gmt_create | create_str
----+---------------------+------------+------------
 11 | 2023-02-08 16:00:54 | HELLO      | 33