Updated on 2024-06-07 GMT+08:00

gs_loader

Overview

  • gs_loader is used to import data. gs_loader converts the syntax supported by the control file to the \COPY syntax, uses the existing \COPY function to import data, and records the \COPY result in logs.
  • Before using gs_loader, ensure that the gs_loader version is consistent with the gsql version and database version.

Log Level Configuration

Set the log level for developers to view. After the setting, the tool running information is printed on the console.

export gs_loader_log_level=debug
export gs_loader_log_level=info
export gs_loader_log_level=warning
export gs_loader_log_level=error

Permission

The application scenarios are classified into separation-of-duties and non-separation-of-duties scenarios. You can set enableSeparationOfDuty to on or off to enable or disable the separation of duties function.

  • If enableSeparationOfDuty is set to off:

    The user can be a common user or the database administrator. If the user is a common user, the administrator needs to grant permissions to the common user. The administrator account can be used directly.

    Create a user.

    CREATE USER load_user WITH PASSWORD '************';

    Create related tables and grant permissions.

    GRANT ALL ON FUNCTION copy_error_log_create() TO load_user;
    GRANT ALL ON SCHEMA public TO load_user;
    SELECT copy_error_log_create();
    SELECT copy_summary_create();
    GRANT ALL PRIVILEGES ON  public.pgxc_copy_error_log To load_user;
    GRANT ALL PRIVILEGES ON  public.gs_copy_summary To load_user;
  • If enableSeparationOfDuty is set to on:

    The user can be a common user or the database administrator. Create the pgxc_copy_error_log and gs_copy_summary tables in their respective schemas and add indexes. No permission granting is required.

    Create a user.

    CREATE USER load_user WITH PASSWORD '********';

    Create related tables and add indexes.

    CREATE TABLE load_user.pgxc_copy_error_log (relname varchar, begintime timestamptz, filename varchar, lineno int8, rawrecord text, detail text);
    CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname);
    CREATE TABLE load_user.gs_copy_summary(relname varchar, begintime timestamptz, endtime timestamptz, id bigint, pid bigint, readrows bigint, skiprows bigint, loadrows bigint, errorrows bigint, whenrows bigint, allnullrows bigint, detail text);
    CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);

Usage Environment

You need to add the tool path to PATH. gs_loader supports SSL encrypted communication. The method of using gs_loader is the same as that of using gsql. For details, refer to the Administrator Guide.

Adding System Catalogs

The gs_copy_summary table is added to record the COPY execution result summary, including the number of successful rows, number of error rows, number of ignored rows, and number of empty rows.

The copy_summary_create function is added to create the gs_copy_summary table.

The format of the gs_copy_summary table is as follows:

relname     | public.sqlldr_tbl
begintime   | 2021-09-03 16:00:11.7129-04
endtime     | 2021-09-03 16:00:15.259908-04
id          | 21870
pid         | 47582725060352
readrows    | 100000
skiprows    | 0
loadrows    | 111
errorrows   | 0
whenrows    | 99889
allnullrows | 0
detail      | 111 Rows successfully loaded.
            | 0 Rows not loaded due to data errors.
            | 99889 Rows not loaded because all WHEN clauses were failed.
            | 0 Rows not loaded because all fields were null.
            | 

Usage Guidelines

  1. Create a user and related tables, and add indexes.

    • If the separation of duties function is disabled for common users only:

      Create a user.

      CREATE USER load_user WITH PASSWORD '************';

      Create related tables and grant permissions.

      GRANT ALL ON FUNCTION copy_error_log_create() TO load_user;
      GRANT ALL ON SCHEMA public TO load_user;
      SELECT copy_error_log_create();
      SELECT copy_summary_create();
      GRANT ALL PRIVILEGES ON  public.pgxc_copy_error_log To load_user;
      GRANT ALL PRIVILEGES ON  public.gs_copy_summary To load_user;
    • If the separation of duties function is enabled for common users and administrators:

      Create a user.

      CREATE USER load_user WITH PASSWORD '********';

      Create related tables and add indexes.

      CREATE TABLE load_user.pgxc_copy_error_log (relname varchar, begintime timestamptz, filename varchar, lineno int8, rawrecord text, detail text);
      CREATE INDEX copy_error_log_relname_idx ON load_user.pgxc_copy_error_log(relname);
      CREATE TABLE load_user.gs_copy_summary(relname varchar, begintime timestamptz, endtime timestamptz, id bigint, pid bigint, readrows bigint, skiprows bigint, loadrows bigint, errorrows bigint, whenrows bigint, allnullrows bigint, detail text);
      CREATE INDEX gs_copy_summary_idx ON load_user.gs_copy_summary(id);

  2. Create a table and a control file, and prepare a data file.

    Create the loader_tbl table.

    1
    2
    3
    4
    5
    6
    7
    CREATE TABLE  loader_tbl
    (
        ID   NUMBER,
        NAME VARCHAR2(20),
        CON  VARCHAR2(20),
        DT   DATE
    );
    

    Create the loader.ctl control file.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    LOAD DATA
    truncate into table loader_tbl
    WHEN (2:2) = ',' 
    fields terminated by ','
    trailing nullcols
    (
        id integer external,
        name char(32),
        con ":id || '-' || :name",
        dt date
    )
    

    Create the data.csv data file.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    1,OK,,2007-07-8
    2,OK,,2008-07-8
    3,OK,,2009-07-8
    4,OK,,2007-07-8
    43,DISCARD,,2007-07-8
    ,,,
    32,DISCARD,,2007-07-8
    a,ERROR int,,2007-07-8
    8,ERROR date,,2007-37-8
    ,,,,
     ,
    8,ERROR fields,,2007-37-8
    ,,,
    5,OK,,2021-07-30
    

  3. Import the data.

    Before importing data, ensure that the gs_loader tool has the execute permission. Ensure that the current path has the write permission on files. (The gs_loader generates some temporary files during the processing and automatically deletes them after the import is complete.)

    1
    gs_loader control=loader.ctl data=data.csv db=testdb bad=loader.bad errors=5 port=8000 passwd=************ user=load_user
    

    Execution result:

    1
    2
    3
    4
    5
    6
    gs_loader: version 0.1
     
     10 Rows successfully loaded. 
     
    log file is: 
     loader.log
    

Parameter Description

Table 1 gs_loader parameter description

Parameter

Parameter Description

Parameter Type: Value Range

help

Displays help information.

-

user

Database connection user (equivalent to -U).

String

-U

Database connection user (equivalent to user).

String

passwd

User password (equivalent to -W).

String

-W

User password (equivalent to passwd).

String

db

Database name. This parameter is mandatory and is equivalent to -d.

String

-d

Database name. This parameter is mandatory and is equivalent to db.

String

host

Specifies the host name of the machine on which the server is running or the directory for the Unix-domain socket (equivalent to -h).

See the gsql --host parameter.

-h

Specifies the host name of the machine on which the server is running or the directory for the Unix-domain socket (equivalent to host).

See the gsql --host parameter.

port

Port number of the database server (equivalent to -p).

See the gsql --port parameter.

-p

Port number of the database server (equivalent to port).

See the gsql --port reference.

create

Determines whether to create the pgxc_copy_error_log and gs_copy_summary tables.

The value can be true or false. The default value is true.

clean

Indicates whether to clear the error record.

The value can be true or false. The default value is false.

data

(Mandatory) Data file. You can specify multiple data files or use wildcards (*) and question marks (?) to represent multiple data files.

String

control

(Mandatory) Name of a control file.

String

log

Name of a log file.

String

bad

Name of the file recording the error lines. You can also specify a directory to generate the file based on the data file name.

String

discard

Name of the file recording the lines that fail to be matched by WHEN. You can also specify a directory to generate the file name based on the data file name.

String

errors

Maximum number of error lines in a data file.

Integer Default value: 0

skip

Number of first lines that can be skipped in a data file.

Integer Default value: 0

bindsize

Only syntax compatibility is implemented, but functions are not implemented.

-

rows

Only syntax compatibility is implemented, but functions are not implemented.

-

  • All parameters are in lowercase and are compatible with the gsql login mode, including -p port number, -h host, -d database, -U username, and -W password.
  • gs_loader uses a .bad file to record errors from the rawrecord column in an error table. The error table does not record rawrecord if an error cannot be read by certain code. In this case, a blank line is recorded in the .bad file.

Control Files

  • Syntax
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    LOAD [ DATA ]
    [CHARACTERSET char_set_name]
    [INFILE [directory_path] [filename ] ]
    [BADFILE [directory_path] [filename ] ]
    [{ INSERT | APPEND | REPLACE | TRUNCATE }] 
    INTO TABLE table_name
    [{ INSERT | APPEND | REPLACE | TRUNCATE }] 
    [FIELDS CSV]
    [TERMINATED [BY] { 'string' }]
    [OPTIONALLY ENCLOSED BY { 'string' }]
    [TRAILING NULLCOLS]
    [ WHEN { (start:end) | column_name } {= | !=} 'string' ]
    [(
    col_name [ [ POSITION ({ start:end }) ]  ["sql_string"] ] | [ FILLER [column_type [external] ] ] | [ CONSTANT "string" ] | [ SEQUENCE ( { COUNT | MAX | integer } [, incr] ) ]|[NULLIF (COL=BLANKS)]
    [, ...]
    )]
    
  • Parameter description:
    • CHARACTERSET

      Character set.

      Value range: a string.

    • INFILE

      The current keyword is invalid and needs to occupy a separate line in the control file. The keyword is ignored during running. You need to specify the corresponding data file in the gs_loader command line parameters.

    • BADFILE

      The current keyword is invalid and will be ignored during running. If badfile is not specified in the gs_loader command, a badfile will be generated based on the name of the corresponding control file.

    • INSERT | APPEND | REPLACE | TRUNCATE

      Import mode.

      INSERT: If the table contains data, an error is reported.

      APPEND: Data is inserted directly.

      REPLACE: If the table contains data, all data is deleted and then inserted.

      TRUNCATE: If the table contains data, all data is deleted and then inserted.

    • table_name

      Specifies the name (possibly schema-qualified) of an existing table.

      Value range: an existing table name

    • FIELDS csv

      Specifies that the CSV mode of COPY is used. In CSV mode, the default separator is a comma (,), and the default quotation mark is a double quotation mark (").

    • TERMINATED [BY] { 'string' }

      The string that separates columns within each row (line) of the file, and it cannot be larger than 10 bytes.

      Value range: The delimiter cannot include any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789

      Value range: The default value is a tab character in text format and a comma in CSV format.

    • OPTIONALLY ENCLOSED BY { 'string' }

      Specifies a quoted character string for a CSV file.

      The default value is double quotation marks (") only in CSV mode that is explicitly specified by the FIELDS CSV parameter.

      In other modes, there is no default value.

    • TRAILING NULLCOLS

      Specifies how to handle the problem that multiple columns of a row in a source data file are lost during data import.

    • WHEN { (start:end) | column_name } {= | !=}

      Filters rows by character string between start and end or by column name.

      Value range: a string.

    • POSITION ({ start:end })

      Processes columns and obtain the corresponding character strings between start and end.

    • "sql_string"

      Processes columns and calculates column values based on column expressions.

      Value range: a string.

    • FILLER

      Processes columns. If FILLER occurs, this column is skipped.

    • CONSTANT

      Processes columns and sets the inserted columns to constants.

      Value range: a string.

    • SEQUENCE ( { COUNT | MAX | integer } [, incr] )

      Processes columns to generate the corresponding sequence values.

      • COUNT: The count starts based on the number of rows in the table.
      • MAX: The count starts from the maximum value of this column in the table.
      • integer: The count starts from the specified value.
      • incr: indicates the increment each time.
    • NULLIF

      Leave the field empty. Currently, only the COL POSITION() CHAR NULLIF (COL=BLANKS) syntax is supported.

    • OPTIONS, INFILE, and BADFILE are not supported. Syntax errors are not reported only in specific scenarios.
    • gs_loader uses a .bad file to record errors from the rawrecord column in an error table. The error table does not record rawrecord if an error cannot be read by certain code. In this case, a blank line is recorded in the .bad file.