Updated on 2025-09-22 GMT+08:00

COPY

Description

Copies data between tables and files.

COPY FROM copies data from a file to a table, and COPY TO copies data from a table to a file.

Precautions

When the enable_copy_server_files parameter is disabled, only the initial user is allowed to run the COPY FROM FILENAME or COPY TO FILENAME command. When it is enabled, users with the SYSADMIN permission or users who inherit the permissions of the built-in role gs_role_copy_files are allowed to run these commands. By default, COPY FROM FILENAME and COPY TO FILENAME cannot be run on database configuration files, key files, certificate files, or audit logs, which prevents unauthorized users from viewing or modifying sensitive files. When enable_copy_server_files is enabled, the administrator can use the GUC parameter safe_data_path to restrict the import and export paths of common users to subpaths of the configured path. If safe_data_path is not set (the default), the paths used by common users are not restricted. When safe_data_path is set, a relative path in the COPY statement causes an error.
COPY applies only to tables, not to views.
COPY TO requires the SELECT permission on the table to be read, and COPY FROM requires the INSERT permission on the table into which data is inserted.
If a list of columns is specified, COPY copies only the data of the specified columns between the file and the table. If a table has any columns that are not in the column list, COPY FROM inserts default values for those columns.
If a data source file is specified, the server must be able to access the file. If STDIN is specified, data flows between the client and the server. When entering data, use the TAB key to separate the columns of the table and enter a backslash and a period (\.) on a new line to indicate the end of the input.
COPY FROM throws an error if any row in the data file contains more or fewer columns than expected.
The end of the data can be represented by a line that contains only a backslash and a period (\.). If data is read from a file, the end-of-data marker is unnecessary. If data is copied between client applications, the end-of-data marker must be provided.
In COPY FROM, \N is an empty string. To enter the actual value \N, use \\N.
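
The text-format rules above (tab-separated columns, a closing \. line, and \\N for a literal \N) can be sketched in Python. This is a minimal illustration, not GaussDB code: the helper name to_copy_text is hypothetical, \N is assumed to be the null string in use, and escaping a newline as \n is assumed from the PostgreSQL-style text format.

```python
def to_copy_text(rows, delimiter="\t", null="\\N"):
    """Serialize rows as TEXT-format input for COPY ... FROM STDIN (illustrative)."""
    def encode(value):
        if value is None:
            return null
        text = str(value)
        # Escape the backslash first so that later escapes are not doubled.
        text = text.replace("\\", "\\\\")
        # Put a backslash in front of data that collides with the delimiter.
        text = text.replace(delimiter, "\\" + delimiter)
        # Assumption: line breaks inside a value are written as \n and \r.
        return text.replace("\n", "\\n").replace("\r", "\\r")

    lines = [delimiter.join(encode(v) for v in row) for row in rows]
    # A line holding only backslash-period marks the end of the data.
    lines.append("\\.")
    return "\n".join(lines) + "\n"


# The third value of the second row is the literal two-character string \N.
payload = to_copy_text([(1, "alice", None), (2, "a\tb", "\\N")])
print(payload, end="")
```

The literal value \N is written as \\N, while a missing value is written as the bare null string, so the two stay distinguishable in the data stream.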

COPY FROM can preprocess data using column expressions, but column expressions do not support subqueries.
When a data format error occurs during COPY FROM execution, the transaction is rolled back. However, the error information is insufficient, making it difficult to locate the error data from a large amount of raw data.
COPY FROM and COPY TO are suited to low-concurrency, local import and export of small amounts of data.
When COPY is used in binary format, transcoding in distributed mode is not supported.
COPY is a server command and its operating environment is the same as that of the database server process. \COPY is a client meta-command and its execution environment is the same as that of gsql on the client. Note that when the database and gsql are used in the sandbox environment, the COPY and \COPY commands both use the paths in the sandbox. When the database is used in the sandbox environment and gsql is used outside the sandbox, the COPY command uses the path inside the sandbox, and the \COPY command uses the path outside the sandbox.
When executing COPY to import data to a base table with a GSI, the enable_stream_operator parameter must be enabled to achieve optimal data import performance.
During the export using COPY TO, if the GUC parameter support_zero_character is disabled and the column data in the table contains the '\0' characters, the column data will be truncated during the export. Only the data before '\0' will be exported. When the parameter is enabled, the '\0' characters are also exported. The performance loss is positively correlated with the number of '\0' characters.
The pgxc_copy_error_log table has moved from the public schema to the pg_catalog schema, and the table in the public schema is deprecated. Because a table in the public schema may share its name with a user table, you need to determine whether public.pgxc_copy_error_log is a user table. If it is not, migrate its data to the corresponding table in the pg_catalog schema and then delete the table in the public schema. If a table with the same name exists in the pg_catalog schema, the gsql meta-command \d(+) preferentially matches pg_catalog.pgxc_copy_error_log, which is not displayed because it is a system catalog. As a result, pgxc_copy_error_log does not appear in the list even though the table exists in the public schema. You can still reference public.pgxc_copy_error_log directly.

When exporting data of the float4 or float8 type, you are advised to set extra_float_digits to 3 to avoid loss of significant digits and ensure consistency between the data before and after import and export.
In some scenarios, you can set the GUC parameter batch_insert_index_types to "rcr_ubtree" to improve the performance of inserting indexes in batches.
For security purposes, do not use COPY FROM to import data to the pg_authid, pg_auth_history, gs_global_config, and gs_workload_rule system catalogs.
When the parameter enable_security_policy is enabled, and if a masking policy has been set, the system will perform data masking.
When importing and exporting data, check whether the data contains escape characters. If you prepare the data file, you need to assess the import mode (import with or without escaping). If the data file is exported using the COPY command, the escape option set for import must be the same as that for export.
When COPY is used to import data, the size of a single-line data tuple cannot exceed 1 GB – 1 byte in non-transcoding scenarios, and cannot exceed 256 MB – 1 byte in transcoding scenarios.
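
The advice on extra_float_digits can be illustrated with a small Python model. It assumes PostgreSQL-style semantics in which a float8 value is printed with 15 significant digits by default and 17 when extra_float_digits is 3; the helper export_float is hypothetical and only mimics that formatting.

```python
def export_float(value, significant_digits):
    """Format a float the way a text export with N significant digits would."""
    return format(value, f".{significant_digits}g")


x = 0.1 + 0.2                 # stored as 0.30000000000000004
short = export_float(x, 15)   # models the default output precision
full = export_float(x, 17)    # models extra_float_digits = 3

print(short, float(short) == x)
print(full, float(full) == x)
```

With 15 digits the exported text no longer parses back to the stored value, whereas 17 digits round-trip exactly.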

Syntax

Copy data from a file to a table.

COPY [BINARY] table_name [ ( column_name [, ...] ) ] 
    [ WITH OIDS ]
    FROM { 'filename' | STDIN }
    [ LOAD ]
    [ LOAD_DISCARD 'discard_file_name' ]
    [ LOAD_BAD 'bad_file_name' ]
    [ USEEOF ]
    [ [ USING ] DELIMITERS 'delimiters' ]
    [ WITHOUT ESCAPING ]
    [ LOG ERRORS | LOG ERRORS DATA ]
    [ REJECT LIMIT 'limit' ]
    [ WITH ]
    [ copy_option [ ...] | ( option [, ...] ) ];

Copy data from a table to a file.

COPY table_name [ ( column_name [, ...] ) ]
    [ WITH OIDS ]
    TO { 'filename' | STDOUT }
    [ [ USING ] DELIMITERS 'delimiters' ]
    [ WITHOUT ESCAPING ]
    [ WITH ]
    [ copy_option [ ...] | ( option [, ...] ) ];

COPY query {(SELECT) | (VALUES)}
    TO { 'filename' | STDOUT }
    [ WITHOUT ESCAPING ]
    [ WITH ]
    [ copy_option [ ...] | ( option [, ...] ) ];

The syntax constraints of COPY TO are as follows:
- (query) is incompatible with [USING] DELIMITERS. If the data comes from a query result, COPY TO cannot specify [USING] DELIMITERS.
- copy_option is the native COPY parameter form, whereas option is the parameter form compatible with foreign-table import.

The optional parameters of the copy_option clause are as follows:

OIDS
| DELIMITER [ AS ] 'delimiter_string'
| NULL [ AS ] 'null_string'
| HEADER
| FILEHEADER 'header_file_string'
| FREEZE 
| FORCE NOT NULL column_name [, ...]
| FORCE QUOTE { column_name [, ...] | * }
| BINARY
| CSV
| FIXED
| QUOTE [ AS ] 'quote_character'
| ESCAPE [ AS ] 'escape_character'
| EOL 'newline_character'
| ENCODING 'encoding_name'
| IGNORE_EXTRA_DATA
| FILL_MISSING_FIELDS
| COMPATIBLE_ILLEGAL_CHARS
| DATE_FORMAT 'date_format_string'
| TIME_FORMAT 'time_format_string'
| TIMESTAMP_FORMAT 'timestamp_format_string'
| DATEA_FORMAT 'datea_format_string'
| SMALLDATETIME_FORMAT 'smalldatetime_format_string'
| COPY_CUSTOM_ID 'custom_id_string'
| TABLE_COMPRESS_CLAUSE( ROW STORE COMPRESS ADVANCED [ MEDIUM | HIGH ] ROW [ON (EXPR)])
| FORMATTER ( [ column_name( offset, length ) ] [, ...] )
| TRANSFORM ( [ column_name [ data_type ] [ AS transform_expr ] ] [, ...] )

The optional parameters of the option clause are as follows:

FORMAT 'format_name'
| OIDS [ boolean ]
| DELIMITER 'delimiter_character'
| NULL 'null_string'
| HEADER [ boolean ]
| USEEOF [ boolean ]
| FILEHEADER 'header_file_string'
| FREEZE [ boolean ]
| QUOTE 'quote_character'
| ESCAPE 'escape_character'
| EOL 'newline_character'
| NOESCAPING [ boolean ]
| FORCE_QUOTE { ( column_name [, ...] ) | * }
| FORCE_NOT_NULL ( column_name [, ...] )
| ENCODING 'encoding_name'
| IGNORE_EXTRA_DATA [ boolean ]
| FILL_MISSING_FIELDS
| COMPATIBLE_ILLEGAL_CHARS [ boolean ]
| DATE_FORMAT 'date_format_string'
| TIME_FORMAT 'time_format_string'
| TIMESTAMP_FORMAT 'timestamp_format_string'
| DATEA_FORMAT 'datea_format_string'
| SMALLDATETIME_FORMAT 'smalldatetime_format_string'
| COPY_CUSTOM_ID 'custom_id_string'
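
As a rough sketch of how the parenthesized option form composes into a statement, the following Python helper builds a COPY FROM string. The helper names, the table public.orders, and the file path are hypothetical, and passing Boolean options as quoted strings such as 'true' is an assumption rather than something the syntax above mandates. Only the filename and option values are quoted; the table name and option names are inserted as given and must come from trusted input.

```python
def quote_literal(text):
    """Quote a value as a SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_copy_from(table, filename, options):
    """Build COPY ... FROM 'file' WITH ( option [, ...] ) from a dict of options."""
    parts = []
    for name, value in options.items():
        if value is None:
            parts.append(name)  # bare option without an argument
        else:
            parts.append(f"{name} {quote_literal(str(value))}")
    statement = f"COPY {table} FROM {quote_literal(filename)}"
    if parts:
        statement += " WITH (" + ", ".join(parts) + ")"
    return statement + ";"


stmt = build_copy_from(
    "public.orders",            # hypothetical table
    "/data/orders.csv",         # hypothetical server-side path
    {"FORMAT": "csv", "DELIMITER": ",", "HEADER": "true"},
)
print(stmt)
```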

Parameters

query
Specifies a query whose results will be copied.

Value range: Only one SELECT or VALUES command is supported. A semicolon (;) is not required at the end of the command.
table_name
Specifies the name (possibly schema-qualified) of an existing table.

Value range: an existing table name.
column_name
Specifies an optional list of columns to be copied.

Value range: any columns. All columns will be copied if no column list is specified.
STDIN
Specifies that input comes from the standard input. In the input, table columns are separated by tabs, and the input ends with a line that contains only a backslash and a period (\.).
STDOUT
Specifies that output goes to the standard output.
USEEOF
The system does not report an error for "\." in the imported data.

Value range: true/on or false/off.

Default value: false
[USING] DELIMITERS 'delimiters'
String that separates columns within each row (line) of the file. It cannot be larger than 10 bytes.

Value range:
- The delimiter in text format cannot include any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789, but has no restriction for the CSV format.
- The delimiter cannot be set to '\r', '\n', or a user-defined newline character.
- A delimiter cannot contain the null string (refer to NULL null_string), nor can it be contained within the null string.
- A delimiter cannot contain \0.
- When COMPATIBLE_ILLEGAL_CHARS is specified (see COMPATIBLE_ILLEGAL_CHAR...), a delimiter cannot be set to a space or a question mark (?). Otherwise, the delimiter may conflict with the characters after error tolerance conversion.
- In CSV mode, a delimiter cannot contain QUOTE characters. For details, see QUOTE [AS] 'quote_chara....
- The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.
Default value: a tab character in text format and a comma in CSV format.
- Both DELIMITER and DELIMITERS can specify delimiters. DELIMITERS can be followed by brackets, but DELIMITER cannot be directly followed by brackets. Otherwise, an error is reported.
- Delimiters can be specified only in TEXT or CSV mode.
- The data length of a single row should be less than 1 GB. A row that has many columns using long delimiters cannot contain much valid data.
- You are advised to use multi-character delimiters or invisible delimiters. For example, you can use multi-characters (such as $^&) and invisible characters (such as 0x07, 0x08, and 0x1b).
- If a column to be exported in text format contains data that is the same as separators, a backslash (\) is added as an escape character before a conflicting data record, ensuring that the exported data can be properly identified.
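
Several of the delimiter rules above can be checked before a load is attempted. The following Python sketch is a pre-flight check only, assuming the rules exactly as listed; check_delimiter is a hypothetical helper, not a GaussDB function, and it does not cover the COMPATIBLE_ILLEGAL_CHARS or QUOTE rules.

```python
# Characters that the TEXT format forbids inside a delimiter.
TEXT_FORBIDDEN = set("\\.abcdefghijklmnopqrstuvwxyz0123456789")


def check_delimiter(delimiter, fmt="text", null_string="\\N", eol=None):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if len(delimiter.encode("utf-8")) > 10:
        problems.append("longer than 10 bytes")
    if fmt == "text" and TEXT_FORBIDDEN & set(delimiter):
        problems.append("contains a character forbidden in text format")
    if delimiter in ("\r", "\n") or (eol is not None and delimiter == eol):
        problems.append("equals a newline character")
    if "\0" in delimiter:
        problems.append("contains \\0")
    # The delimiter and the null string must not contain each other.
    if null_string and (null_string in delimiter or delimiter in null_string):
        problems.append("overlaps the null string")
    return problems


print(check_delimiter("$^&"))   # multi-character delimiter
print(check_delimiter("a|"))    # lowercase letter is forbidden in text format
```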
WITHOUT ESCAPING
Specifies that, in TEXT format, the backslash (\) and the characters following it are not escaped.

Value range: applies only to the TEXT format.
For options related to escape characters, pay attention to the following:
- If you prepare a data file, you need to choose the desired import behavior (import with or without escaping).
- If a data file is exported using the kernel COPY syntax, the escape options must be the same for both export and import.
LOG ERRORS
If this parameter is specified, the error tolerance mechanism for data type errors in the COPY FROM statement is enabled.

Value range: a value set while data is imported using COPY FROM.
The restrictions of this error tolerance parameter are as follows:
- This error tolerance mechanism captures only the data type errors (DATA_EXCEPTION) that occur during data parsing of COPY FROM on the primary node of the database.
- If existing error tolerance parameters (for example, IGNORE_EXTRA_DATA) of the COPY statement are enabled, the error of the corresponding type will be processed as specified by the parameters and no error will be reported. Therefore, the error table does not contain such error data.
- By default, the error tolerance mechanism does not tolerate constraint conflicts. To tolerate constraint conflicts, set the session-level GUC parameter a_format_load_with_constraints_violation to "s2" and import the file again.
  - The conflicts of the NOT NULL constraints, conditional constraints, PRIMARY KEY constraints, UNIQUE constraints, and unique index constraints can be tolerated.
  - This function is valid only in centralized A-compatible mode.
  - A statement-level trigger cannot handle any constraint conflict above, so the attempt to import data into a table with such trigger will fail with an error reported.
  - Under this feature, data will be inserted into partitioned tables row by row instead of in batches, which deteriorates the import performance.
  - Under this feature, the UB-tree indexes will be built row by row instead of in batches, degrading the index building performance.
  - This feature is still valid even if a constraint conflict is triggered by an operation on a table with a row-level trigger. Constraint conflicts of row-level triggers are implemented in sub-transactions, which use more memory resources and increase execution time. Therefore, you are advised to use this feature when constraint conflicts are very likely to occur. In this scenario, the amount of data to be imported at a time by using COPY should be less than or equal to 1 GB.
- During a rolling upgrade from an earlier version to this version, do not use the COPY error tolerance import capability before all nodes are upgraded.
LOG ERRORS DATA
The differences between LOG ERRORS DATA and LOG ERRORS are as follows:
1. LOG ERRORS DATA fills the rawrecord column in the error tolerance table.
2. Only users with the super permission can use the parameter options of LOG ERRORS DATA.
  - If error content is too complex, it may fail to be written to the error tolerance table by using LOG ERRORS DATA, causing the task failure.
  - For characters that cannot be read in a given encoding (error codes ERRCODE_CHARACTER_NOT_IN_REPERTOIRE and ERRCODE_UNTRANSLATABLE_CHARACTER), the rawrecord column is not recorded.
  - Files in BINARY format are not supported.
  - During a rolling upgrade from an earlier version to this version, do not use the COPY error tolerance import capability before all nodes are upgraded.
REJECT LIMIT 'limit'
Used with the LOG ERRORS option to set the upper limit of the tolerated errors in the COPY FROM statement. If the number of errors exceeds the limit, any further errors will be reported based on the original mechanism.

Value range: a positive integer (1 to INTMAX) or 'unlimited'

Default value: If LOG ERRORS is not specified, an error will be reported. If LOG ERRORS is specified, the default value is 0.

In the error tolerance mechanism described in the description of LOG ERRORS, the count of REJECT LIMIT is calculated based on the number of data parsing errors on the primary database node where the COPY FROM statement is executed, not based on the number of all errors on the node.
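
The interaction between LOG ERRORS and REJECT LIMIT can be modeled in a few lines of Python. This is a simplified model under stated assumptions, not the server's implementation: load_with_reject_limit is a hypothetical helper, a ValueError stands in for a data type error, and the error_log list stands in for the error table.

```python
def load_with_reject_limit(lines, parse, reject_limit):
    """Parse lines, tolerating at most reject_limit parse errors."""
    loaded, error_log = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            loaded.append(parse(line))
        except ValueError as exc:
            limited = reject_limit != "unlimited"
            if limited and len(error_log) >= reject_limit:
                # Limit exceeded: report the error through the normal path.
                raise
            error_log.append((lineno, line, str(exc)))
    return loaded, error_log


rows = ["1", "x", "3", "y"]
loaded, errors = load_with_reject_limit(rows, int, 2)
print(loaded)                      # the rows that parsed
print([e[:2] for e in errors])     # line number and raw text of rejected rows
```

With a limit of 0, the default when LOG ERRORS is specified, the first bad row already raises in this model.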

copy_option

Specifies all types of native parameters of COPY.

OIDS
Specifies whether to import and export OID hidden columns for tables (usually system catalogs) that contain OIDs.

Value range: true/on or false/off.

Default value: false
- If the table to be exported does not have an OID, an error is reported.
- The exported OID defaults to the first column. If the HEADER option is specified, the exported HEADER row does not include the OID column description.
DELIMITER [ AS ] 'delimiter_string'
Specifies the column delimiter of data in a data file line. The value range is the same as that of [USING] DELIMITERS 'deli....
NULL [ AS ] 'null_string'
Specifies the string that represents a null value.

Value range:
- The NULL value cannot contain '\r', '\n', or user-defined newline characters.
- The NULL value contains a maximum of 100 characters.
- NULL cannot contain delimiters (see [USING] DELIMITERS 'deli...) or be contained by delimiters.
- In CSV mode, NULL cannot contain QUOTE characters. For details, see QUOTE [AS] 'quote_chara....
- When COMPATIBLE_ILLEGAL_CHARS is specified (see COMPATIBLE_ILLEGAL_CHAR...), NULL cannot be set to a space or a question mark (?). Otherwise, the null string may conflict with the characters after error tolerance conversion.
- The NULL value cannot contain the '\0' character.
Default value:
- The default value for the CSV format is an empty string without quotation marks.
- The default value for the TEXT format is \N.
- The NULL parameter can be specified only in TEXT or CSV mode.
- When using COPY FROM, any data item that matches this string will be stored as a null value, so make sure that you use the same string as you used with COPY TO.
HEADER
Specifies whether the exported data file contains a header row and whether the first row in the data stream is the header row during import. A header row describes information about each column.
- This parameter can be specified only in CSV or FIXED mode. The content in the header row of the FIXED file must meet the column length definition.
- When this option is specified, the first row of the data text is identified as a header row during import and will be ignored. The first row in the data file exported is the header row.
- If this option is not specified, the first row of the data text is identified as data and imported. The exported data file does not contain a header row.
FILEHEADER 'header_file_string'
Specifies a file that defines the content in the header for exported data. The file contains data description of each column.
- This option is valid only when the header function is enabled.
- fileheader specifies an absolute path.
- The file can contain only one row of header information, and ends with a newline character. Excess rows will be discarded. (Header information cannot contain newline characters.)
- The length of the file including the newline character cannot exceed 1 MB.
FREEZE
Marks the rows loaded by COPY as frozen, as if VACUUM FREEZE had been run on them.

This is a performance option of initial data loading. The data will be frozen only when the following three requirements are met:
- The table being loaded has been created or truncated in the same transaction before copying.
- There are no cursors open in the current transaction.
- There are no older snapshots held by the current transaction.
When COPY is completed, all the other sessions will see the data immediately. However, this violates the general principle of MVCC visibility, and users should understand that this may cause potential risks.
FORCE NOT NULL column_name [, ...]
In CSV COPY FROM mode, prevents the specified columns from being read as null. If the input for such a column is null, it is treated as a zero-length string.

Value range: name of an existing column.
FORCE QUOTE { column_name [, ...] | * }
In CSV COPY TO mode, forces all non-NULL values in each specified column to be enclosed in QUOTE characters (see QUOTE [AS] 'quote_chara...). The asterisk (*) indicates all columns. NULL values are not enclosed in quote characters.

Value range: name of an existing column.

BINARY

Specifies that data is stored and read in binary mode instead of text mode.

In binary mode, you cannot declare DELIMITER, NULL, or CSV.
When BINARY is specified, CSV, FIXED, and TEXT cannot be specified through option or copy_option.
If the GUC parameter copy_special_character_version is set to 'no_error', invalid characters are not checked during the import and are displayed as garbled characters in query results. The database server encoding must be the same as the client encoding. Exercise caution when enabling this parameter.
In binary mode, setting copy_special_character_version to 'no_error' takes effect only for columns of the TEXT, CHAR, VARCHAR, NVARCHAR2, and CLOB types.
Exporting data to STDOUT or importing from STDIN is not supported.

The following table lists the types of data that cannot be imported or exported in binary mode.

smgr	aclitem	gtsvector	any	trigger
language_handler	internal	opaque	anyelement	anynesttable
anyindexbytable	anynonarray	anyenum	fdw_handler	anyrange
hll_hashval	hash16	hash32	anyset	-

The following table lists some options that do not support binary import and export.

DELIMITER	NULL	EOL	CSV	FIXED
SMALLDATETIME_FORMAT	DATE_FORMAT	TIME_FORMAT	TIMESTAMP_FORMAT	DATEA_FORMAT

CSV
Enables the CSV mode. When CSV is specified, BINARY, FIXED, and TEXT cannot be specified through option or copy_option.
FIXED
Fixes column length. When the column length is fixed, DELIMITER, NULL, and CSV cannot be specified. When FIXED is specified, BINARY, CSV, and TEXT cannot be specified by option or copy_option. Newline characters in data columns cannot be correctly processed when the column length is fixed. FORMATTER must be specified.
The definition of fixed length is as follows:
- The column length of each record is the same.
- Spaces are used for column padding. Columns of the numeric type are left-aligned and columns of the string type are right-aligned.
- No delimiters are used between columns.
FORMATTER ( [ column_name ( offset, length ) ] [, ...] )
Defines the position of each column in the data file in the column(offset,length) format. This parameter is used only in fixed-length mode.

Value range:
- The value of offset must be larger than 0. The unit is byte.
- The value of length must be larger than 0. The unit is byte.
The total length of all columns must be less than 1 GB.

All columns must not overlap with each other.

Columns that are not in the file are replaced with null.
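
The FORMATTER constraints above (positive offset and length, no overlapping columns, total length under 1 GB) can be checked ahead of time. The Python sketch below applies the rules as stated in this section; check_formatter and the column names are hypothetical.

```python
ONE_GB = 1 << 30


def check_formatter(columns):
    """Validate a mapping of column name -> (offset, length), both in bytes."""
    problems = []
    for name, (offset, length) in columns.items():
        if offset <= 0:
            problems.append(f"{name}: offset must be larger than 0")
        if length <= 0:
            problems.append(f"{name}: length must be larger than 0")
    # Sort by start position and track the furthest end seen so far.
    spans = sorted((off, off + ln, name) for name, (off, ln) in columns.items())
    prev_end, prev_name = None, None
    for start, end, name in spans:
        if prev_end is not None and start < prev_end:
            problems.append(f"{prev_name} overlaps {name}")
        if prev_end is None or end > prev_end:
            prev_end, prev_name = end, name
    if sum(ln for _, ln in columns.values()) >= ONE_GB:
        problems.append("total length must be less than 1 GB")
    return problems


print(check_formatter({"id": (1, 4), "name": (5, 10)}))
print(check_formatter({"id": (1, 4), "name": (3, 10)}))
```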
QUOTE [AS] 'quote_character'
Specifies the quote character for a CSV file.

Value range:
- A single-byte character without '\0', '\r', '\n', or end-of-line characters. You are advised to set quote to an invisible character, such as 0x07, 0x08, or 0x1b.
- The value cannot contain delimiters (see [USING] DELIMITERS 'deli...) or be contained by delimiters.
- The value cannot contain NULL (see NULL [ AS ] 'null_string') or be included in NULL.
- When COMPATIBLE_ILLEGAL_CHARS is specified (see COMPATIBLE_ILLEGAL_CHAR...), QUOTE cannot be set to a space or a question mark (?). Otherwise, the quote character may conflict with the characters after error tolerance conversion.
Default value: double quotation mark (").
ESCAPE [AS] 'escape_character'
Specifies the escape character in CSV format.

Value range:
- A single-byte character without '\0'. You are advised to set ESCAPE to an invisible character, such as 0x07, 0x08, or 0x1b.
- The value cannot contain delimiters (see [USING] DELIMITERS 'deli...) or be contained by delimiters.
- The value cannot contain NULL (see NULL [ AS ] 'null_string') or be included in NULL.
- When COMPATIBLE_ILLEGAL_CHARS is specified (see COMPATIBLE_ILLEGAL_CHAR...), ESCAPE cannot be set to a space or a question mark (?). Otherwise, the escape character may conflict with the characters after error tolerance conversion.
- If the quote character is set (see QUOTE [AS] 'quote_chara...) but ESCAPE is not set, ESCAPE uses the same character as QUOTE.
Default value: double quotation mark (").
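
When ESCAPE is the same character as QUOTE, a quote character inside a value is written twice. The following sketch uses Python's standard csv module to show that convention; it illustrates the file format only, and the assumption is that GaussDB CSV output follows the same doubling rule. The helper to_csv_line is hypothetical.

```python
import csv
import io


def to_csv_line(row, delimiter=",", quote='"'):
    """Write one CSV line in which the quote character also acts as the escape."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quotechar=quote,
        doublequote=True,        # escape a quote by doubling it
        lineterminator="\n",
    )
    writer.writerow(row)
    return buf.getvalue()


line = to_csv_line(["a,b", 'he said "hi"', "plain"])
print(line, end="")

# Reading the line back recovers the original values.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)
```

Only values that contain the delimiter or the quote character are enclosed in quotes here; FORCE QUOTE corresponds to quoting every non-NULL value in the listed columns.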
EOL 'newline_character'
Specifies the end-of-line character (newline character) style of the imported or exported data file.

Value range: Common end-of-line characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special end-of-line characters include $ and #.
- Multi-character end-of-line characters of up to 10 bytes are supported.
- The end-of-line character cannot be contained in the delimiter.
- The end-of-line character cannot be contained in a NULL string.
- In non-CSV mode, the end-of-line character cannot contain the following characters: .abcdefghijklmnopqrstuvwxyz0123456789.
- An end-of-line character cannot contain '\0'.
The EOL parameter supports only the TEXT format for data import and export and does not support the CSV or FIXED format for data import. For forward compatibility, the EOL parameter can be set to 0x0A or 0x0D0A for data export in the CSV or FIXED format.
ENCODING 'encoding_name'
Specifies the name of a file encoding format.

Value range: a valid encoding format. Common encoding formats are utf8, gb18030, and gbk.

Default value: encoding format of the current client
IGNORE_EXTRA_DATA
Specifies that when the number of columns in the data source file exceeds the number of columns in the foreign table, excess columns at the end of the row are ignored. This parameter is used only during data import.
If this parameter is not used and the number of columns in the data source file is greater than that defined in the foreign table, the following error information is displayed:
```
extra data after last expected column
```
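
The effect of IGNORE_EXTRA_DATA on a single row can be modeled as follows. This is an illustrative Python sketch with a hypothetical helper name; it reuses the error text quoted above and makes no claim about how the server implements the check.

```python
def fit_columns(fields, expected, ignore_extra_data=False):
    """Drop surplus trailing fields, or fail the way COPY FROM does by default."""
    if len(fields) > expected:
        if not ignore_extra_data:
            raise ValueError("extra data after last expected column")
        return fields[:expected]
    return fields


row = ["1", "alice", "unexpected"]
print(fit_columns(row, 2, ignore_extra_data=True))
```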
COMPATIBLE_ILLEGAL_CHARS
Specifies whether to tolerate invalid characters during data import and export.

Value range: true/on or false/off.
- true or on: No error is reported and the import or export is not interrupted when invalid characters are encountered. Invalid characters are converted into valid ones before the data is imported to or exported from the database.
- false or off: An error is reported and the import is interrupted when there are invalid characters. An error is reported and the export is interrupted when there are invalid characters or characters whose encodings do not exist.
Default value: false or off
The rules for converting invalid characters are as follows:
1. '\0' is converted to a space. When support_zero_char is enabled, data is imported in the original format.
2. Other invalid characters are converted to the characters specified by the GUC parameter convert_illegal_char_mode. The default value is '?'.
3. When compatible_illegal_chars is set to true or on, after invalid characters such as NULL, DELIMITER, QUOTE, and ESCAPE are converted to spaces or question marks, an error message stating "illegal chars conversion may confuse COPY escape 0x20" will be displayed to remind you of possible parameter confusion caused by the conversion.
4. If compatible_illegal_chars is set to true or on, data in binary format cannot be imported. The error message "cannot specify bulkload compatibility options in BINARY mode" is displayed.
5. If the GUC parameter copy_special_character_version is set to 'no_error', invalid characters will not be checked during the import and will be displayed as garbled characters in query results. Exercise caution when enabling this parameter.
6. If copy_special_character_version is set to 'no_error' and compatible_illegal_chars is set to true, the former has a higher priority. That is, invalid characters will not be checked during the import.
7. If copy_special_character_version is set to 'no_error' or compatible_illegal_chars is set to true, a record is written to the database run logs each time a row of data containing invalid characters is identified. The log writing behavior occupies a small amount of disk bandwidth. To eliminate the impact, set the GUC parameter enable_log_copy_illegal_chars to off, that is, do not write records to the database run logs. You are advised not to modify this configuration unless otherwise required.
The rules for exporting invalid characters are as follows:

Export falls into two scenarios: transcoding is required (the database server encoding differs from the client encoding) or transcoding is not required (the two encodings are the same). The export performance varies by scenario. You can run show server_encoding; to query the database server encoding and show client_encoding; to query the client encoding.
- In scenarios where transcoding is required:
1. '\0' is converted to a space. When support_zero_char is enabled, data is exported in the original format.
2. Other invalid characters are converted to the characters specified by the GUC parameter convert_illegal_char_mode. The default value is '?'.
3. When compatible_illegal_chars is set to true or on, after invalid characters such as NULL, DELIMITER, QUOTE, and ESCAPE are exported to spaces or question marks, an error message stating "illegal chars conversion may confuse COPY escape 0x20" will be displayed to remind you of possible parameter confusion caused by the export.
4. If compatible_illegal_chars is set to true or on, data in binary format cannot be exported.
5. If compatible_illegal_chars is set to true, a record is written to the database run logs each time a row of data containing invalid characters is identified. The log writing behavior occupies a small amount of disk bandwidth. To eliminate the impact, set the GUC parameter enable_log_copy_illegal_chars to off, that is, do not write records to the database run logs. You are advised not to modify this configuration unless otherwise required.
- In scenarios where transcoding is not required:
1. Data is exported as it is in the database, even if the data contains invalid characters and compatible_illegal_chars is set to true.
2. If compatible_illegal_chars is set to true or on, data in binary format cannot be exported.
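
The first two conversion rules ('\0' becomes a space, other invalid characters become the configured replacement, '?' by default) can be approximated in Python. This is a sketch under assumptions: convert_illegal_chars is a hypothetical helper, the input is assumed to be UTF-8, and Python's decoder decides what counts as one invalid sequence, which may differ from the server.

```python
def convert_illegal_chars(raw, encoding="utf-8", replacement="?"):
    """Decode bytes leniently, mapping NUL to a space and bad bytes to '?'."""
    # errors="replace" turns each undecodable sequence into U+FFFD.
    text = raw.decode(encoding, errors="replace")
    # Note: a genuine U+FFFD in the input is also replaced by this sketch.
    return text.replace("\x00", " ").replace("\ufffd", replacement)


print(convert_illegal_chars(b"a\x00b\xffc"))
```

Because the replacements are a space and a question mark, a delimiter, null string, quote, or escape set to either character could no longer be told apart from converted data, which is why those settings are rejected when COMPATIBLE_ILLEGAL_CHARS is used.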
FILL_MISSING_FIELDS
Specifies how to handle rows in which the last column or the last several columns are missing during data import. If neither one nor multi is specified, or if one is specified, the last missing column is filled with null (an error is reported if more than one column is missing). If multi is specified, all missing trailing columns are filled with null.

Value range: true/on or false/off.

Default value: false or off
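
The one and multi behaviors described for FILL_MISSING_FIELDS can be sketched as follows. The helper fill_missing and its error messages are hypothetical; mode=None models the option being off, in which case a short row is an error.

```python
def fill_missing(fields, expected, mode=None):
    """Pad missing trailing columns with None according to the chosen mode."""
    missing = expected - len(fields)
    if missing <= 0:
        return list(fields)
    if mode is None:
        raise ValueError("row has fewer columns than expected")
    if mode == "one" and missing > 1:
        raise ValueError("more than one trailing column is missing")
    return list(fields) + [None] * missing


print(fill_missing(["1", "alice"], 3, mode="one"))
print(fill_missing(["1"], 3, mode="multi"))
```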
DATE_FORMAT 'date_format_string'
Specifies the DATE format for data import. The BINARY format is not supported. If this parameter is specified in BINARY mode, the error "cannot specify bulkload compatibility options in BINARY mode" is reported. The parameter is valid only for data import using COPY FROM.

Value range: a valid DATE value. For details, see Date and Time Processing Functions and Operators.

You can use the TIMESTAMP_FORMAT parameter to set the DATE format to TIMESTAMP for data import. For details, see TIMESTAMP_FORMAT below.
TIME_FORMAT 'time_format_string'
Specifies the TIME format for data import. The BINARY format is not supported. If this parameter is specified in BINARY mode, the error "cannot specify bulkload compatibility options in BINARY mode" is reported. The parameter is valid only for data import using COPY FROM.

Value range: a valid TIME value. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.
TIMESTAMP_FORMAT 'timestamp_format_string'
Specifies the TIMESTAMP format for data import. The BINARY format is not supported. If this parameter is specified in BINARY mode, the error "cannot specify bulkload compatibility options in BINARY mode" is reported. The parameter is valid only for data import using COPY FROM.

Value range: a valid TIMESTAMP value. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.
DATEA_FORMAT 'datea_format_string'
Specifies the DATEA format for data import. The BINARY format is not supported. If this parameter is specified in BINARY mode, the error "cannot specify bulkload compatibility options in BINARY mode" is reported. The parameter is valid only for data import using COPY FROM.

Value range: a valid DATEA value. For details, see Date and Time Processing Functions and Operators.
SMALLDATETIME_FORMAT 'smalldatetime_format_string'
Specifies the SMALLDATETIME format for data import. The BINARY format is not supported. When data of such format is imported, error "cannot specify bulkload compatibility options in BINARY mode" will occur. The parameter is valid only for data import using COPY FROM.

Value range: a valid SMALLDATETIME value. For details, see Date and Time Processing Functions and Operators.
TABLE_COMPRESS_CLAUSE( ROW STORE COMPRESS ADVANCED [ MEDIUM | HIGH ] ROW [ON (EXPR)])
Compresses the imported data. EXPR is an expression. Data that meets the expression is compressed, and data that does not meet the expression is not compressed. The parameter is valid only for data import using COPY FROM.
- This applies only to Astore tables.
- TOAST data cannot be compressed.
- This feature is not supported during the upgrade observation period.
- When this feature is used, the new data page is used for storing the imported data.
- When this feature is used together with the error tolerance mechanism (LOG ERRORS or LOG ERRORS DATA), compression may have no benefits if error data frequently occurs.
- Compression is not supported when rows are imported one at a time, for example, when data is imported into a partitioned table with non-rollback upon constraint violation enabled, or when the table contains triggers.
- This feature cannot be used together with logical decoding. If logical decoding is enabled, an error will occur during compression of data imported using COPY.
TRANSFORM ( { column_name [ data_type ] [ AS transform_expr ] } [, ...] )
Specifies the conversion expression of each column in the table. data_type specifies the data type of the column in the expression parameter. transform_expr is the target expression that returns the result value whose data type is the same as that of the target column in the table. For details about the expression, see Expressions.

COPY FROM does not support conversion expressions specified for distribution keys.
COPY_CUSTOM_ID 'custom_id_string'
During data import, enter a user-defined character string, which is used as the identifier of the COPY import.
- In non-separation-of-duties scenarios, the pg_catalog.gs_copy_summary table contains the custom_id column by default. If copy_custom_id is not used, NULL is inserted into the column.
- In separation-of-duties scenarios, if the schema with the same name as the current connection user contains the gs_copy_summary table with the custom_id column, the input string is recorded in that column. One row is recorded per COPY operation.
- In non-separation-of-duties scenarios, the pg_catalog.pgxc_copy_error_log table contains the custom_id column by default. If copy_custom_id is not used, NULL is inserted into the column.
- In separation-of-duties scenarios, if the schema with the same name as the current connection user contains the pgxc_copy_error_log table with the custom_id column, LOG ERRORS or LOG ERRORS DATA is used, and the error tolerance data volume is less than the value specified by reject limit, the character string input by this parameter is recorded to the custom_id column in the pgxc_copy_error_log table. One import may correspond to zero or multiple data records.
- This operation takes effect only for import. The error table and log table are not recorded during export.
- The length of a user-defined character string cannot exceed 64 bytes.
- If the error table and log table do not contain the custom_id column but the copy_custom_id parameter is set, an error message is displayed, prompting you to add a column to the table using a function or ALTER statement.
- The custom_id column in the error table and log table may not be unique. The column content can be duplicate.
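
As a rough sketch of how the identifier might be used in a non-separation-of-duties scenario, the statements below tag one import and then look it up. The table, file path, and identifier 'batch_20250922' are hypothetical, and the parenthesized option spelling is an assumption based on the option list in this topic.

```sql
-- Hypothetical import tagged with a user-defined identifier (at most 64 bytes).
COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.dat'
    WITH (format 'text', copy_custom_id 'batch_20250922');

-- Look up the summary row recorded for this import.
SELECT * FROM pg_catalog.gs_copy_summary WHERE custom_id = 'batch_20250922';
```

Because custom_id values are not required to be unique, a query like this may return rows from several imports that reused the same string.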

option
Specifies all types of parameters of a compatible foreign table.
- FORMAT 'format_name'
  Specifies the format of the source data file in the foreign table.
  
  Value range: CSV, TEXT, FIXED, and BINARY
  - In CSV files, newline characters can be processed efficiently, but certain special characters may pose challenges. It is equivalent to the CSV parameter in copy_option.
  - In TEXT files, certain special characters can be processed efficiently, except newline characters.
  - In FIXED files, the column of each record is fixed to a certain length by padding the short ones with spaces and truncating the long ones. Newline characters in data columns cannot be correctly processed. It is equivalent to the FIXED parameter in copy_option.
  - In BINARY files, all data is stored/read as binary format rather than as text. It is faster than the text and CSV formats, but a binary-format file is less portable. It is equivalent to the BINARY parameter in copy_option.
  Default value: TEXT
- OIDS [ boolean ]
  Specifies whether to import and export OID hidden columns for tables (usually system catalogs) that contain OIDs.
  
  Value range: true/on or false/off.
  
  Default value: false
- DELIMITER 'delimiter_character'
  Specifies the column delimiter of data in a data file line. The value range is the same as that of [USING] DELIMITERS 'deli....
- NULL 'null_string'
  Specifies the string that represents a null value. The value is the same as that of NULL null_string.
- HEADER [ boolean ]
  Specifies whether the exported data file contains a header row and whether the first row in the data stream is the header row during import. A header row describes information about each column.
  
  Value range: true/on or false/off.
  - If the parameter is set to true or on, the first row of the data text is identified as the header row during import and will be ignored. The first row in the data file exported is the header row.
  - If the parameter is set to false or off, the first row of the data text is identified as data and imported. The exported data file does not contain a header row.
  Default value: false
  
  This parameter can be specified only in CSV or FIXED mode. The content in the header row of the FIXED file must meet the column length definition.
- FILEHEADER 'header_file_string'
  Specifies a file that defines the content in the header for exported data. The file contains data description of each column. It is equivalent to the FILEHEADER 'header_file... parameter in copy_option.
- FREEZE [ boolean ]
  Specifies whether the rows loaded by COPY are set as frozen, as if VACUUM FREEZE had been run on them.
  
  Value range: true/on or false/off. If the value is true or on, it is equivalent to the FREEZE parameter in copy_option.
  
  Default value: false
- NOESCAPING [ boolean ]
  Specifies, in the TEXT format, whether to escape the backslash (\) and its following characters.
  
  Value range: true/on or false/off. If the value is true or on, it is equivalent to the WITHOUT ESCAPING parameter in copy_option.
  
  Default value: false
- SKIP int_number
  Specifies that the first int_number rows of the data file are skipped during data import.
- USEEOF [ boolean ]
  Specifies whether the system skips reporting an error for "\." in the imported data.
  
  Value range: true/on or false/off. If the value is true or on, it is equivalent to the USEEOF parameter in copy_option.
  
  Default value: false
- QUOTE 'quote_character'
  Specifies a quote character for a CSV file. It is equivalent to the QUOTE parameter in copy_option.
- ESCAPE 'escape_character'
  Specifies an escape character for a CSV file. It is equivalent to the ESCAPE parameter in copy_option.
- EOL 'newline_character'
  Specifies the newline character style of the imported or exported data file. The value is the same as that of EOL 'newline_character'.
- FORCE_QUOTE { ( column_name [, ...] ) | * }
  In CSV COPY TO mode, forces all non-NULL values in each specified column to be enclosed in the QUOTE character (see QUOTE [AS] 'quote_chara...).
  
  Value range: name of an existing column.
  - The asterisk (*) indicates all columns.
  - The NULL output is not enclosed in quotation marks. If the original data has the same content as the NULL string, the original data is quoted.
- FORCE_NOT_NULL ( column_name [, ...] )
  In CSV COPY FROM mode, if the specified column is NULL during data writing, the content specified by NULL (see NULL [ AS ] 'null_string') is written instead. That is, this column does not allow NULL to be written directly.
  
  Value range: name of an existing column.
- ENCODING 'encoding_name'
  Specifies the encoding format of a data file. The default value is the current client-side encoding format.
- IGNORE_EXTRA_DATA [ boolean ]
  Specifies whether to ignore excessive columns when the number of columns in a data source file exceeds the number of foreign table columns. This parameter is used only during data import.
  
  Value range: true/on or false/off.
  - true/on: If the number of columns in a data source file is greater than that defined by the foreign table, the extra columns at the end of a row are ignored.
  - false/off: If the number of columns in a data source file is greater than that defined by the foreign table, the following error message is reported:
```
extra data after last expected column
```
  Default value: false
  
  If the newline character at the end of a row is missing, so that the row is merged with the next row, the data of the next row is ignored when this parameter is set to true.
- COMPATIBLE_ILLEGAL_CHARS [ boolean ]
  Specifies whether to tolerate invalid characters during data import and export.
  
  Value range: true/on or false/off.
  - true or on: During import, no error is reported and the import is not interrupted when there are invalid characters. Invalid characters are converted into valid ones and then imported to the database. During export, invalid characters are likewise converted into valid ones and then exported, with no error reported and no interruption.
  - false or off: During import, an error is reported and the import stops when there are invalid characters. During export, an error is reported and the export is interrupted when there are invalid characters or characters whose encodings do not exist.
  Default value: false or off
  The rules for converting invalid characters are as follows:
  1. '\0' is converted to a space. When support_zero_char is enabled, data is imported in the original format.
  2. Other invalid characters are converted to the characters specified by the GUC parameter convert_illegal_char_mode. The default value is '?'.
  3. When compatible_illegal_chars is set to true or on, after invalid characters such as NULL, DELIMITER, QUOTE, and ESCAPE are converted to spaces or question marks, an error message stating "illegal chars conversion may confuse COPY escape 0x20" will be displayed to remind you of possible parameter confusion caused by the conversion.
  4. If compatible_illegal_chars is set to true or on, data in binary format cannot be imported and an error message stating "cannot specify bulkload compatibility options in BINARY mode" will be displayed.
  5. If the GUC parameter copy_special_character_version is set to 'no_error', invalid characters will not be checked during the import and will be displayed as garbled characters in query results. Exercise caution when enabling this parameter.
  6. If copy_special_character_version is set to 'no_error' and compatible_illegal_chars is set to true, the former has a higher priority. That is, invalid characters will not be checked during the import.
  7. If copy_special_character_version is set to 'no_error' or compatible_illegal_chars is set to true, a record is written to the database run logs each time a row of data containing invalid characters is identified. The log writing behavior occupies a small amount of disk bandwidth. To eliminate the impact, set the GUC parameter enable_log_copy_illegal_chars to off, that is, do not write records to the database run logs. You are advised not to modify this configuration unless otherwise required.
  The rules for exporting invalid characters are as follows:
  
  Export falls into two scenarios: transcoding is required (the database server encoding is different from the client encoding) or transcoding is not required (the database server encoding is the same as the client encoding). The export behavior varies according to the scenario. You can run show server_encoding; to query the database server encoding and show client_encoding; to query the client encoding.
  - In scenarios where transcoding is required:
  1. '\0' is converted to a space. When support_zero_char is enabled, data is exported in the original format.
  2. Other invalid characters are converted to the characters specified by the GUC parameter convert_illegal_char_mode. The default value is '?'.
  3. When compatible_illegal_chars is set to true or on, after invalid characters such as NULL, DELIMITER, QUOTE, and ESCAPE are exported to spaces or question marks, an error message stating "illegal chars conversion may confuse COPY escape 0x20" will be displayed to remind you of possible parameter confusion caused by the export.
  4. If compatible_illegal_chars is set to true or on, data in binary format cannot be exported.
  5. If compatible_illegal_chars is set to true, a record is written to the database run logs each time a row of data containing invalid characters is identified. The log writing behavior occupies a small amount of disk bandwidth. To eliminate the impact, set the GUC parameter enable_log_copy_illegal_chars to off, that is, do not write records to the database run logs. You are advised not to modify this configuration unless otherwise required.
  - In scenarios where transcoding is not required:
  1. Data is exported as it is in the database, even if the data contains invalid characters and compatible_illegal_chars is set to true.
  2. If compatible_illegal_chars is set to true or on, data in binary format cannot be exported.
- FILL_MISSING_FIELDS
  Specifies how to handle the problem that the last column of a row in a source data file is lost during data import.
  
  Value range: true/on or false/off.
  
  Default value: false or off
- COPY_CUSTOM_ID 'custom_id_string'
  During data import, enter a user-defined character string, which is used as the identifier of the COPY import. It is equivalent to the COPY_CUSTOM_ID 'custom_... parameter in copy_option.
- DATE_FORMAT 'date_format_string'
  Specifies the DATE format for data import. The BINARY format is not supported. When data of such format is imported, error "cannot specify bulkload compatibility options in BINARY mode" will occur. The parameter is valid only for data import using COPY FROM.
  
  Value range: a valid DATE value. For details, see Date and Time Processing Functions and Operators.
  
  You can use the TIMESTAMP_FORMAT parameter to set the DATE format to TIMESTAMP for data import. For details, see TIMESTAMP_FORMAT below.
- TIME_FORMAT 'time_format_string'
  Specifies the TIME format for data import. The BINARY format is not supported. When data of such format is imported, error "cannot specify bulkload compatibility options in BINARY mode" will occur. The parameter is valid only for data import using COPY FROM.
  
  Value range: a valid TIME value. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.
- TIMESTAMP_FORMAT 'timestamp_format_string'
  Specifies the TIMESTAMP format for data import. The BINARY format is not supported. When data of such format is imported, error "cannot specify bulkload compatibility options in BINARY mode" will occur. The parameter is valid only for data import using COPY FROM.
  
  Value range: a valid TIMESTAMP value. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.
- SMALLDATETIME_FORMAT 'smalldatetime_format_string'
  Specifies the SMALLDATETIME format for data import. The BINARY format is not supported. When data of such format is imported, error "cannot specify bulkload compatibility options in BINARY mode" will occur. The parameter is valid only for data import using COPY FROM.
  
  Value range: a valid SMALLDATETIME value. For details, see Date and Time Processing Functions and Operators.
- DATEA_FORMAT 'datea_format_string'
  Specifies the DATEA format for data import (supported only in A-compatible mode). This parameter does not support the BINARY format. When data of such format is imported, the error message "cannot specify bulkload compatibility options in BINARY mode" will be displayed. The parameter is valid only for data import using COPY FROM.
  
  Value range: a valid DATEA value. For details, see Date and Time Processing Functions and Operators.
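
The format options above can be combined in a single import. The following is a minimal sketch, not taken from the product documentation: the table tpcds.order_dates, the file path, and the format strings are hypothetical, and the option spelling follows the parenthesized WITH (...) form used in the examples in this topic.

```sql
-- Hypothetical table with date and timestamp columns.
CREATE TABLE tpcds.order_dates
(
    order_id    INTEGER NOT NULL,
    order_date  DATE,
    created_at  TIMESTAMP
);

-- Import a CSV file whose first row is a header and whose date and
-- timestamp fields use explicit formats, for example:
--   1,2025/09/22,2025-09-22 10:30:00
COPY tpcds.order_dates FROM '/home/omm/order_dates.csv'
    WITH (format 'csv', header 'true',
          date_format 'yyyy/mm/dd',
          timestamp_format 'yyyy-mm-dd hh24:mi:ss');
```

These format options cannot be combined with the BINARY format; doing so raises the "cannot specify bulkload compatibility options in BINARY mode" error described above.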

The following special backslash sequences are recognized by COPY FROM:

\b: Backspace (ASCII 8)
\f: Form feed (ASCII 12)
\n: Newline character (ASCII 10)
\r: Carriage return character (ASCII 13)
\t: Tab (ASCII 9)
\v: Vertical tab (ASCII 11)
\digits: Backslash followed by one to three octal digits specifies the character with that numeric code.
\xdigits: Backslash followed by an x and one or two hex digits specifies the character with that numeric code.
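
To illustrate how these sequences are interpreted, the sketch below loads two rows from STDIN in the default TEXT format. The table tpcds.escape_demo is hypothetical and exists only for this illustration. The fields on each data line are separated by a tab.

```sql
-- Hypothetical two-column table used only for this illustration.
CREATE TABLE tpcds.escape_demo (id INTEGER, note TEXT);

COPY tpcds.escape_demo FROM STDIN;
-- Data lines as typed at the prompt (fields separated by a tab):
-->> 1	line one\nline two
-->> 2	\101\x42
-->> \.
-- Row 1: note contains a newline between "line one" and "line two".
-- Row 2: note is "AB" (\101 is octal 101 = 65 = 'A', \x42 is hex 42 = 66 = 'B').
```

If NOESCAPING is enabled, the backslash and the characters that follow it are not escaped, so these sequences would be stored literally.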

Permission Control Examples

gaussdb=> copy t1 from '/home/xy/t1.csv';
ERROR:  COPY to or from a file is prohibited for security concerns
HINT:  Anyone can COPY to stdout or from stdin. gsql's \copy command also works for anyone.
gaussdb=> grant gs_role_copy_files to xxx;

This error occurs because a non-initial user does not have permission to run COPY on server-side files. To use this function, an administrator needs to enable the GUC parameter enable_copy_server_files, and a common user must also be granted the gs_role_copy_files role.
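
As a hedged sketch of how a common user might proceed, the first statement checks the current setting, and the second uses the gsql \copy meta-command mentioned in the HINT. With \copy, the path refers to a file on the client, so the path shown here is only illustrative.

```sql
-- Check whether server-side file access is enabled for non-initial users.
SHOW enable_copy_server_files;

-- Alternative that needs no server-side file permission: load through the gsql client.
\copy t1 from '/home/xy/t1.csv'
```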

Examples

-- Create a schema.
gaussdb=#CREATE SCHEMA tpcds;

-- Create the tpcds.ship_mode table.
gaussdb=#CREATE TABLE tpcds.ship_mode
(
    SM_SHIP_MODE_SK           INTEGER               NOT NULL,
    SM_SHIP_MODE_ID           CHAR(16)              NOT NULL,
    SM_TYPE                   CHAR(30)                      ,
    SM_CODE                   CHAR(10)                      ,
    SM_CARRIER                CHAR(20)                      ,
    SM_CONTRACT               CHAR(20)
)
;

-- Insert a single data record into the tpcds.ship_mode table.
gaussdb=#INSERT INTO tpcds.ship_mode VALUES (1,'a','b','c','d','e');

-- Copy data from the tpcds.ship_mode table to the /home/omm/ds_ship_mode.dat file.
gaussdb=#COPY tpcds.ship_mode TO '/home/omm/ds_ship_mode.dat';

-- Output tpcds.ship_mode to STDOUT.
gaussdb=#COPY tpcds.ship_mode TO STDOUT;

-- Output the data of tpcds.ship_mode to STDOUT. The parameters are as follows: The delimiter is ',' (delimiter ',') and the encoding format is UTF8 (encoding 'utf8').
gaussdb=#COPY tpcds.ship_mode TO STDOUT WITH (delimiter ',', encoding 'utf8');

-- Output the data of tpcds.ship_mode to STDOUT. The parameters are as follows: The export format is CSV (format 'CSV'), and the exported content of the SM_SHIP_MODE_SK column is enclosed in quotation marks (force_quote(SM_SHIP_MODE_SK)).
gaussdb=#COPY tpcds.ship_mode TO STDOUT WITH (format 'CSV', force_quote(SM_SHIP_MODE_SK));

-- Create the tpcds.ship_mode_t1 table.
gaussdb=#CREATE TABLE tpcds.ship_mode_t1
(
    SM_SHIP_MODE_SK           INTEGER               NOT NULL,
    SM_SHIP_MODE_ID           CHAR(16)              NOT NULL,
    SM_TYPE                   CHAR(30)                      ,
    SM_CODE                   CHAR(10)                      ,
    SM_CARRIER                CHAR(20)                      ,
    SM_CONTRACT               CHAR(20)
)
;

-- Copy data from STDIN to the tpcds.ship_mode_t1 table.
gaussdb=#COPY tpcds.ship_mode_t1 FROM STDIN;

-- Enter a row of data as an example.
--gaussdb=# COPY tpcds.ship_mode_t1 FROM STDIN;
--Enter data to be copied followed by a newline.
--End with a backslash and a period on a line by itself.
-->> 1	  a	a	a	a	a
-->> \.
-- Notes: Enter a tab between data.

-- Copy data from the /home/omm/ds_ship_mode.dat file to the tpcds.ship_mode_t1 table.
gaussdb=#COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.dat';

-- Copy data from the /home/omm/ds_ship_mode.dat file to the tpcds.ship_mode_t1 table, convert the data using the TRANSFORM expression, and insert the 10 characters on the left of the SM_TYPE column into the table.
gaussdb=#COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.dat' TRANSFORM (SM_TYPE AS LEFT(SM_TYPE, 10));

-- Copy data from the /home/omm/ds_ship_mode.dat file to the tpcds.ship_mode_t1 table, with the import format set to TEXT (format 'text'), the delimiter set to '\t' (delimiter E'\t'), excessive columns ignored (ignore_extra_data 'true'), and characters not escaped (noescaping 'true').
gaussdb=#COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.dat' WITH(format 'text', delimiter E'\t', ignore_extra_data 'true', noescaping 'true');

-- Export tpcds.ship_mode to the /home/omm/ds_ship_mode_fixed.dat file in FIXED format with a header row, and then copy data from that file to the tpcds.ship_mode_t1 table, with the import format set to FIXED, fixed-length format specified (FORMATTER(SM_SHIP_MODE_SK(0, 2), SM_SHIP_MODE_ID(2,16), SM_TYPE(18,30), SM_CODE(50,10), SM_CARRIER(61,20), SM_CONTRACT(82,20))), excessive columns ignored (ignore_extra_data), and headers included (header).
gaussdb=# COPY tpcds.ship_mode TO '/home/omm/ds_ship_mode_fixed.dat' FIXED FORMATTER(SM_SHIP_MODE_SK(0, 2), SM_SHIP_MODE_ID(2,16), SM_TYPE(18,30), SM_CODE(50,10), SM_CARRIER(61,20), SM_CONTRACT(82,20)) header;
gaussdb=#COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode_fixed.dat' FIXED FORMATTER(SM_SHIP_MODE_SK(0, 2), SM_SHIP_MODE_ID(2,16), SM_TYPE(18,30), SM_CODE(50,10), SM_CARRIER(61,20), SM_CONTRACT(82,20)) header ignore_extra_data;
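
The HEADER option can be exercised in CSV mode in the same way. The following is a minimal sketch that assumes the two tables created above still exist. The output path /home/omm/ds_ship_mode.csv is hypothetical.

```sql
-- Export tpcds.ship_mode as CSV with a header row (hypothetical output path).
COPY tpcds.ship_mode TO '/home/omm/ds_ship_mode.csv' WITH (format 'csv', header 'true');

-- Import the same file; the first row is treated as the header and ignored.
COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.csv' WITH (format 'csv', header 'true');
```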

-- Drop tables and the schema.
gaussdb=#DROP TABLE tpcds.ship_mode;
gaussdb=#DROP TABLE tpcds.ship_mode_t1;
gaussdb=#DROP SCHEMA tpcds;
