COPY
Function
COPY copies data between tables and files. COPY FROM copies data from a file to a table. COPY TO copies data from a table to a file.
Use COPY FROM and COPY TO for limited local data transfers with low concurrency.

- Since GaussDB(DWS) is a fully managed cloud service, direct access to backend nodes is not available, making the COPY syntax unusable in this environment. Instead, employ the \COPY command via the gsql client; note that other tools like SQL editors do not support it.
- For better results, save your data files on OBS and use OBS foreign tables for importing into the database.
Precautions
- COPY applies only to tables, not views.
- To insert data into a table, you must have INSERT permission on it.
- If a list of columns is specified, COPY will only copy the data in the specified columns to or from the file. If there are any columns in the table that are not in the column list, COPY FROM will insert the default values for those columns.
- If a source data file is specified, the file must be accessible from the server. If STDIN is specified, data is transmitted between the client and server. Separate column values with a tab character, and enter \. on a line of its own to mark the end of input.
- If the number of columns in a row of the data file does not match the expected number, COPY FROM reports an error.
- A backslash followed by a period (\.) marks the end of data. The end marker is not required when reading from a file, but is required when copying data between client applications.
- In COPY FROM, "\N" represents a null value; to load the literal string "\N", write "\\N".
- COPY FROM does not support preprocessing during import, such as expression evaluation or filling in default values. If preprocessing is needed, first import the data into a temporary table, and then run SQL statements that insert it into the target table using expressions or functions. However, this approach causes extra I/O and degrades import performance.
- If a data format error occurs during COPY FROM, the transaction is rolled back. The error information in this case is limited, so locating the incorrect rows in a large amount of raw data can be difficult.
- You can use COPY TO to export data in TEXT or CSV format to OBS, but you cannot use COPY FROM to import data from OBS. For CSV files exported to OBS that contain a UTF-8 BOM, you are advised to import them through OBS; otherwise, the BOM field may not be identified.
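A minimal sketch of the temporary-table preprocessing approach mentioned above (the table names, columns, and expression here are illustrative, not from this document):

```sql
-- Stage the raw file into a temporary table first (illustrative names).
CREATE TEMP TABLE staging_sales (id INTEGER, amount_text TEXT);
\COPY staging_sales FROM '/home/omm/sales.dat';

-- Apply the preprocessing in SQL when inserting into the target table.
INSERT INTO target_sales (id, amount, loaded_at)
SELECT id, CAST(amount_text AS NUMERIC), CURRENT_TIMESTAMP
FROM staging_sales;
```

As the precautions note, this two-step flow trades extra I/O for the ability to run arbitrary expressions during loading.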
Syntax
- Copy the data from a file to a table:
COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [ USING ] DELIMITERS 'delimiters' ]
    [ WITHOUT ESCAPING ]
    [ LOG ERRORS ]
    [ LOG ERRORS data ]
    [ REJECT LIMIT 'limit' ]
    [ [ WITH ] ( option [, ...] ) ] | copy_option
    | FIXED FORMATTER ( { column_name( offset, length ) } [, ...] ) [ ( option [, ...] ) | copy_option [ ...] ] ];
In the SQL syntax, FIXED, FORMATTER ( { column_name( offset, length ) } [, ...] ), and [ ( option [, ...] ) | copy_option [ ...] ] can be in any sequence.
- Copy the data from a table to a file:
COPY table_name [ ( column_name [, ...] ) ]
    TO { 'filename' | STDOUT }
    [ [ USING ] DELIMITERS 'delimiters' ]
    [ WITHOUT ESCAPING ]
    [ [ WITH ] ( option [, ...] ) ] | copy_option
    | FIXED FORMATTER ( { column_name( offset, length ) } [, ...] ) [ ( option [, ...] ) | copy_option [ ...] ] ];

COPY query
    TO { 'filename' | STDOUT }
    [ WITHOUT ESCAPING ]
    [ [ WITH ] ( option [, ...] ) ] | copy_option
    | FIXED FORMATTER ( { column_name( offset, length ) } [, ...] ) [ ( option [, ...] ) | copy_option [ ...] ] ];
- The syntax constraints of COPY TO are as follows:
  (query) is incompatible with [USING] DELIMITERS: if the data of COPY TO comes from a query result, COPY TO cannot specify [USING] DELIMITERS.
- Separate the copy_option parameters following FIXED FORMATTER with spaces.
- copy_option consists of the native COPY parameters, while option consists of the parameters adopted for compatibility with foreign tables.
- In the SQL syntax, FIXED, FORMATTER ( { column_name( offset, length ) } [, ...] ), and [ ( option [, ...] ) | copy_option [ ...] ] can be in any sequence.
Parameter Description
Table 1 COPY parameters

| Parameter | Description | Value Range |
| --- | --- | --- |
| query | Copies the query result. | A SELECT or VALUES command in parentheses. |
| table_name | Name (optionally schema-qualified) of an existing table. | An existing table name. |
| column_name | Optional list of columns to be copied. | If no column list is specified, all columns of the table are copied. |
| STDIN | Input comes from the client application. | - |
| STDOUT | Output goes to the client application. | - |
| FIXED | Fixes the column length. | The definition of fixed length: |
| [USING] DELIMITER 'delimiters' | String that separates columns within each row (line) of the file. It cannot be larger than 10 bytes. | The delimiter cannot include any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789. The default value is a tab character in text format and a comma in CSV format. |
| WITHOUT ESCAPING | In text format, does not escape the backslash (\) and the characters that follow it. | Only the text format is supported. |
| LOG ERRORS | Enables error tolerance for data type errors in the COPY FROM statement. Row errors are recorded in the public.pgxc_copy_error_log table in the database for later reference. NOTICE: this error tolerance parameter has restrictions. | A value set while data is imported using COPY FROM. |
| LOG ERRORS DATA | The differences between LOG ERRORS DATA and LOG ERRORS are as follows: | - |
| REJECT LIMIT 'limit' | Used with LOG ERRORS to set the upper limit of errors tolerated by the COPY FROM statement. If the number of errors exceeds the limit, later errors are reported based on the original mechanism. NOTE: unlike the GDS error tolerance mechanism, the REJECT LIMIT count here is based on the number of data parsing errors on the CN where the COPY FROM statement runs, not on the number of errors on each DN. | A positive integer (1 to INTMAX) or unlimited. Default value: |
| FORMATTER | In fixed length mode (that is, when FIXED is specified), defines the position of each column in the data file in the column(offset, length) format. | - |
| OPTION { option_name 'value' } | Specifies the parameters of a compatible foreign table. | For details, see Table 2. |
| COPY_OPTION { option_name 'value' } | Specifies the native COPY parameters. | For details, see Table 3. |
Table 2 option parameters

| Parameter | Description | Value Range |
| --- | --- | --- |
| FORMAT | Format of the source data file. | CSV, text, fixed, and binary. Default value: text |
| OIDS | Copies the OID for each row. NOTE: an error is raised if OIDs are specified for a table that does not have OIDs, or when copying a query. | Default value: false |
| DELIMITER | Character that separates columns within each row (line) of the file. | Can be a multi-character delimiter of up to 10 bytes. Default value: |
| NULL | String that represents a null value. | Default value: |
| HEADER | Whether the file contains a header line with the names of the columns. Available only for CSV and FIXED files. | Default value: false |
| QUOTE | Quote character used when a data value is quoted in a CSV file. | Default value: double quotation mark ("). |
| ESCAPE | Escape character in CSV format. It can only be a single-byte character. | A single-byte character. Default value: double quotation mark (") |
| EOL 'newline_character' | Newline character style of the imported or exported data file. | Multi-character newline characters of up to 10 bytes are supported. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #. |
| FORCE_QUOTE { ( column_name [, ...] ) \| * } | In CSV COPY TO mode, forces quotation marks around all non-null values in each specified column. NULL values are not quoted. | Existing columns. |
| FORCE_NOT_NULL ( column_name [, ...] ) | In CSV COPY FROM mode, values of the specified columns are not matched against the null string, so empty values are read as empty strings rather than NULL. | Existing columns. |
| ENCODING | Encoding format of the data file. | The default value is the current database encoding. Common encoding formats include UTF8, GBK, and GB18030. GB18030 has two versions, GB18030 and GB18030_2022; GB18030_2022 is the latest Chinese national standard for supporting Chinese characters. |
| IGNORE_EXTRA_DATA | Whether to ignore excess columns at the end of a row when the data source file has more columns than the table. Available only during data import. NOTICE: if the newline character at the end of a row is lost, setting this parameter to true causes the data of the next row to be ignored. | Default value: false |
| COMPATIBLE_ILLEGAL_CHARS | Error tolerance for invalid characters. Valid only for COPY FROM import. The binary format is not supported; specifying it reports the error "cannot specify bulkload compatibility options in BINARY mode". The error tolerance rule for importing invalid characters is as follows: | Default value: false or off |
| FILL_MISSING_FIELDS | Whether to report an error when the last column in a row of the source file is missing during data loading. | Default value: false |
| DATE_FORMAT | Imports data of the DATE type. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". | A valid DATE format. For details, see Date and Time Processing Functions and Operators. NOTICE: in databases configured for Oracle compatibility (DBCOMPATIBILITY set to ORA), the DATE type functions as TIMESTAMP. For details about syntax compatibility differences, see Syntax Compatibility Differences Among Oracle, Teradata, and MySQL, and see TIMESTAMP_FORMAT below. |
| TIME_FORMAT | Imports data of the TIME type. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". | Valid TIME formats. Time zones are not supported. For details, see Date and Time Processing Functions and Operators. |
| TIMESTAMP_FORMAT | Imports data of the TIMESTAMP type. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". | Valid TIMESTAMP formats. Time zones are not supported. For details, see Date and Time Processing Functions and Operators. |
| SMALLDATETIME_FORMAT | Imports data of the SMALLDATETIME type. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". | Valid SMALLDATETIME formats. For details, see Date and Time Processing Functions and Operators. |
| SERVER | OBS server. Takes effect only for COPY TO; filename is then an OBS path, indicating that data is exported to OBS. | Name of a created OBS server. |
| BOM | Whether to add the UTF-8 BOM to the exported CSV file. Valid only when COPY TO is executed and SERVER is set to a valid value. The exported file must be encoded in UTF-8. | Default value: false |
| MAXROW | Maximum number of lines in an exported file; a new file is generated when the number of lines exceeds the maximum. Valid only when COPY TO is executed and SERVER is set to a valid value. When HEADER is true, MAXROW must be greater than 1. Must be specified together with FILEPREFIX. | 1–2147483647 |
| FILEPREFIX | Prefix of the export file name. Valid only when COPY TO is executed and SERVER is set to a valid value. Must be specified together with MAXROW. | A valid string that does not start or end with a slash (/). |
| PRESERVE_BLANKS | Whether to retain the trailing blank characters (including spaces, \t, \v, and \f) of each column during fixed-length import. Supported only by clusters of version 8.2.0.100 or later. | Default value: false/off |
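As a sketch of how several of the option parameters above combine in a single import (the table and file names here are illustrative, not from this document):

```sql
-- Hypothetical CSV import: header line, UTF-8 encoding, a custom DATE
-- format, invalid-character tolerance, and excess trailing columns ignored.
\COPY sales_target FROM '/home/omm/sales.csv'
    WITH (format 'csv', header 'true', encoding 'utf8',
          date_format 'YYYY-MM-DD', compatible_illegal_chars 'true',
          ignore_extra_data 'true');
```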
Table 3 copy_option parameters

| Parameter | Description | Value Range |
| --- | --- | --- |
| OIDS | Copies the OID for each row. An error is raised if OIDs are specified for a table that does not have OIDs, or when copying a query. | - |
| NULL null_string | String that represents a null value. NOTICE: when using COPY FROM, any data item that matches this string is stored as NULL, so make sure you use the same string as you used with COPY TO. | Default value: |
| HEADER | Whether the file contains a header line with the names of the columns. Available only for CSV and FIXED files. | - |
| FILEHEADER | A file that defines the header content for exported data; it contains a description of each column. NOTICE: | - |
| FREEZE | Loads rows as already frozen, as if VACUUM FREEZE had been run on them. This is a performance option for initial data loading. Rows are frozen only when three requirements are met. NOTICE: when COPY completes, all other sessions immediately see the data, which violates the general principle of MVCC visibility and carries potential risks. | - |
| FORCE NOT NULL column_name [, ...] | In CSV COPY FROM mode, values of the specified columns are not matched against the null string, so empty values are read as empty strings rather than NULL. | Existing columns. |
| FORCE QUOTE { column_name [, ...] \| * } | In CSV COPY TO mode, forces quotation marks around all non-null values in each specified column. NULL values are not quoted. | Existing columns. |
| BINARY | Stores and reads all data in binary format rather than text. In binary mode, DELIMITER, NULL, and CSV cannot be declared. After binary is specified, the CSV, fixed, and text types cannot be specified through option or copy_option. | - |
| CSV | Enables CSV mode. After CSV is specified, the binary, fixed, and text types cannot be specified through option or copy_option. | - |
| QUOTE [AS] 'quote_character' | Quote character for a CSV file. | Default value: double quotation mark (") |
| ESCAPE [AS] 'escape_character' | Escape character for a CSV file. It must be a single-byte character. | Default value: double quotation mark ("). If it is the same as the value of quote, it is replaced with \0. |
| EOL 'newline_character' | Newline character style of the imported or exported data file. | Multi-character newline characters of up to 10 bytes are supported. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #. |
| ENCODING 'encoding_name' | Name of the file encoding format. | Valid encoding formats. Default value: current encoding format. |
| IGNORE_EXTRA_DATA | Ignores excess columns at the end of a row when the data source file has more columns than the table. Available only during data import. If this parameter is not used and a row has excess columns, an error is reported. | - |
| COMPATIBLE_ILLEGAL_CHARS | Error tolerance for invalid characters during import: invalid characters are converted before importing, no error is reported, and the import is not interrupted. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". If this parameter is not used, an error is reported when an invalid character is encountered and the import is interrupted. The error tolerance rule for importing invalid characters is as follows: | - |
| FILL_MISSING_FIELDS | Whether to report an error when the last column in a row of the source file is missing during data loading. NOTICE: do not specify this option. If it is enabled, the parser ignores the data problem during CN parsing (the fault tolerance scope of the COPY error table) and reports the error again on the DN; as a result, the COPY error table (with LOG ERRORS REJECT LIMIT enabled) cannot capture such data exceptions. | Default value: false or off |
| DATE_FORMAT 'date_format_string' | Imports data of the DATE type. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". NOTE: in databases configured for Oracle compatibility (DBCOMPATIBILITY set to ORA), the DATE type functions as TIMESTAMP. For details about syntax compatibility differences, see Syntax Compatibility Differences Among Oracle, Teradata, and MySQL, and see TIMESTAMP_FORMAT below. | Valid DATE formats. For details, see Date and Time Processing Functions and Operators. |
| TIME_FORMAT 'time_format_string' | Imports data of the TIME type. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". | Valid TIME formats. Time zones are not supported. For details, see Date and Time Processing Functions and Operators. |
| TIMESTAMP_FORMAT 'timestamp_format_string' | Imports data of the TIMESTAMP type. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". | Valid TIMESTAMP formats. Time zones are not supported. For details, see Date and Time Processing Functions and Operators. |
| SMALLDATETIME_FORMAT 'smalldatetime_format_string' | Imports data of the SMALLDATETIME type. Valid only for COPY FROM import; the binary format is not supported, and specifying it reports "cannot specify bulkload compatibility options in BINARY mode". | Valid SMALLDATETIME formats. For details, see Date and Time Processing Functions and Operators. |
| PRESERVE_BLANKS | Whether to retain the trailing blank characters (including spaces, \t, \v, and \f) of each column during fixed-length import. Supported by clusters of version 8.2.0.100 or later. | Default value: false/off |
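Unlike option, the native copy_option parameters are written as space-separated keywords rather than a WITH (...) list. A sketch under that assumption (the table and file names are illustrative, not from this document):

```sql
-- Hypothetical import using native copy_option syntax: CSV mode with an
-- explicit quote character, plus error tolerance that logs up to 50
-- parsing errors to public.pgxc_copy_error_log.
\COPY sales_target FROM '/home/omm/sales.csv'
    LOG ERRORS REJECT LIMIT '50'
    CSV QUOTE '"';
```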
The following special backslash sequences are recognized by COPY FROM:
| Backslash Sequence | Description |
| --- | --- |
| \b | Backspace (ASCII 8) |
| \f | Form feed (ASCII 12) |
| \n | Newline (ASCII 10) |
| \r | Carriage return (ASCII 13) |
| \t | Tab (ASCII 9) |
| \v | Vertical tab (ASCII 11) |
| \digits | A backslash followed by one to three octal digits specifies the character with that numeric code. |
| \xdigits | A backslash followed by x and one or two hexadecimal digits specifies the character with that numeric code. |
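The escape handling above, together with the \N null marker, can be seen in a small STDIN load (the table name is illustrative; the lines after the command are tab-separated data, ended by \.):

```sql
-- Hypothetical session: \x41 loads as the character A (hexadecimal 41),
-- and \N in the second row loads as NULL.
\COPY demo_t FROM STDIN;
1	\x41
2	\N
\.
```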
Examples (supported only by the gsql client)
Copy data from the tpcds.ship_mode table to the /home/omm/ds_ship_mode.dat file:

\COPY tpcds.ship_mode TO '/home/omm/ds_ship_mode.dat';
Export the result of a query on the tpcds.ship_mode table to the /home/omm/ds_ship_mode.dat file:

\COPY (SELECT * FROM tpcds.ship_mode WHERE a IS NOT NULL) TO '/home/omm/ds_ship_mode.dat';
Write tpcds.ship_mode as output to stdout:

\COPY tpcds.ship_mode TO stdout;
Create the tpcds.ship_mode_t1 table:

CREATE TABLE tpcds.ship_mode_t1
(
    SM_SHIP_MODE_SK   INTEGER   NOT NULL,
    SM_SHIP_MODE_ID   CHAR(16)  NOT NULL,
    SM_TYPE           CHAR(30),
    SM_CODE           CHAR(10),
    SM_CARRIER        CHAR(20),
    SM_CONTRACT       CHAR(20)
)
WITH (ORIENTATION = COLUMN, COMPRESSION = MIDDLE)
DISTRIBUTE BY HASH(SM_SHIP_MODE_SK);
Copy data from stdin to the tpcds.ship_mode_t1 table:

\COPY tpcds.ship_mode_t1 FROM stdin;
Copy data from the /home/omm/ds_ship_mode.dat file to the tpcds.ship_mode_t1 table:

\COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.dat';
Copy data from the /home/omm/ds_ship_mode.dat file to the tpcds.ship_mode_t1 table, with the import format set to TEXT (format 'text'), the delimiter set to \t (delimiter E'\t'), excess columns ignored (ignore_extra_data 'true'), and characters not escaped (noescaping 'true'):

\COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.dat' WITH(format 'text', delimiter E'\t', ignore_extra_data 'true', noescaping 'true');
Copy data from the /home/omm/ds_ship_mode.dat file to the tpcds.ship_mode_t1 table, with the import format set to FIXED, the fixed-length layout specified (FORMATTER(SM_SHIP_MODE_SK(0, 2), SM_SHIP_MODE_ID(2,16), SM_TYPE(18,30), SM_CODE(50,10), SM_CARRIER(61,20), SM_CONTRACT(82,20))), excess columns ignored (ignore_extra_data), and a header line included (header):

\COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.dat' FIXED FORMATTER(SM_SHIP_MODE_SK(0, 2), SM_SHIP_MODE_ID(2,16), SM_TYPE(18,30), SM_CODE(50,10), SM_CARRIER(61,20), SM_CONTRACT(82,20)) header ignore_extra_data;
Copy data from the /home/omm/ds_ship_mode.dat file to the tpcds.ship_mode_t1 table, with the import format set to FIXED, the fixed-length layout specified (FORMATTER(SM_SHIP_MODE_SK(0, 2), SM_SHIP_MODE_ID(2,16), SM_TYPE(18,30), SM_CODE(50,10), SM_CARRIER(61,20), SM_CONTRACT(82,20))), excess columns ignored (ignore_extra_data), a header line included (header), and trailing \t retained:

\COPY tpcds.ship_mode_t1 FROM '/home/omm/ds_ship_mode.dat' (FORMAT 'fixed', FORMATTER (SM_SHIP_MODE_SK(0,2), SM_SHIP_MODE_ID(2,16), SM_TYPE(18,30), SM_CODE(50,10), SM_CARRIER(61,20), SM_CONTRACT(82,20)), PRESERVE_BLANKS 'true', HEADER 'true', IGNORE_EXTRA_DATA 'true');
Export tpcds.ship_mode_t1 as a text file ds_ship_mode.dat in the OBS directory /bucket/path/. You need to specify the server option that contains the OBS access information:

\COPY tpcds.ship_mode_t1 TO '/bucket/path/ds_ship_mode.dat' WITH (format 'text', encoding 'utf8', server 'obs_server');
Export tpcds.ship_mode_t1 as a CSV file in the OBS directory /bucket/path/. You need to specify the server option that contains the OBS access information. The file contains a header line and a BOM. A single file can contain a maximum of 1000 lines; if the number of lines exceeds 1000, a new file is generated. The user-defined file name prefix is justprefix:

\COPY (select * from tpcds.ship_mode_t1 where SM_SHIP_MODE_SK=1060) TO '/bucket/path/' WITH (format 'csv', header 'on', encoding 'utf8', server 'obs_server', bom 'on', maxrow '1000', fileprefix 'justprefix');
Delete the tpcds.ship_mode_t1 table:

DROP TABLE tpcds.ship_mode_t1;