Updated on 2025-05-29 GMT+08:00

Parallel Data Import and Export

GaussDB provides a parallel data import and export function that imports and exports large amounts of data quickly and efficiently. This section describes the parameters that control parallel data import to and export from GaussDB.

raise_errors_if_no_files

Parameter description: Specifies whether to distinguish between "the imported file contains no records" and "the imported file does not exist." If this parameter is enabled and the imported file does not exist, GaussDB reports an error indicating that the file does not exist.

Parameter type: Boolean.

Unit: none

Value range:

  • on: When files are imported, "the imported file contains no records" and "the imported file does not exist" are reported as different messages.
  • off: When files are imported, "the imported file contains no records" and "the imported file does not exist" are reported with the same message.

Default value: off

Setting method: This is a SUSET parameter. Set it based on instructions provided in Table 1.

Setting suggestion: Retain the default value.

Risks and impacts of improper settings: Change the parameter value after fully understanding the parameter meaning and verifying it through testing.

The raise_errors_if_no_files parameter is valid only when the distributed GDS tool is used.
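
The difference between the two settings can be illustrated with a minimal Python sketch. It is not GaussDB or GDS code: the import_file function and its return convention are hypothetical and only model the behavior described above, where a missing file looks like an empty file unless the parameter is on.

```python
import os

def import_file(path, raise_errors_if_no_files=False):
    """Return the number of records read from path (hypothetical importer)."""
    if not os.path.exists(path):
        if raise_errors_if_no_files:
            # on: a missing file is reported as its own error
            raise FileNotFoundError(f"import file does not exist: {path}")
        # off: a missing file is indistinguishable from an empty one
        return 0
    with open(path, "r", encoding="utf-8") as f:
        # Count non-blank lines as records
        return sum(1 for line in f if line.strip())
```

With the parameter off, the caller sees 0 records in both cases. With it on, only a file that really exists and is empty returns 0.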

gds_debug_mod

Parameter description: Specifies whether to enable the debug function of Gauss Data Service (GDS). This parameter helps locate and analyze GDS faults. After the debug function is enabled, the types of packets that GDS sends and receives, the peer end of each GDS command interaction, and other GDS interaction information are written to the logs of the corresponding nodes in the cluster. These logs record the state transitions of the GaussDB state machine and its current state.

Parameter type: Boolean.

Unit: none

Value range:

  • on: The GDS debug function is enabled.
  • off: The GDS debug function is disabled.

Default value: off

Setting method: This is a USERSET parameter. Set it based on instructions provided in Table 1.

Setting suggestion: Enable this function only when locating GDS problems.

Risks and impacts of improper settings: If this parameter is enabled, extra logs will be generated, increasing the log I/O overhead and affecting the performance and log validity.

safe_data_path

Parameter description: Specifies the path prefix that restricts file access for all users except the initial user. Currently, the path prefix restriction applies to the COPY operation and advanced packages.

Parameter type: string.

Unit: none

Value range: valid directory path containing a maximum of 4096 characters.

Default value: ""

Setting method: This is a SIGHUP parameter. Set it based on instructions provided in Table 1.

Setting suggestion: Ensure that safe_data_path covers a limited directory range. Do not set it to a broad path.

Risks and impacts of improper settings: If enable_copy_server_files is enabled and safe_data_path covers too broad a directory range, malicious users may access sensitive files on the server.

  • If a soft link file exists in the safe_data_path directory, the system processes the file based on the actual file path to which the soft link points. If the actual path is not in the safe_data_path directory, an error is reported.
  • If a hard link file exists in the safe_data_path directory, it can be used properly. For security purposes, exercise caution when using hard link files. Do not create hard link files that point to other directories in the safe_data_path directory. Ensure that the permission on the safe_data_path directory is minimized.
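
The soft link rule above can be sketched in Python as a prefix check on the resolved path. This is an illustration under assumptions, not the GaussDB implementation: is_path_allowed is a hypothetical helper, and the sketch assumes a non-empty safe_data_path (it does not model the default empty value).

```python
import os

def is_path_allowed(path, safe_data_path):
    """Return True if path, after resolving soft links, lies in safe_data_path."""
    # realpath follows soft links, so the check uses the actual file location
    real_prefix = os.path.realpath(safe_data_path)
    real_path = os.path.realpath(path)
    # commonpath compares whole path components, so /data/safe2 does not
    # match the prefix /data/safe
    return os.path.commonpath([real_prefix, real_path]) == real_prefix
```

A soft link inside the directory that points outside it is rejected, because its resolved path no longer starts with the prefix. A hard link has no separate target path to resolve, so a check of this kind cannot detect it, which is why the note above advises caution with hard links.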

enable_copy_server_files

Parameter description: Specifies whether to enable the permission to copy server files.

Parameter type: Boolean.

Unit: none

Value range:

  • on: The permission to copy server files is enabled. Users with the SYSADMIN permission or users who inherit the built-in role gs_role_copy_files can run the COPY FROM FILENAME or COPY TO FILENAME command.
  • off: The permission to copy server files is disabled. Only the initial user is allowed to run the COPY FROM FILENAME or COPY TO FILENAME command.

Default value: off

Setting method: This is a SIGHUP parameter. Set it based on instructions provided in Table 1.

Setting suggestion: Enable this function only when you need to copy server files. After this function is enabled, you are advised to set safe_data_path at the same time.

Risks and impacts of improper settings: After this parameter is enabled, malicious users may access sensitive files on the server if safe_data_path covers too broad a directory range.
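
The value range above amounts to a small decision rule, sketched here in Python. The function and argument names are hypothetical, and the sketch assumes that the initial user keeps the permission when the parameter is on, which the description implies but does not state.

```python
def can_copy_server_files(enable_copy_server_files, is_initial_user,
                          is_sysadmin=False, has_gs_role_copy_files=False):
    """Model who may run COPY FROM FILENAME or COPY TO FILENAME."""
    if is_initial_user:
        # Assumption: the initial user is allowed in both settings
        return True
    if not enable_copy_server_files:
        # off: only the initial user is allowed
        return False
    # on: SYSADMIN users and users who inherit gs_role_copy_files are allowed
    return is_sysadmin or has_gs_role_copy_files
```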

support_binary_copy_version

Parameter description: Specifies whether the encoding information of the current database server is included when data is exported in BINARY mode using COPY TO.

Parameter type: string.

Unit: none

Value range: "" and "header_encoding".

Table 1 Configuration items

| Configuration Item | Behavior |
|--------------------|----------|
| header_encoding | When the BINARY mode of COPY TO is used to export data, the binary file header contains the encoding information of the current database server. |
| Empty string | Forward compatibility configuration is performed and data is exported in the original binary format. |

Default value: "header_encoding"

Setting method: This is a USERSET parameter. Set it based on instructions provided in Table 1.

Setting suggestion: Retain the default value. If forward compatibility is required, leave this parameter empty.

Risks and impacts of improper settings: If this parameter is set to an empty string, the exported result does not contain the server-side encoding information. In scenarios where the encoding information is required, you need to query and record the information.

copy_special_character_version

Parameter description: Determines the processing of invalid characters during data import and export using COPY.

Parameter type: string.

Unit: none

Value range: "", "no_error", and "per_byte".

Table 2 Configuration items

| Configuration Item | Behavior |
|--------------------|----------|
| "no_error" | When COPY is used to import a data file with the same encoding as that on the server, fault tolerance is performed on the data that does not meet the encoding requirements in the data file. No error is reported and the data with the original codes is directly inserted into the table. |
| "per_byte" | Determines how to process files encoded in GBK or ZHS16GBK when COPY is used to export text files. After the parameter is set to per_byte, one byte of data is exported at a time. Otherwise, two bytes of data are exported at a time. (One character occupies two bytes if data is encoded in GBK.) |
| "" | The default value, which does not affect any function. Forward compatibility is supported. That is, an error is reported when invalid characters are found during COPY. |

Default value: ""

Setting method: This is a USERSET parameter. Set it based on instructions provided in Table 1. Use gsql to connect to the database. If you use the set method, the value is case-insensitive. If you use gs_guc, the value can only be lowercase.

Setting suggestion: Retain the default value.

Risks and impacts of improper settings: Change the parameter value after fully understanding the parameter meaning and verifying it through testing.

  • To ensure that the data to be imported is valid, its encoding must be validated when it is being copied. If this parameter is set to "no_error", verification against invalid encoding is masked, which can leave invalid characters in the field. Therefore, exercise caution before using this setting.
  • Currently, data encoding verification is masked only when the server-side encoding is the same as the data encoding. That is, if copy_special_character_version is set to "no_error", the database server-side encoding must be the same as the data file encoding. Otherwise, an error is reported. If no data encoding is specified, the client-side encoding is used by default.
  • In binary mode, setting copy_special_character_version to "no_error" takes effect only for fields of the TEXT, CHAR, VARCHAR, NVARCHAR2, or CLOB type.
  • This parameter is valid only in the database with character sets encoded in UTF-8, GB18030, GB18030_2022, ZHS16GBK, or LATIN1.
  • When the encoding of both the client and server is GBK or ZHS16GBK and the database contains data encoded in an invalid format, if copy_special_character_version is not set to "per_byte", the exported data file may contain unexpected data.
  • The priority of setting copy_special_character_version to "no_error" is higher than that of COMPATIBLE_ILLEGAL_CHARS in COPY.
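
The contrast between the default value and "no_error" during import can be sketched in Python. This is an illustration only: load_text_field is a hypothetical function, and it assumes the data file and the server use the same encoding, as the notes above require for "no_error".

```python
def load_text_field(raw, server_encoding="utf-8",
                    copy_special_character_version=""):
    """Validate the bytes of one field and return what would be stored."""
    try:
        # Decoding is used here only to validate the bytes
        raw.decode(server_encoding)
    except UnicodeDecodeError:
        if copy_special_character_version != "no_error":
            # Default: invalid characters make the import fail
            raise
        # "no_error": the check is masked and the original bytes are kept
    return raw
```

With the default value, a field such as b"ab\xff" raises an error under UTF-8. With "no_error", the same bytes are returned unchanged, which is how invalid characters can end up in a field.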

enable_log_copy_illegal_chars

Parameter description: Specifies whether to write records to the database run log if invalid characters are encountered when gs_loader is used to import data or COPY is used to import or export data.

Parameter type: Boolean

Unit: none

Value range: on and off

Table 3 Configuration items

| Configuration Item | Behavior |
|--------------------|----------|
| on | Each time a line of data contains invalid encoding characters, a record is written to the database run log. |
| off | Records related to invalid encoding characters are not written into database run logs. |

Default value: on

Setting method: This is a USERSET parameter. Set it based on instructions provided in Table 1.

Setting suggestion: Retain the default value unless otherwise specified. Set this parameter to off only when there are requirements on database performance and disk bandwidth.

Risks and impacts of improper settings: If this parameter is set to off, records related to invalid encoding characters will not be written into run logs, affecting the fault locating capability.
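
The per-line logging described in Table 3 can be sketched with Python's standard logging module. The scan_lines function, the logger name, and the message text are hypothetical and do not reflect the actual format of GaussDB run logs.

```python
import logging

logger = logging.getLogger("copy_sketch")

def scan_lines(lines, encoding="utf-8", enable_log_copy_illegal_chars=True):
    """Return the 1-based numbers of the lines that contain invalid bytes."""
    bad_lines = []
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad_lines.append(lineno)
            if enable_log_copy_illegal_chars:
                # on: one log record per line with invalid characters
                logger.warning("line %d contains invalid %s characters",
                               lineno, encoding)
    return bad_lines
```

Because one record is written per offending line, a file with many invalid lines produces many log records. That is the performance and disk bandwidth cost the setting suggestion refers to.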

a_format_enable_copy_empty_lobs

Parameter description: Specifies whether null strings can be inserted into the BLOB and CLOB types through null string input when COPY FROM is executed in ORA-compatible mode.

Parameter type: Boolean

Unit: none

Value range: on and off

Table 4 Configuration items

| Configuration Item | Behavior |
|--------------------|----------|
| on | In ORA-compatible mode, null strings can be inserted into CLOB and BLOB types using null string input. For example, after 1,"",3 is inserted into the table, the second column holds a null string. WHERE col2 IS NULL cannot be used to filter it, and the column length is 0. |
| off | In ORA-compatible mode, a null string input for CLOB and BLOB types is converted to NULL for storage. For example, after 1,"",3 is inserted into the table, the second column is NULL. In this case, WHERE col2 IS NULL can be used to filter the result. |

Default value: on

Setting method: This is a USERSET parameter. Set it based on instructions provided in Table 1.

Setting suggestion: Retain the default value. If forward compatibility is required, set this parameter to off.

  • Null string input and null value input:

    In the CSV format of COPY, 1,,3 indicates a null value input, and 1,"",3 indicates a null string input. When the parameter is enabled, the second column of the former is inserted as NULL, and the second column of the latter is inserted as a null string. When the parameter is disabled, the second columns of the former and latter are inserted as NULL.

  • For a newly installed database, the default value is on. For an upgraded database, the default value is off to ensure forward compatibility.

Risks and impacts of improper settings: Change the parameter value after fully understanding the parameter meaning and verifying it through testing.
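
The difference between null value input (1,,3) and null string input (1,"",3) can be sketched in Python. parse_csv_line is a hypothetical, deliberately simplified parser: it assumes fields contain no embedded commas or quotation marks, it uses None to stand for SQL NULL, and it ignores that the parameter applies only to BLOB and CLOB columns in ORA-compatible mode.

```python
def parse_csv_line(line, a_format_enable_copy_empty_lobs=True):
    """Parse one simple CSV line; None stands for SQL NULL."""
    fields = []
    for token in line.split(","):
        if token == "":
            # 1,,3: null value input is always stored as NULL
            fields.append(None)
        elif token == '""':
            # 1,"",3: null string input is kept only when the parameter is on
            fields.append("" if a_format_enable_copy_empty_lobs else None)
        else:
            fields.append(token.strip('"'))
    return fields
```

With the parameter on, parse_csv_line('1,"",3') keeps a zero-length string in the second field. With it off, both input forms produce None there.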