Updated on 2023-10-23 GMT+08:00

Parallel Data Import

GaussDB supports parallel data import, which allows large amounts of data to be imported quickly and efficiently. This section describes the parameters that control parallel data import.

raise_errors_if_no_files

Parameter description: Specifies whether to distinguish between the cases "the imported file contains no records" and "the imported file does not exist". If this parameter is set to on and the imported file does not exist, GaussDB reports the error message "file does not exist".

This parameter is a SUSET parameter. Set it based on instructions provided in Table 1.

Value range: Boolean

  • on indicates that "the imported file contains no records" and "the imported file does not exist" are reported with different messages when files are imported.
  • off indicates that "the imported file contains no records" and "the imported file does not exist" are reported with the same message when files are imported.

Default value: off

partition_mem_batch

Parameter description: To optimize batch insertion into column-store partitioned tables, data is buffered during insertion and then written to disk. partition_mem_batch specifies the number of caches. If the value is too large, a large amount of memory is consumed. If it is too small, the performance of batch insertion into column-store partitioned tables deteriorates.

This parameter is a USERSET parameter. Set it based on instructions provided in Table 1.

Value range: 1 to 65535

Default value: 256

partition_max_cache_size

Parameter description: To optimize batch insertion into column-store partitioned tables, data is buffered during insertion and then written to disk. partition_max_cache_size specifies the size of the data buffer. If the value is too large, a large amount of memory is consumed. If it is too small, the performance of batch insertion into column-store partitioned tables deteriorates.

This parameter is a USERSET parameter. Set it based on instructions provided in Table 1.

Value range: 4096 to 1073741823. The unit is KB.

Default value: 2GB
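
Because the range is stated in KB while the default is stated in GB, it can help to convert between the two before setting the parameter. The following is a minimal sketch in Python, assuming binary units (1 GB = 1024 × 1024 KB). The helper names gb_to_kb and in_range are hypothetical and are not part of GaussDB.

```python
# Range limits taken from the parameter description above (unit: KB).
MIN_KB = 4096
MAX_KB = 1073741823


def gb_to_kb(gb: int) -> int:
    """Convert GB to KB, assuming binary units (1 GB = 1024 * 1024 KB)."""
    return gb * 1024 * 1024


def in_range(value_kb: int) -> bool:
    """Return True if value_kb lies within the documented value range."""
    return MIN_KB <= value_kb <= MAX_KB


default_kb = gb_to_kb(2)
print(default_kb)            # 2097152
print(in_range(default_kb))  # True
print(in_range(1024))        # False: below the 4096 KB minimum
```

Under this assumption, the 2GB default corresponds to 2097152 KB, which lies well inside the permitted range.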

gds_debug_mod

Parameter description: Specifies whether to enable the debug function of Gauss Data Service (GDS). This parameter helps locate and analyze GDS faults. After the debug function is enabled, the types of packets received or sent by GDS, the peer end that GDS communicates with during command interaction, and other GDS interaction information are written to the logs of the corresponding nodes in the cluster. In this way, state transitions of the GaussDB state machine and its current state are recorded. Enabling this function consumes additional log I/O resources, which affects log performance and validity. You are advised to enable it only when locating GDS faults.

This parameter is a USERSET parameter. Set it based on instructions provided in Table 1.

Value range:

  • on indicates that the GDS debug function is enabled.
  • off indicates that the GDS debug function is disabled.

Default value: off

enable_delta_store

Parameter description: Specifies whether to enable delta tables for column-store tables. Delta tables improve the performance of importing a single piece of data to a column-store table and prevent table bloating. If this parameter is set to on, data imported to a column-store table is stored in the delta table when the data volume is less than the DELTAROW_THRESHOLD specified in the table definition. Otherwise, the data is stored in CUs of the main table. This parameter affects all operations that transfer data in column-store tables, including INSERT, COPY, VACUUM, VACUUM FULL, VACUUM DELTAMERGE, and data redistribution.

This parameter is a POSTMASTER parameter. Set it based on instructions provided in Table 1.

Value range:

  • on indicates that delta tables are enabled.
  • off indicates that delta tables are disabled.

Default value: off
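
The routing rule described for enable_delta_store can be sketched as follows. This Python model is illustrative only and is not GaussDB code. The function name choose_target, the return labels, and the threshold value 100 are assumptions made for the example.

```python
def choose_target(enable_delta_store: bool, row_count: int,
                  deltarow_threshold: int) -> str:
    """Model where imported data lands in a column-store table.

    Returns "delta" for the delta table or "cu" for CUs of the main table.
    """
    # Data goes to the delta table only when the feature is on and the
    # data volume is below the table's DELTAROW_THRESHOLD.
    if enable_delta_store and row_count < deltarow_threshold:
        return "delta"
    return "cu"


# Illustrative threshold; the real value comes from the table definition.
threshold = 100
print(choose_target(True, 1, threshold))     # delta
print(choose_target(True, 5000, threshold))  # cu
print(choose_target(False, 1, threshold))    # cu
```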

safe_data_path

Parameter description: Specifies the path prefix to which all users except the initial user are restricted. Currently, the restriction applies to COPY paths and advanced package paths. The path cannot end with a slash (/) or contain two consecutive periods (..).

This parameter is a SIGHUP parameter. Set it based on instructions provided in Table 1.

Value range: a string of up to 4096 characters

Default value: NULL
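
The format rules stated for safe_data_path can be checked before the parameter is set. The following is a minimal Python sketch of those three rules only (length, no trailing slash, no ".."). The function name and the sample paths are hypothetical, and the checks GaussDB itself performs may differ.

```python
MAX_LEN = 4096  # documented maximum length of the string, in characters


def is_valid_safe_data_path(path: str) -> bool:
    """Check a candidate safe_data_path value against the documented rules."""
    if len(path) > MAX_LEN:
        return False
    if path.endswith("/"):
        return False  # the path cannot end with a slash
    if ".." in path:
        return False  # the path cannot contain ".."
    return True


# Sample paths are made up for illustration.
print(is_valid_safe_data_path("/data/import"))      # True
print(is_valid_safe_data_path("/data/import/"))     # False
print(is_valid_safe_data_path("/data/../import"))   # False
```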

enable_copy_server_files

Parameter description: Specifies whether to enable the permission to copy server files.

This parameter is a SIGHUP parameter. Set it based on instructions provided in Table 1.

Value range: Boolean

  • on indicates that the permission to copy server files is enabled.
  • off indicates that the permission to copy server files is disabled.

Default value: off

When enable_copy_server_files is set to off, only the initial user is allowed to run the COPY FROM FILENAME or COPY TO FILENAME statement. When enable_copy_server_files is set to on, users with the SYSADMIN permission and users who inherit the permissions of the built-in role gs_role_copy_files are also allowed to run the COPY FROM FILENAME or COPY TO FILENAME statement.
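
The permission rule above can be summarized as a small decision function. This is a sketch in Python, not GaussDB code. The User type and its field names are hypothetical, and the model assumes that the initial user remains allowed when the parameter is on, which the text above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical model of the user attributes relevant to the rule."""
    is_initial_user: bool = False
    has_sysadmin: bool = False
    inherits_gs_role_copy_files: bool = False


def may_copy_server_files(user: User, enable_copy_server_files: bool) -> bool:
    """Return True if the user may run COPY FROM/TO FILENAME."""
    # Assumption: the initial user is allowed regardless of the setting.
    if user.is_initial_user:
        return True
    # When the parameter is off, no other user is allowed.
    if not enable_copy_server_files:
        return False
    # When it is on, SYSADMIN users and gs_role_copy_files members qualify.
    return user.has_sysadmin or user.inherits_gs_role_copy_files


admin = User(has_sysadmin=True)
print(may_copy_server_files(admin, False))  # False
print(may_copy_server_files(admin, True))   # True
```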