Updated on 2025-05-29 GMT+08:00

Guide to Exporting Erroneous Data

Data errors during export typically occur when character strings or binary data that does not match the server-side encoding is inserted into the database. To address this, you are advised to keep the client-side encoding consistent with the server-side encoding, eliminating the need for validity checks against the server-side encoding and data transcoding.

Encoding Consistency Principle During Export

  1. When the client-side encoding is consistent with the server-side encoding:
    • Native data is exported.
    • Data integrity and originality are guaranteed.
    • Character set conversion is not required.
  2. When the client-side encoding is inconsistent with the server-side encoding:
    • The client-side encoding is employed as the target encoding standard for the exported files.
    • The kernel first checks existing data against the server-side encoding. Upon detecting any data encoded in an illegal format, it will report an error.
    • The kernel then proceeds to transcode the data. If it encounters any characters that cannot be transcoded (due to code bits present in the source character set but not in the target character set), it will report an error.

Solutions to Illegal Encoding

If your database contains any data encoded in an illegal format and you wish to export the data without triggering an error, consider the following methods:

Preferred method: Keep the client-side encoding consistent with the server-side encoding. Then export data using the server-side encoding without performing any transcoding.

  1. Query the database server-side encoding.

    gaussdb=# SHOW server_encoding;

  2. Query the database client-side encoding.

    gaussdb=# SHOW client_encoding;

  3. Keep the client-side encoding consistent with the server-side encoding.

    gaussdb=# SET client_encoding = '{server_encoding}';

  4. Execute COPY to export data to a file in standard CSV format.

    gaussdb=# COPY test_copy TO '/data/test_copy.csv' CSV;

Alternative solution: Use placeholders ('?') to replace any bytes encoded in an illegal format. This solution depends on the transcoding capability of the database kernel and will alter the content of the exported data.

  1. Query the database server-side encoding.

    gaussdb=# SHOW server_encoding;

  2. Set the database client-side encoding as the target encoding.

    gaussdb=# SET client_encoding = {target_encoding};

  3. Leverage the kernel's transcoding capability to replace any bytes encoded in an illegal format while exporting data.

    gaussdb=# COPY test_copy TO '/data/test_copy.csv' CSV COMPATIBLE_ILLEGAL_CHARS;

  • Consider enabling the COMPATIBLE_ILLEGAL_CHARS parameter to correct any exported data encoded in an illegal format during export, while keeping the data in the database unchanged. Kindly use this parameter as necessary.
  • When enabled, the COMPATIBLE_ILLEGAL_CHARS parameter:
    • Replaces illegal characters with the ones specified in the convert_illegal_char_mode parameter. The default replacement character is '?' (U+003F).
    • Replaces zero characters (U+0000) with spaces (U+0020). If you do not need this replacement, configure the zero-character function in different compatible modes.
  • For detailed constraints on the COMPATIBLE_ILLEGAL_CHAR.... parameter, refer to the COPY syntax section.