Updated on 2024-03-08 GMT+08:00

"ERROR: invalid byte sequence for encoding 'UTF8': 0x00" Is Reported When Data Is Imported to GaussDB(DWS) Using COPY FROM

Symptom

"ERROR: invalid byte sequence for encoding 'UTF8': 0x00" is reported when data is imported to GaussDB(DWS) using COPY FROM.

Possible Causes

The data file was exported from an Oracle database and is UTF-8 encoded. The error message also reports the line number at which the error occurred. Because the file is too large to open with vim, the sed command was used to extract the reported lines, and the extracted lines were then opened with vim. No exception was visible. After the file was split by line count with the split command, some of the resulting parts could be imported successfully.
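The search for the offending bytes can be narrowed down without opening the file in vim. The following is a minimal POSIX shell sketch, assuming the data file is plain text with newline-terminated rows and does not already contain 0x01 bytes. The function names `count_nul`, `nul_lines`, and `show_line` are hypothetical helpers, not GaussDB(DWS) tools. `nul_lines` maps each 0x00 byte to 0x01 only inside the pipeline (the file itself is not modified) so that grep can report line numbers.

```shell
#!/bin/sh
# Hypothetical helpers for locating 0x00 (NUL) bytes in a large data file.

# count_nul FILE: print the total number of 0x00 bytes in FILE.
count_nul() {
  # tr -cd deletes every byte EXCEPT NUL, so only NUL bytes reach wc.
  tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# nul_lines FILE: print the number of every line that contains a 0x00 byte.
# Assumes FILE contains no 0x01 bytes of its own (they would be reported too).
nul_lines() {
  # Map NUL to 0x01 in the stream so grep sees ordinary text,
  # then keep only the line number from grep -n output ("N:content").
  tr '\000' '\001' < "$1" | grep -n "$(printf '\001')" | cut -d: -f1
}

# show_line FILE N: dump line N byte by byte; NUL bytes appear as \0.
show_line() {
  # Print line N and quit immediately instead of reading the rest of the file.
  sed -n "$2{p;q;}" "$1" | od -c
}
```

For example, `show_line data.csv 1024` (hypothetical file name and line number) dumps the line reported in the error message, where a NUL byte shows up as `\0` in the od output.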

Analysis shows that fields or variables of the varchar type in GaussDB(DWS) cannot contain the null character '\0' (the byte 0x00, which is '\u0000' in Unicode). Remove every '\0' from the data before importing it.

Handling Procedure

Run the following sed command to delete all 0x00 bytes from the data file, and then import the file again.

sed -i 's/\x00//g' file
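Because `sed -i` overwrites the original file, it can be safer to write a cleaned copy first and confirm that no 0x00 bytes remain before running COPY FROM again. The sketch below is an assumption-based alternative, not the documented procedure: it swaps in `tr -d '\000'`, which deletes every NUL byte just like the sed command above but also works with sed implementations that do not understand the `\x00` escape (a GNU sed extension). `strip_nul` is a hypothetical helper name.

```shell
#!/bin/sh
# strip_nul IN OUT: write a copy of IN with every 0x00 byte removed to OUT,
# then return a non-zero status if any 0x00 byte is still present in OUT.
strip_nul() {
  in=$1
  out=$2
  # tr -d '\000' deletes all NUL bytes; the input file is left unchanged.
  tr -d '\000' < "$in" > "$out" || return 1
  # Count remaining NUL bytes: tr -cd keeps only NULs, wc -c counts them.
  left=$(tr -cd '\000' < "$out" | wc -c | tr -d ' ')
  if [ "$left" -ne 0 ]; then
    echo "strip_nul: $left NUL byte(s) still in $out" >&2
    return 1
  fi
}
```

For example, `strip_nul data.csv data_clean.csv` (hypothetical file names) leaves `data.csv` untouched, so the original export is still available if the cleaned file needs to be regenerated.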

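Parameters:

  • -i edits the file in place. The form shown is GNU sed syntax.
  • s is the substitute command: s/pattern/replacement/ replaces the first match of pattern on each line.
  • \x00 matches the byte 0x00. The replacement is empty, so each match is deleted.
  • g makes the substitution global, so every match on each line is replaced, not only the first.
  • file is the path of the data file to be imported.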
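The effect of the g flag can be seen on ordinary text. In this minimal sketch a comma stands in for the 0x00 byte so that the output is printable; without g, sed replaces only the first match on each line, which would leave any further 0x00 bytes on that line in place.

```shell
#!/bin/sh
# Without g: only the first comma on the line is removed.
first_only=$(printf 'a,b,c\n' | sed 's/,//')
# With g: every comma on the line is removed, as 's/\x00//g' does for NUL bytes.
all=$(printf 'a,b,c\n' | sed 's/,//g')
echo "$first_only"   # prints ab,c
echo "$all"          # prints abc
```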