"ERROR: invalid byte sequence for encoding 'UTF8': 0x00" Is Reported When Data Is Imported to GaussDB(DWS) Using COPY FROM

Symptom

"ERROR: invalid byte sequence for encoding 'UTF8': 0x00" is reported when data is imported to GaussDB(DWS) using COPY FROM.

Possible Causes

The data file was exported from an Oracle database and is UTF-8 encoded. The error message also reports the number of the line that failed. Because the file is too large to open with the vim command, the reported lines were extracted with the sed command and then opened with vim, but nothing abnormal was visible. After the file was split by line count with the split command, some of the resulting parts could be imported.
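The inspection described above can be sketched in shell as follows. This is a minimal sketch under stated assumptions, not the exact commands used in the original investigation: the function names, the file name data.txt, and the line number in the usage note are hypothetical. Counting the 0x00 bytes, or dumping a suspect line with od, makes the offending byte explicit without opening the whole file.

```shell
# Count the 0x00 bytes in a file without opening it in an editor.
# $1: path to the data file (placeholder name).
count_nul_bytes() {
  # tr -cd '\000' deletes every byte except NUL; wc -c counts what is left.
  tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Print one line of the file as a byte dump so that 0x00 shows up as \0.
# $1: path to the data file, $2: line number reported in the error message.
dump_line_bytes() {
  # Print only line $2, then quit so the rest of a large file is not read.
  sed -n "${2}p;${2}q" "$1" | od -c
}
```

For example, count_nul_bytes data.txt prints the number of 0x00 bytes in the file, and dump_line_bytes data.txt 1024 shows the bytes of line 1024.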

According to the GaussDB(DWS) documentation, the direct cause of this error is that fields and variables of the VARCHAR type cannot hold character strings containing '\0' (that is, the byte 0x00, which is the Unicode code point U+0000). The solution is to delete '\0' from the character strings before importing the data.

Handling Procedure

Run the sed command to delete every 0x00 byte from the data file.

sed -i 's/\x00//g;' file

Parameter description: -i edits the original file in place. s/ starts a substitution; \x00 matches the byte 0x00, and the empty replacement deletes each match. /g applies the substitution to every match on a line, not only the first.
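If the original file should be kept unchanged, the same cleanup can be written to a new file instead. The following is a sketch of an alternative that uses tr rather than the sed command above; the function name and both file names are hypothetical.

```shell
# Write a copy of the data file with every 0x00 byte removed.
# $1: source file, $2: cleaned output file (both names are placeholders).
strip_nul_bytes() {
  # tr -d '\000' deletes NUL bytes and passes all other bytes through unchanged.
  tr -d '\000' < "$1" > "$2"
}
```

For example, strip_nul_bytes data.txt data_clean.txt leaves data.txt untouched and produces data_clean.txt, which can then be used as the input for COPY FROM.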