Guide to Importing Erroneous Data
The error tolerance mechanism during data import provides two modes.
Intelligent Correction Mode (Adaptive Import)
- Principle: Prioritize data integrity and leverage intelligent correction to ensure that the highest possible volume of accurate data is imported.
- Scenario: Apply this mode when the imported data contains column count exceptions, including extra, missing, or deprecated columns, as well as abnormal characters.
- Procedure:
- Rectify the column count exceptions by truncating extra columns, padding missing columns, and discarding deprecated columns.
- Transcode the character sets and clean illegal characters.
- Write all data into the target database.
- Output: corrected data set, along with logs recorded by GaussDB for character exceptions (a hedged COPY sketch follows)
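For illustration, the sketch below maps the corrections above to COPY options. It assumes your GaussDB version supports the fill_missing_fields, ignore_extra_data, and compatible_illegal_chars COPY options; verify them against the documentation for your release before relying on this exact syntax.
-- A minimal adaptive-import sketch (option availability and placement are
-- assumptions; check your release documentation).
gaussdb=# copy test_copy from '/home/omm/temp/test.csv' csv
              fill_missing_fields       -- pad missing columns
              ignore_extra_data         -- truncate extra columns
              compatible_illegal_chars; -- clean illegal characters during transcoding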
Strict Verification Mode (Precise Import)
- Principle: Prioritize data accuracy and enforce strict compliance when importing data into the database.
- Scenario: Apply this mode in fields that demand strict data accuracy, particularly for sensitive data like medical records and financial transactions. This mode helps alleviate concerns about intelligent correction potentially compromising data accuracy.
- Procedure:
- Conduct a series of verifications at multiple levels (column count exceptions, character exceptions, data type conversion exceptions, and constraint conflicts).
- Generate an error diagnosis report (which includes row numbers, error types, and error data).
- Create an erroneous data isolation area.
- Import into the database only the original data that passes these verifications.
- Output: clean data set and error details report (further details are available in gs_copy_error_log and gs_copy_summary; see the query sketch below)
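As an illustration of reading the error details report, the hedged queries below select everything from the two tables named above; because the exact column set can vary by version, SELECT * is used rather than assuming a schema.
-- Rejected rows recorded during a tolerant import (row numbers, error types,
-- and error data are among the recorded details).
gaussdb=# SELECT * FROM gs_copy_error_log;
-- Per-COPY summary statistics.
gaussdb=# SELECT * FROM gs_copy_summary;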
- Error tolerance levels:
- Level 1 error tolerance: applies to all the exceptions processed by the intelligent correction mode, including extra data source columns, data type conversion errors, overlong columns, and transcoding exceptions. The process is as follows:
-- When the number of data type errors during data import does not exceed 100,
-- no error is reported and GaussDB proceeds to the next row. If the number
-- exceeds 100, an error is reported. The details and row number of the
-- erroneous data are recorded in the gs_copy_error_log table.
gaussdb=# copy test_copy from '/home/omm/temp/test.csv' log errors reject limit '100' csv;

-- Compared with the previous statement, the next one additionally records all
-- data from the erroneous row in gs_copy_error_log. This is recommended when
-- there is no risk to data security. Note that SYSADMIN permissions are required.
gaussdb=# copy test_copy from '/home/omm/temp/test.csv' log errors data reject limit '100' csv;
- Level 2 error tolerance: Building on Level 1, Level 2 extends support to address constraint conflicts in data, including NOT NULL constraints, conditional constraints, PRIMARY KEY constraints, UNIQUE constraints, and unique index constraints. The process is as follows:
-- To support a new error type, constraint conflict, while keeping the same COPY
-- statement and error tolerance logic as Level 1, set the GUC parameter as follows:
gaussdb=# SET a_format_load_with_constraints_violation = 's2';
gaussdb=# copy test_copy from '/home/omm/temp/test.csv' log errors data reject limit '100' csv;
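To confirm that the session-level switch took effect before running the COPY statement, you can read the parameter back with SHOW, which works for GUC parameters in general:
-- Verify the current session's setting.
gaussdb=# SHOW a_format_load_with_constraints_violation;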
Conclusion
To optimize data import, you can combine the intelligent correction mode with the strict verification mode, with intelligent correction taking priority. When intelligent correction is enabled during a COPY import and it corrects a data item, strict verification is not triggered for that row. Consequently, the error table does not record the row, nor does the row count against the reject limit. Before importing data, carefully evaluate whether column and character exceptions should be corrected automatically or discarded, depending on your specific requirements.
For strict verification mode, Level 1 is recommended by default, as it effectively identifies the most common errors without compromising import performance. If you are operating in centralized A-compatible mode, you can consider Level 2. Be mindful that Level 2 can degrade import performance and consume additional memory, so you are advised not to use it by default; enable Level 2 only when constraint conflicts occur.

Notes on constraint conflict tolerance (Level 2):
- By default, error tolerance does not cover constraint conflicts. To tolerate constraint conflicts, set the session-level GUC parameter a_format_load_with_constraints_violation to "s2" and import the file again.
- Conflicts with NOT NULL constraints, conditional constraints, PRIMARY KEY constraints, UNIQUE constraints, and unique index constraints can be tolerated.
- This feature is valid only in centralized A-compatible mode.
- A statement-level trigger cannot handle any of the constraint conflicts above, so an attempt to import data into a table with such a trigger fails with an error (see the sketch after this list).
- Under this feature, data is inserted row by row instead of in batches, which degrades import performance.
- Under this feature, UB-tree indexes are built row by row instead of in batches, which degrades index building performance.
- This feature remains valid even when a constraint conflict is triggered by an operation on a table with a row-level trigger. In that case, constraint conflicts are handled in sub-transactions, which consume more memory and increase execution time. Therefore, use this feature only when constraint conflicts are very likely to occur, and keep the amount of data imported in a single COPY at or below 1 GB.
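As a hedged illustration of the statement-level trigger restriction above, the sketch below creates a do-nothing statement-level trigger on the target table; the trigger and function names are invented for this example. With a_format_load_with_constraints_violation set to 's2', a COPY into this table is expected to fail rather than tolerate constraint conflicts.
-- Hypothetical statement-level trigger; trg_noop and copy_stmt_trigger are
-- invented names for illustration.
gaussdb=# CREATE OR REPLACE FUNCTION trg_noop() RETURNS trigger AS $$
          BEGIN
              RETURN NULL; -- the return value of a statement-level trigger is ignored
          END;
          $$ LANGUAGE plpgsql;
gaussdb=# CREATE TRIGGER copy_stmt_trigger AFTER INSERT ON test_copy
              FOR EACH STATEMENT EXECUTE PROCEDURE trg_noop();
-- With constraint conflict tolerance enabled ('s2'), this import is expected
-- to fail because test_copy now has a statement-level trigger.
gaussdb=# copy test_copy from '/home/omm/temp/test.csv' log errors data reject limit '100' csv;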