Help Center> Data Warehouse Service (DWS)> FAQs> Data Import and Export> How Do I Implement Fault Tolerance Import Between Different Encoding Libraries
Updated on 2024-04-26 GMT+08:00

How Do I Implement Fault Tolerance Import Between Different Encoding Libraries

To import data from database A (UTF8) to database B (GBK), there may be a character set mismatch error which causes the data import to fail.

To import a small amount of data, run the \COPY command. The procedure is as follows:

  1. Create databases A and B. The encoding format of database A is UTF8, and that of database B is GBK.

    1
    2
    postgres=> CREATE DATABASE A ENCODING 'UTF8' template = template0;
    postgres=> CREATE DATABASE B ENCODING 'GBK' template = template0;
    

  2. View the database list. You can view the created databases A and B.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    postgres=> \l
                               List of databases
       Name    |  Owner  | Encoding  | Collate | Ctype | Access privileges
    -----------+---------+-----------+---------+-------+-------------------
     a         | dbadmin | UTF8      | C       | C     |
     b         | dbadmin | GBK       | C       | C     |
     gaussdb   | Ruby    | SQL_ASCII | C       | C     |
     postgres  | Ruby    | SQL_ASCII | C       | C     |
     template0 | Ruby    | SQL_ASCII | C       | C     | =c/Ruby          +
               |         |           |         |       | Ruby=CTc/Ruby
     template1 | Ruby    | SQL_ASCII | C       | C     | =c/Ruby          +
               |         |           |         |       | Ruby=CTc/Ruby
     xiaodi    | dbadmin | UTF8      | C       | C     |
    (7 rows)
    

  3. Switch to database A and enter the user password. Create a table named test01 and insert data into the table.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    postgres=> \c a
    Password for user dbadmin:
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_128_GCM_SHA256, bits: 128)
    You are now connected to database "a" as user "dbadmin".
    
    a=> CREATE TABLE test01
     (
         c_customer_sk             integer,
         c_customer_id             char(5),
         c_first_name              char(6),
         c_last_name               char(8)
     )
     with (orientation = column,compression=middle)
     distribute by hash (c_last_name);
    CREATE TABLE
    a=> INSERT INTO test01(c_customer_sk, c_customer_id, c_first_name) VALUES (3769, 'hello', 'Grace');
    INSERT 0 1
    a=> INSERT INTO test01 VALUES (456, 'good');
    INSERT 0 1
    

  4. Run the \COPY command to export data from the UTF8 library in Unicode format to the test01.dat file.

    1
    \copy test01 to '/opt/test01.dat' with (ENCODING 'Unicode');
    

  5. Switch to database B and create a table with the same name test01.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    a=> \c b
    Password for user dbadmin:
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_128_GCM_SHA256, bits: 128)
    You are now connected to database "b" as user "dbadmin".
    
    b=> CREATE TABLE test01
     (
         c_customer_sk             integer,
         c_customer_id             char(5),
         c_first_name              char(6),
         c_last_name               char(8)
     )
     with (orientation = column,compression=middle)
     distribute by hash (c_last_name);
    

  6. Run the \COPY command to import the test01.dat file to database B.

    1
    \copy test01 from '/opt/test01.dat' with (ENCODING 'Unicode' ,COMPATIBLE_ILLEGAL_CHARS 'true');
    
    • The error tolerance parameter COMPATIBLE_ILLEGAL_CHARS specifies that invalid characters are tolerated during data import. Invalid characters are converted and then imported to the database. No error message is displayed. The import is not interrupted.
    • The BINARY format is not supported. When data of such format is imported, error "cannot specify bulkload compatibility options in BINARY mode" will occur.
    • The parameter is valid only for data importing using the COPY FROM option.

  7. View data in the test01 table in database B.

    1
    2
    3
    4
    5
    6
    b=> select * from test01;
     c_customer_sk | c_customer_id | c_first_name | c_last_name
    ---------------+---------------+--------------+-------------
              3769 | hello         | Grace        |
               456 | good          |              |
    (2 rows)
    

  8. After the preceding operations are performed, data is imported from database A (UTF8) to database B (GBK).

Data Import and Export FAQs

more