Help Center> GaussDB(DWS)> Best Practices> Import and Export> Tutorial: Importing Data from OBS to a Cluster
Updated on 2024-03-13 GMT+08:00

Tutorial: Importing Data from OBS to a Cluster

Overview

This practice demonstrates how to upload sample data to OBS and import OBS data to the target table on GaussDB(DWS), helping you quickly learn how to import data from OBS to a GaussDB(DWS) cluster.

You can import data in TXT, CSV, ORC, PARQUET, CARBONDATA, or JSON format from OBS to a GaussDB(DWS) cluster for query.

This tutorial uses the CSV format as an example to describe how to perform the following operations:

  • Generate data files in CSV format.
  • Create an OBS bucket in the same region as the GaussDB(DWS) cluster, and upload data files to the OBS bucket.
  • Create a foreign table to import data from the OBS bucket to GaussDB(DWS) clusters.
  • Start GaussDB(DWS), create a table, and import data from OBS to the table.
  • Analyze import errors based on the information in the error table and correct these errors.

Estimated time: 30 minutes

Preparing Source Data Files

  • Data file product_info0.csv
    1
    2
    3
    4
    5
    100,XHDK-A,2017-09-01,A,2017 Shirt Women,red,M,328,2017-09-04,715,good!
    205,KDKE-B,2017-09-01,A,2017 T-shirt Women,pink,L,584,2017-09-05,40,very good!
    300,JODL-X,2017-09-01,A,2017 T-shirt men,red,XL,15,2017-09-03,502,Bad.
    310,QQPX-R,2017-09-02,B,2017 jacket women,red,L,411,2017-09-05,436,It's nice.
    150,ABEF-C,2017-09-03,B,2017 Jeans Women,blue,M,123,2017-09-06,120,good.
    
  • Data file product_info1.csv
    1
    2
    3
    4
    5
    200,BCQP-E,2017-09-04,B,2017 casual pants men,black,L,997,2017-09-10,301,good quality.
    250,EABE-D,2017-09-10,A,2017 dress women,black,S,841,2017-09-15,299,This dress fits well.
    108,CDXK-F,2017-09-11,A,2017 dress women,red,M,85,2017-09-14,22,It's really amazing to buy.
    450,MMCE-H,2017-09-11,A,2017 jacket women,white,M,114,2017-09-14,22,very good.
    260,OCDA-G,2017-09-12,B,2017 woolen coat women,red,L,2004,2017-09-15,826,Very comfortable.
    
  • Data file product_info2.csv
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    980,"ZKDS-J",2017-09-13,"B","2017 Women's Cotton Clothing","red","M",112,,,
    98,"FKQB-I",2017-09-15,"B","2017 new shoes men","red","M",4345,2017-09-18,5473
    50,"DMQY-K",2017-09-21,"A","2017 pants men","red","37",28,2017-09-25,58,"good","good","good"
    80,"GKLW-l",2017-09-22,"A","2017 Jeans Men","red","39",58,2017-09-25,72,"Very comfortable."
    30,"HWEC-L",2017-09-23,"A","2017 shoes women","red","M",403,2017-09-26,607,"good!"
    40,"IQPD-M",2017-09-24,"B","2017 new pants Women","red","M",35,2017-09-27,52,"very good."
    50,"LPEC-N",2017-09-25,"B","2017 dress Women","red","M",29,2017-09-28,47,"not good at all."
    60,"NQAB-O",2017-09-26,"B","2017 jacket women","red","S",69,2017-09-29,70,"It's beautiful."
    70,"HWNB-P",2017-09-27,"B","2017 jacket women","red","L",30,2017-09-30,55,"I like it so much"
    80,"JKHU-Q",2017-09-29,"C","2017 T-shirt","red","M",90,2017-10-02,82,"very good."
    
  1. Create a text file, open it using a local editing tool (for example, Visual Studio Code), and copy the sample data to the text file.
  2. Choose Format > Encode in UTF-8 without BOM.
  3. Choose File > Save as.
  4. In the displayed dialog box, enter the file name, set the file name extension to .csv, and click Save.

Uploading Data to OBS

  1. Store the three CSV source data files in the OBS bucket.

    1. Log in to the OBS management console.

      Click Service List and choose Object Storage Service to open the OBS management console.

    2. Create a bucket.

      For details about how to create an OBS bucket, see Creating a Bucket in Getting Started in Object Storage Service.

      For example, create two buckets named mybucket and mybucket02.

      Ensure that the two buckets are in the same region as the GaussDB(DWS) cluster. This practice uses the EU-Dublin region as an example.

    3. Create a folder.

      For details, see "Creating a Folder" in the Object Storage Service Usage Guide

      Examples:

      • Create a folder named input_data in the mybucket OBS bucket.
      • Create a folder named input_data in the mybucket02 OBS bucket.
    4. Upload the files.

      For details, see "Uploading a File" in the Object Storage Service Usage Guide.

      Examples:

      • Upload the following data files to the input_data folder in the mybucket OBS bucket:
        1
        2
        product_info0.csv
        product_info1.csv
        
      • Upload the following data file to the input_data folder in the mybucket02 OBS bucket:
        1
        product_info2.csv
        

  2. Grant the OBS bucket read permission for the user who will import data.

    When importing data from OBS to a cluster, the user must have the read permission for the OBS buckets where the source data files are located. You can configure the ACL for the OBS buckets to grant the read permission to a specific user.

Creating a Foreign Table

  1. Connect to the GaussDB(DWS) database.
  2. Create a foreign table.

    • ACCESS_KEY and SECRET_ACCESS_KEY

      These parameters specify the AK and SK used to access OBS by a user. Replace them with the actual AK and SK.

      To obtain an access key, log in to the management console, move the cursor to the username in the upper right corner, click My Credential, and click Access Keys in the navigation pane on the left. On the Access Keys page, you can view the existing access key IDs (AKs). To obtain both the AK and SK, click Create Access Key to create and download an access key.

    • // Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    DROP FOREIGN TABLE IF EXISTS product_info_ext;
    CREATE FOREIGN TABLE product_info_ext
    (
        product_price                integer        not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                   
    ) 
    SERVER gsmpp_server 
    OPTIONS(
    LOCATION 'obs://mybucket/input_data/product_info | obs://mybucket02/input_data/product_info',
    FORMAT 'CSV' ,
    DELIMITER ',',
    ENCODING 'utf8',
    HEADER 'false',
    ACCESS_KEY 'access_key_value_to_be_replaced',
    SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
    FILL_MISSING_FIELDS 'true',
    IGNORE_EXTRA_DATA 'true'
    )
    READ ONLY 
    LOG INTO product_info_err 
    PER NODE REJECT LIMIT 'unlimited';
    
    If the following information is displayed, the foreign table has been created:
    1
    CREATE FOREIGN TABLE
    

Importing Data

  1. Create a table named product_info in the GaussDB(DWS) database to store the data imported from OBS.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    DROP TABLE IF EXISTS product_info;
    CREATE TABLE product_info
    (
        product_price                integer        not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                   
    ) 
    WITH (
    orientation = column,
    compression=middle
    ) 
    DISTRIBUTE BY hash (product_id);
    

  2. Run INSERT to import data from OBS to the target table product_info through the foreign table product_info_ext.

    1
    INSERT INTO product_info SELECT * FROM product_info_ext;
    

  3. Run SELECT to view the data imported from OBS to GaussDB(DWS).

    1
    SELECT * FROM product_info;
    

    The following information is displayed at the end of the query result:

    (20 rows)

  1. Run VACUUM FULL on the product_info table.

    1
    VACUUM FULL product_info;
    

  2. Update statistics of the product_info table.

    1
    ANALYZE product_info;
    

Deleting Resources

  1. If you have performed queries after importing data, run the following statement to delete the target table:

    1
    DROP TABLE product_info;
    

    If the following output is displayed, the foreign table has been deleted:

    DROP TABLE

  2. Run the following statement to delete the foreign table:

    1
    DROP FOREIGN TABLE product_info_ext;
    

    If the following output is displayed, the foreign table has been deleted:

    DROP FOREIGN TABLE