Migrating Data from OBS Buckets to a GaussDB(DWS) Cluster

Updated on 2024-10-29 GMT+08:00

Overview

This practice demonstrates how to upload sample data to OBS and import it into a target table in GaussDB(DWS), helping you quickly learn how to import data from OBS to a GaussDB(DWS) cluster.

You can import data in TXT, CSV, ORC, PARQUET, CARBONDATA, or JSON format from OBS to a GaussDB(DWS) cluster for query.

This tutorial uses the CSV format as an example to perform the following operations:

  • Generate data files in CSV format.
  • Create OBS buckets in the same region as the GaussDB(DWS) cluster, and upload the data files to them.
  • Create a foreign table to import data from the OBS buckets to the GaussDB(DWS) cluster.
  • Start GaussDB(DWS), create a table, and import data from OBS to the table.
  • Analyze import errors based on the information in the error table and correct these errors.

Estimated time: 30 minutes

Preparing Source Data Files

  • Data file product_info0.csv
    100,XHDK-A,2017-09-01,A,2017 Shirt Women,red,M,328,2017-09-04,715,good!
    205,KDKE-B,2017-09-01,A,2017 T-shirt Women,pink,L,584,2017-09-05,40,very good!
    300,JODL-X,2017-09-01,A,2017 T-shirt men,red,XL,15,2017-09-03,502,Bad.
    310,QQPX-R,2017-09-02,B,2017 jacket women,red,L,411,2017-09-05,436,It's nice.
    150,ABEF-C,2017-09-03,B,2017 Jeans Women,blue,M,123,2017-09-06,120,good.
    
  • Data file product_info1.csv
    200,BCQP-E,2017-09-04,B,2017 casual pants men,black,L,997,2017-09-10,301,good quality.
    250,EABE-D,2017-09-10,A,2017 dress women,black,S,841,2017-09-15,299,This dress fits well.
    108,CDXK-F,2017-09-11,A,2017 dress women,red,M,85,2017-09-14,22,It's really amazing to buy.
    450,MMCE-H,2017-09-11,A,2017 jacket women,white,M,114,2017-09-14,22,very good.
    260,OCDA-G,2017-09-12,B,2017 woolen coat women,red,L,2004,2017-09-15,826,Very comfortable.
    
  • Data file product_info2.csv
    980,"ZKDS-J",2017-09-13,"B","2017 Women's Cotton Clothing","red","M",112,,,
    98,"FKQB-I",2017-09-15,"B","2017 new shoes men","red","M",4345,2017-09-18,5473
    50,"DMQY-K",2017-09-21,"A","2017 pants men","red","37",28,2017-09-25,58,"good","good","good"
    80,"GKLW-l",2017-09-22,"A","2017 Jeans Men","red","39",58,2017-09-25,72,"Very comfortable."
    30,"HWEC-L",2017-09-23,"A","2017 shoes women","red","M",403,2017-09-26,607,"good!"
    40,"IQPD-M",2017-09-24,"B","2017 new pants Women","red","M",35,2017-09-27,52,"very good."
    50,"LPEC-N",2017-09-25,"B","2017 dress Women","red","M",29,2017-09-28,47,"not good at all."
    60,"NQAB-O",2017-09-26,"B","2017 jacket women","red","S",69,2017-09-29,70,"It's beautiful."
    70,"HWNB-P",2017-09-27,"B","2017 jacket women","red","L",30,2017-09-30,55,"I like it so much"
    80,"JKHU-Q",2017-09-29,"C","2017 T-shirt","red","M",90,2017-10-02,82,"very good."
    
  1. Create a text file, open it using a local editing tool (for example, Visual Studio Code), and copy the sample data to the text file.
  2. Set the file encoding to UTF-8 without BOM.
  3. Choose File > Save As.
  4. In the displayed dialog box, enter the file name, set the file name extension to .csv, and click Save.

Uploading Data to OBS

  1. Store the three CSV source data files in OBS buckets.

    1. Log in to the OBS management console.

      Click Service List and choose Object Storage Service to open the OBS management console.

    2. Create a bucket.

      For details about how to create an OBS bucket, see Getting Started > Creating a Bucket in Object Storage Service.

      For example, create two buckets named mybucket and mybucket02.

      NOTICE:

      Ensure that the two buckets and the GaussDB(DWS) cluster are in the same region. This practice uses the CN-Hong Kong region as an example.

    3. Create a folder.

      For details, see Creating a Folder in the Object Storage Service User Guide.

      Examples:

      • Create a folder named input_data in the mybucket OBS bucket.
      • Create a folder named input_data in the mybucket02 OBS bucket.
    4. Upload the files.

      For details, see Uploading an Object in the Object Storage Service Quick Start.

      Examples:

      • Upload the following data files to the input_data folder in the mybucket OBS bucket:
        product_info0.csv
        product_info1.csv
        
      • Upload the following data file to the input_data folder in the mybucket02 OBS bucket:
        product_info2.csv
        

  2. Grant the read permission on the OBS buckets to the user who will import the data.

    When importing data from OBS to a cluster, the user must have the read permission for the OBS buckets where the source data files are located. You can configure the ACL for the OBS buckets to grant the read permission to a specific user.

    For details, see Configuring a Bucket ACL in the Object Storage Service Console Operation Guide.

Creating a Foreign Table

  1. Connect to the GaussDB(DWS) database.
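    For example, you can connect using the gsql client. The command below is a minimal connection sketch, assuming the default database gaussdb, the administrator dbadmin, and the default port 8000; <cluster_connection_address> is a placeholder for your cluster's connection address, and you enter the password of dbadmin when prompted.

    gsql -d gaussdb -h <cluster_connection_address> -U dbadmin -p 8000 -r
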
  2. Create a foreign table.

    NOTE:
    • ACCESS_KEY and SECRET_ACCESS_KEY

      These parameters specify the AK and SK that a user uses to access OBS. Replace them with your actual AK and SK.

      To obtain an access key, log in to the management console, move the cursor to the username in the upper right corner, choose My Credentials, and click Access Keys in the navigation pane on the left. On the Access Keys page, you can view the existing access key IDs (AKs). To obtain both an AK and its SK, click Create Access Key to create and download a new access key.

    • Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in a configuration file or environment variables.
    DROP FOREIGN TABLE IF EXISTS product_info_ext;
    CREATE FOREIGN TABLE product_info_ext
    (
        product_price                integer        not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                   
    ) 
    SERVER gsmpp_server 
    OPTIONS(
    LOCATION 'obs://mybucket/input_data/product_info | obs://mybucket02/input_data/product_info',
    FORMAT 'CSV' ,
    DELIMITER ',',
    ENCODING 'utf8',
    HEADER 'false',
    ACCESS_KEY 'access_key_value_to_be_replaced',
    SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
    FILL_MISSING_FIELDS 'true',
    IGNORE_EXTRA_DATA 'true'
    )
    READ ONLY 
    LOG INTO product_info_err 
    PER NODE REJECT LIMIT 'unlimited';
    
    If the following information is displayed, the foreign table has been created:
    CREATE FOREIGN TABLE
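
    Optionally, you can verify that the foreign table can read the source files before importing them. The following query is a simple sketch: it scans the OBS files through the foreign table and returns the number of rows that can be read (rows rejected during parsing are logged to the error table and not counted).

    SELECT count(*) FROM product_info_ext;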
    

Importing Data

  1. Create a table named product_info in the GaussDB(DWS) database to store the data imported from OBS.

    DROP TABLE IF EXISTS product_info;
    CREATE TABLE product_info
    (
        product_price                integer        not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    integer        ,
        product_comment_time         date           ,
        product_comment_num          integer        ,
        product_comment_content      varchar(200)                   
    ) 
    WITH (
    orientation = column,
    compression=middle
    ) 
    DISTRIBUTE BY hash (product_id);
    

  2. Run INSERT to import data from OBS to the target table product_info through the foreign table product_info_ext.

    INSERT INTO product_info SELECT * FROM product_info_ext;
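
    Because the foreign table was created with LOG INTO product_info_err and PER NODE REJECT LIMIT 'unlimited', source rows that cannot be parsed are recorded in the error table instead of failing the whole INSERT. As mentioned in the overview, you can query this error table to analyze import errors; the statement below is a simple check, and an empty result means that no records were rejected.

    SELECT * FROM product_info_err;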
    

  3. Run SELECT to view the data imported from OBS to GaussDB(DWS).

    SELECT * FROM product_info;
    

    The following information is displayed at the end of the query result:

    (20 rows)

  4. Run VACUUM FULL on the product_info table.

    VACUUM FULL product_info;
    

  5. Update statistics of the product_info table.

    ANALYZE product_info;
    

Deleting Resources

  1. After you complete the queries, run the following statement to delete the target table if it is no longer needed:

    DROP TABLE product_info;
    

    If the following output is displayed, the target table has been deleted:

    DROP TABLE

  2. Run the following statement to delete the foreign table:

    DROP FOREIGN TABLE product_info_ext;
    

    If the following output is displayed, the foreign table has been deleted:

    DROP FOREIGN TABLE
