Transferring Data Between OBS Buckets and GaussDB(DWS) Clusters
This guide shows how to upload sample data to OBS and access it through an OBS foreign table. You can import OBS data into GaussDB(DWS) or export data from GaussDB(DWS) to OBS.
- Data can be imported to OBS in TXT, CSV, ORC, PARQUET, CARBONDATA, or JSON formats.
- Data can be exported from OBS in the following formats:
- ORC, CARBONDATA, and PARQUET: You will need to create a foreign server. For how to create a foreign server, see Step 2: Create an OBS Foreign Server and a Foreign Table. This example exports data of this type.
- TXT and CSV: No foreign server is required as the system provides the gsmpp_server server by default. For how to export data in this format, see Exporting CSV/TXT Data to OBS.
OBS is an object-based storage service that offers secure, reliable, and cost-effective data storage for various users, websites, enterprises, and developers. You can use OBS Console or OBS Browser to access and manage data stored on OBS from any computer connected to the Internet. For details, see Object Storage Service Documentation.
This practice takes approximately 1 hour. The basic procedure is as follows:
- Preparations: Create a GaussDB(DWS) cluster and an OBS bucket, and obtain the AK and SK.
- Step 1: Prepare OBS Data: Upload the preset sample data to the OBS bucket.
- Step 2: Create an OBS Foreign Server and a Foreign Table: Prepare for importing data from the OBS bucket.
- Step 3: Access and Import OBS Bucket Data to a GaussDB(DWS) Cluster: Import data using an OBS foreign table.
- Step 4: Export Data from a GaussDB(DWS) Table to an OBS Bucket: Verify that data cannot be updated or deleted in a write-only foreign table after data export.
Preparations
- Create a GaussDB(DWS) cluster. For details, see Creating a Dedicated GaussDB(DWS) Cluster
- Create an OBS bucket in the same region as GaussDB(DWS). You can name the bucket obs-demo01. If 01 is taken, use obs-demo02. Follow this pattern. For details, see Object Storage Service Documentation.
- Obtain the AK and SK of the account for accessing data in the OBS bucket. For details, see Access Keys
Step 1: Prepare OBS Data
- Download the data sample file.
- Log in to OBS Console and click the dws-demo01 bucket in the bucket list.
- Select Objects on the left, click Create Folder, and name the folder obs-dws.
- Go to the obs-dws folder, click Upload Object, and upload the sample file downloaded in 1 to the obs-dws folder.
- Obtain the OBS endpoint.
- Go back to the obs-dwst page and click Overview on the left.
- Record the endpoint from the Domain Name Details list, for example, obs.ap-southeast-1.myhuaweicloud.com.
Step 2: Create an OBS Foreign Server and a Foreign Table
A foreign server is a virtual link that helps organize and control external data sources like databases and file systems in a database system or data warehouse. It enables unified access to diverse distributed data and plays a key role in data integration, real-time analysis, and data virtualization.
To access OBS bucket data through a foreign server, set up a foreign OBS server and provide the endpoint, Access Key (AK), and Secret Key (SK) for OBS.
- Run the following SQL statement to create a foreign server after connecting to the database.
Set ADDRESS to the address obtained in 5, and set ACCESS_KEY and SECRET_ACCESS_KEY to the AK and SK obtained in Preparations.
1 2 3 4 5 6 7
CREATE SERVER obs_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS ( ADDRESS 'obs.aaaaa.bbbbb.com', ACCESS_KEY 'xxxxxxxxx', SECRET_ACCESS_KEY 'yyyyyyyyyyyyy', TYPE 'OBS' );
- Run the following SQL statement to create a schema named dws_data:
1
CREATE SCHEMA dws_data;
- Switch to the newly created schema and create a foreign table.
Replace '/obs-demo01/obs-dws/' with your actual OBS path where the data files are stored. Make sure the OBS bucket and the GaussDB(DWS) cluster are in the same region. In this example, the OBS path is the obs-dws folder in the obs-demo01 bucket.
SERVER obs_server indicates the name of the foreign server created in 1, for example, obs_server.1 2 3 4 5 6 7 8 9 10 11 12 13 14
CREATE FOREIGN TABLE dws_data.obs_pq_order ( order_idVARCHAR(14)PRIMARY KEY NOT ENFORCED, order_channel VARCHAR(32), order_timeTIMESTAMP, cust_codeVARCHAR(6), pay_amountDOUBLE PRECISION, real_payDOUBLE PRECISION ) SERVER obs_server OPTIONS ( foldername '/obs-demo01/obs-dws/', format 'parquet', encoding 'utf8' ) READ ONLY DISTRIBUTE BY roundrobin;
Step 3: Access and Import OBS Bucket Data to a GaussDB(DWS) Cluster
- Use the foreign table created in the preceding steps to directly access data in the OBS bucket.
1
SELECT * FROM dws_data.obs_pq_order;
- Execute the SELECT statement with conditions through a foreign table.
1 2 3 4 5 6 7 8 9
SELECT COUNT(*) FROM dws_data.obs_pq_order; SELECT order_id, order_channel, order_time, cust_code FROM dws_data.obs_pq_order; SELECT TO_CHAR(order_time, 'Month, YYYY') AS order_month, cust_code, COUNT(*) AS order_cnt FROM dws_data.obs_pq_order WHERE DATE_PART('Year', order_time) = 2023 GROUP BY TO_CHAR(order_time, 'Month, YYYY'), cust_code HAVING COUNT(*) >= 10;
- Create a local table to import data to a GaussDB(DWS) cluster through an OBS foreign table.
1 2 3 4 5 6
CREATE TABLE dws_data.dws_monthly_order ( order_monthCHAR(8), cust_codeVARCHAR(6), order_countINT, total_pay_amountDOUBLE PRECISION, total_real_payDOUBLE PRECISION );
- Use the OBS foreign table data to calculate the monthly order details of 2023 and import the result to the local table of the GaussDB(DWS) cluster.
1 2 3 4 5 6 7 8
INSERT INTO dws_data.dws_monthly_order ( order_month, cust_code, order_count , total_pay_amount, total_real_pay ) SELECT TO_CHAR(order_time, 'MON-YYYY'), cust_code, COUNT(*) , SUM(pay_amount), SUM(real_pay) FROM dws_data.obs_pq_order WHERE DATE_PART('Year', order_time) = 2023 GROUP BY TO_CHAR(order_time, 'MON-YYYY'), cust_code;
- Check the table data import status.
1
SELECT * FROM dws_data.dws_monthly_order;
Step 4: Export Data from a GaussDB(DWS) Table to an OBS Bucket
- Create a local table.
1 2 3 4 5 6 7
CREATE TABLE dws_data.dws_order ( order_idVARCHAR(14)PRIMARY KEY, order_channel VARCHAR(32), order_timeTIMESTAMP, cust_codeVARCHAR(6), pay_amountDOUBLE PRECISION, real_payDOUBLE PRECISION );
- Insert three data records.
1 2 3 4
INSERT INTO dws_data.dws_order VALUES ('20230627000001', 'webShop', TIMESTAMP '2023-06-27 10:00:00', 'CUST1', 1000, 1000) , ('20230627000002', 'webShop', TIMESTAMP '2023-06-27 11:00:00', 'CUST2', 5000, 5000) , ('20240309000003', 'webShop', TIMESTAMP '2024-03-09 13:00:00', 'CUST1', 2000, 2000);
- Create a foreign table for exporting data to an OBS bucket.
Replace '/obs-demo01/obs-dws/' with your actual OBS bucket path.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
CREATE FOREIGN TABLE dws_data.obs_orc_order ( order_idVARCHAR(14)PRIMARY KEY NOT ENFORCED, order_channel VARCHAR(32), order_timeTIMESTAMP, cust_codeVARCHAR(6), pay_amountDOUBLE PRECISION, real_payDOUBLE PRECISION ) SERVER obs_server OPTIONS ( foldername '/obs-demo01/obs-dws/', format 'ORC', encoding 'utf8' ) WRITE ONLY DISTRIBUTE BY roundrobin;
- Write the local table data to the OBS foreign table.
1 2 3
INSERT INTO dws_data.obs_orc_order ( order_id, order_channel, order_time, cust_code, pay_amount, real_pay ) SELECT order_id, order_channel, order_time, cust_code, pay_amount, real_pay FROM dws_data.dws_order;
- Query whether the data import is successful.
1
SELECT * FROM dws_data.obs_orc_order;
- Log in to the OBS console and check whether the data file exists in the path of the OBS bucket.
- Check whether extra data can be inserted into the same foreign table.
1 2 3
INSERT INTO dws_data.obs_orc_order ( order_id, order_channel, order_time, cust_code, pay_amount, real_pay ) SELECT order_id, order_channel, order_time, cust_code, pay_amount, real_pay FROM dws_data.dws_order;
The error "the file path specified in the foreign table is not empty" appears. If you delete the data file in the OBS file path, you can insert data again.
- Check whether foreign table data can be updated and deleted.
1 2
UPDATE dws_data.obs_orc_order SET pay_amount = 3000, real_pay = 3000 WHERE order_id = '20240309000003'; DELETE FROM dws_data.obs_orc_order WHERE order_id = '20240309000003';
According to the command output, foreign table data cannot be updated or deleted.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot