Step 1: Creating an Initial Table and Loading Sample Data
Supported Regions
Table 1 lists the regions where the sample data has been uploaded to OBS, along with the corresponding bucket names.
Table 1 Supported regions and OBS bucket names

| Region | OBS Bucket |
|---|---|
| CN North-Beijing1 | dws-demo-cn-north-1 |
| CN North-Beijing2 | dws-demo-cn-north-2 |
| CN North-Beijing4 | dws-demo-cn-north-4 |
| CN North-Ulanqab1 | dws-demo-cn-north-9 |
| CN East-Shanghai1 | dws-demo-cn-east-3 |
| CN East-Shanghai2 | dws-demo-cn-east-2 |
| CN South-Guangzhou | dws-demo-cn-south-1 |
| CN South-Guangzhou-InvitationOnly | dws-demo-cn-south-4 |
| CN-Hong Kong | dws-demo-ap-southeast-1 |
| AP-Singapore | dws-demo-ap-southeast-3 |
| AP-Bangkok | dws-demo-ap-southeast-2 |
| LA-Santiago | dws-demo-la-south-2 |
| AF-Johannesburg | dws-demo-af-south-1 |
| LA-Mexico City1 | dws-demo-na-mexico-1 |
| LA-Mexico City2 | dws-demo-la-north-2 |
| RU-Moscow2 | dws-demo-ru-northwest-2 |
| LA-Sao Paulo1 | dws-demo-sa-brazil-1 |
Create a group of tables without specifying their storage modes, distribution keys, distribution modes, or compression modes. Load sample data into these tables.
- (Optional) Create a cluster.
If a cluster is available, skip this step. For details about how to create a cluster, see Creating a DWS 2.0 Cluster.
Then connect to the cluster and test the connection. For details, see Methods of Connecting to a Cluster.
This practice uses an eight-node cluster as an example, but you can also perform the test on a four-node cluster.
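For reference, a gsql connection command typically looks like the following sketch. The database name, address, user, port, and password are placeholders; replace them with your cluster's actual values.

# All values below are placeholders for your own cluster.
gsql -d gaussdb -h <cluster_connection_address> -U dbadmin -p 8000 -W <password> -r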
- Create an SS test table store_sales.
If SS tables already exist in the current database, run the DROP TABLE statement to delete them first.
For example, to delete the store_sales table:
DROP TABLE store_sales;
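If you are not sure whether a table exists, the IF EXISTS clause (PostgreSQL-compatible syntax that GaussDB(DWS) accepts) avoids an error when the table is missing. A minimal sketch:

-- Drop the table only if it exists; otherwise do nothing.
DROP TABLE IF EXISTS store_sales;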
Do not configure the storage mode, distribution key, distribution mode, or compression mode when you create this table.
Run the CREATE TABLE command to create the 11 tables shown in Figure 3. This section provides only the syntax for creating the store_sales table. To create all the tables, copy the syntax in Creating an Initial Table.
CREATE TABLE store_sales
(
    ss_sold_date_sk           integer,
    ss_sold_time_sk           integer,
    ss_item_sk                integer not null,
    ss_customer_sk            integer,
    ss_cdemo_sk               integer,
    ss_hdemo_sk               integer,
    ss_addr_sk                integer,
    ss_store_sk               integer,
    ss_promo_sk               integer,
    ss_ticket_number          bigint not null,
    ss_quantity               integer,
    ss_wholesale_cost         decimal(7,2),
    ss_list_price             decimal(7,2),
    ss_sales_price            decimal(7,2),
    ss_ext_discount_amt       decimal(7,2),
    ss_ext_sales_price        decimal(7,2),
    ss_ext_wholesale_cost     decimal(7,2),
    ss_ext_list_price         decimal(7,2),
    ss_ext_tax                decimal(7,2),
    ss_coupon_amt             decimal(7,2),
    ss_net_paid               decimal(7,2),
    ss_net_paid_inc_tax       decimal(7,2),
    ss_net_profit             decimal(7,2)
);
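After running the statement, you can confirm the table definition from the client. This assumes the psql-style \d meta-command is available in your gsql version:

\d store_sales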
- Load sample data into these tables.
An OBS bucket provides sample data used for this practice. The bucket can be read by all authenticated cloud users. Perform the following operations to load the sample data:
- Create a foreign table for each table.
GaussDB(DWS) uses the foreign data wrappers (FDWs) provided by PostgreSQL to import data in parallel. To use FDWs, first create FDW tables (also called foreign tables). This section provides only the syntax for creating the obs_from_store_sales_001 foreign table that corresponds to the store_sales table. To create all the foreign tables, copy the syntax in Creating a Foreign Table.
- Note that <obs_bucket_name> in the following statement indicates the OBS bucket name. Only some regions are supported. For details about the supported regions and OBS bucket names, see Table 1. GaussDB(DWS) clusters do not support cross-region access to OBS bucket data.
- The columns of the foreign table must be the same as those of the corresponding ordinary table. In this example, store_sales and obs_from_store_sales_001 must have the same columns.
- The foreign table syntax obtains the sample data used for this practice from the OBS bucket. To load other sample data, modify SERVER gsmpp_server OPTIONS as needed. For details, see About Parallel Data Import from OBS.
- Hardcoded or plaintext AK/SK is risky. For security, encrypt your AK/SK and store them in the configuration file or environment variables.
CREATE FOREIGN TABLE obs_from_store_sales_001
(
    ss_sold_date_sk           integer,
    ss_sold_time_sk           integer,
    ss_item_sk                integer not null,
    ss_customer_sk            integer,
    ss_cdemo_sk               integer,
    ss_hdemo_sk               integer,
    ss_addr_sk                integer,
    ss_store_sk               integer,
    ss_promo_sk               integer,
    ss_ticket_number          bigint not null,
    ss_quantity               integer,
    ss_wholesale_cost         decimal(7,2),
    ss_list_price             decimal(7,2),
    ss_sales_price            decimal(7,2),
    ss_ext_discount_amt       decimal(7,2),
    ss_ext_sales_price        decimal(7,2),
    ss_ext_wholesale_cost     decimal(7,2),
    ss_ext_list_price         decimal(7,2),
    ss_ext_tax                decimal(7,2),
    ss_coupon_amt             decimal(7,2),
    ss_net_paid               decimal(7,2),
    ss_net_paid_inc_tax       decimal(7,2),
    ss_net_profit             decimal(7,2)
)
-- Configure the OBS server information and data format.
SERVER gsmpp_server
OPTIONS (
    LOCATION 'obs://<obs_bucket_name>/tpcds/store_sales',
    FORMAT 'text',
    DELIMITER '|',
    ENCODING 'utf8',
    NOESCAPING 'true',
    ACCESS_KEY 'access_key_value_to_be_replaced',
    SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
    REJECT_LIMIT 'unlimited',
    CHUNKSIZE '64'
)
-- Log rows that fail to be imported to the error table err_obs_from_store_sales_001.
WITH err_obs_from_store_sales_001;
- Set the ACCESS_KEY and SECRET_ACCESS_KEY parameters as needed in the foreign table creation statement, and run the statement in a client tool to create the foreign table.
For the values of ACCESS_KEY and SECRET_ACCESS_KEY, see Creating Access Keys (AK and SK).
- Import data.
Create the insert.sql script containing the following statements and execute it:
\timing on
\parallel on 4
INSERT INTO store_sales SELECT * FROM obs_from_store_sales_001;
INSERT INTO date_dim SELECT * FROM obs_from_date_dim_001;
INSERT INTO store SELECT * FROM obs_from_store_001;
INSERT INTO item SELECT * FROM obs_from_item_001;
INSERT INTO time_dim SELECT * FROM obs_from_time_dim_001;
INSERT INTO promotion SELECT * FROM obs_from_promotion_001;
INSERT INTO customer_demographics SELECT * FROM obs_from_customer_demographics_001;
INSERT INTO customer_address SELECT * FROM obs_from_customer_address_001;
INSERT INTO household_demographics SELECT * FROM obs_from_household_demographics_001;
INSERT INTO customer SELECT * FROM obs_from_customer_001;
INSERT INTO income_band SELECT * FROM obs_from_income_band_001;
\parallel off
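One way to execute the script is from inside the connected client session. The sketch below assumes gsql supports the psql-style \i meta-command; the file path is a placeholder:

-- Run the script from within the client session; the path is hypothetical.
\i /path/to/insert.sql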
The returned result is as follows:
SET
Timing is on.
SET
Time: 2.831 ms
Parallel is on with scale 4.
Parallel is off.
INSERT 0 402
Time: 1820.909 ms
INSERT 0 73049
Time: 2715.275 ms
INSERT 0 86400
Time: 2377.056 ms
INSERT 0 1000
Time: 4037.155 ms
INSERT 0 204000
Time: 7124.190 ms
INSERT 0 7200
Time: 2227.776 ms
INSERT 0 1920800
Time: 8672.647 ms
INSERT 0 20
Time: 2273.501 ms
INSERT 0 1000000
Time: 11430.991 ms
INSERT 0 1981703
Time: 20270.750 ms
INSERT 0 287997024
Time: 341395.680 ms
total time: 341584 ms
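Because REJECT_LIMIT is set to 'unlimited', rows with format errors are skipped and logged to the error table named in the WITH clause instead of aborting the import. If a row count later looks low, you can inspect that table. A minimal sketch; the error table's exact columns vary by version:

-- Show a sample of rows that failed to import (error table created by the WITH clause).
SELECT * FROM err_obs_from_store_sales_001 LIMIT 10;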
- Calculate the total time spent loading the 11 tables. The result will be recorded as the load time in the benchmark table in the next section.
- Run the following commands to verify that each table was loaded correctly, and record its row count:
SELECT COUNT(*) FROM store_sales;
SELECT COUNT(*) FROM date_dim;
SELECT COUNT(*) FROM store;
SELECT COUNT(*) FROM item;
SELECT COUNT(*) FROM time_dim;
SELECT COUNT(*) FROM promotion;
SELECT COUNT(*) FROM customer_demographics;
SELECT COUNT(*) FROM customer_address;
SELECT COUNT(*) FROM household_demographics;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) FROM income_band;
The number of rows in each SS table is as follows:
| Table Name | Number of Rows |
|---|---|
| Store_Sales | 287997024 |
| Date_Dim | 73049 |
| Store | 402 |
| Item | 204000 |
| Time_Dim | 86400 |
| Promotion | 1000 |
| Customer_Demographics | 1920800 |
| Customer_Address | 1000000 |
| Household_Demographics | 7200 |
| Customer | 1981703 |
| Income_Band | 20 |
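If you prefer a single result set to compare against the table above, the same checks can be combined with UNION ALL. A sketch using the tables created in this practice:

SELECT 'store_sales' AS table_name, COUNT(*) AS row_count FROM store_sales
UNION ALL SELECT 'date_dim', COUNT(*) FROM date_dim
UNION ALL SELECT 'store', COUNT(*) FROM store
UNION ALL SELECT 'item', COUNT(*) FROM item
UNION ALL SELECT 'time_dim', COUNT(*) FROM time_dim
UNION ALL SELECT 'promotion', COUNT(*) FROM promotion
UNION ALL SELECT 'customer_demographics', COUNT(*) FROM customer_demographics
UNION ALL SELECT 'customer_address', COUNT(*) FROM customer_address
UNION ALL SELECT 'household_demographics', COUNT(*) FROM household_demographics
UNION ALL SELECT 'customer', COUNT(*) FROM customer
UNION ALL SELECT 'income_band', COUNT(*) FROM income_band;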
- Run the ANALYZE command to update statistics.
ANALYZE;
If ANALYZE is returned, the execution is successful.
ANALYZE
The ANALYZE statement collects statistics about table contents in the database and stores them in the PG_STATISTIC system catalog. The query optimizer then uses these statistics to generate the most efficient execution plans.
After batch insertions or deletions, you are advised to run ANALYZE on the affected tables or on the entire database to update the statistics.
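For large tables it can be more practical to update statistics one table at a time rather than for the whole database. The spot-check below assumes the PostgreSQL-style pg_stats view is available in GaussDB(DWS):

-- Update statistics for a single table instead of the entire database.
ANALYZE store_sales;

-- Spot-check that column statistics were collected (assumes the pg_stats view exists).
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'store_sales'
LIMIT 5;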