Step 1: Creating an Initial Table and Loading Sample Data
Create a group of tables without specifying their storage modes, distribution keys, distribution modes, or compression modes. Load sample data into these tables.
- (Optional) Create a cluster.
If a cluster is available, skip this step. For details about how to create a cluster, see Getting Started. Then, follow the instructions provided in Service Overview to connect to the cluster using an SQL client and test the connection.
This tutorial uses an eight-node cluster as an example. You can also perform the test on a four-node cluster.
- Create an SS test table store_sales.
Before you create this table, delete existing SS tables first (if any) using the DROP TABLE command. For example, to delete the store_sales table, run the following command:
DROP TABLE store_sales;
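If you are unsure which SS tables already exist, the cleanup can be sketched for all 11 tables at once. This is an illustrative variant, assuming the database supports the PostgreSQL `DROP TABLE IF EXISTS` form, which skips tables that have not been created instead of raising an error:

```sql
-- Remove all 11 SS tables if they exist; IF EXISTS suppresses
-- errors for tables that have not been created yet.
DROP TABLE IF EXISTS store_sales;
DROP TABLE IF EXISTS date_dim;
DROP TABLE IF EXISTS store;
DROP TABLE IF EXISTS item;
DROP TABLE IF EXISTS time_dim;
DROP TABLE IF EXISTS promotion;
DROP TABLE IF EXISTS customer_demographics;
DROP TABLE IF EXISTS customer_address;
DROP TABLE IF EXISTS household_demographics;
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS income_band;
```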
Do not set the storage mode, distribution key, distribution mode, or compression mode when you create this table.
Run the CREATE TABLE command to create the 11 tables in Figure 3. This section only provides the syntax for creating the store_sales table. To create all tables, copy the syntax in Creating an Initial Table.
CREATE TABLE store_sales
(
    ss_sold_date_sk           integer,
    ss_sold_time_sk           integer,
    ss_item_sk                integer not null,
    ss_customer_sk            integer,
    ss_cdemo_sk               integer,
    ss_hdemo_sk               integer,
    ss_addr_sk                integer,
    ss_store_sk               integer,
    ss_promo_sk               integer,
    ss_ticket_number          bigint not null,
    ss_quantity               integer,
    ss_wholesale_cost         decimal(7,2),
    ss_list_price             decimal(7,2),
    ss_sales_price            decimal(7,2),
    ss_ext_discount_amt       decimal(7,2),
    ss_ext_sales_price        decimal(7,2),
    ss_ext_wholesale_cost     decimal(7,2),
    ss_ext_list_price         decimal(7,2),
    ss_ext_tax                decimal(7,2),
    ss_coupon_amt             decimal(7,2),
    ss_net_paid               decimal(7,2),
    ss_net_paid_inc_tax       decimal(7,2),
    ss_net_profit             decimal(7,2)
);
- Load sample data into these tables.
An OBS bucket provides sample data used for this tutorial. The bucket can be read by all authenticated cloud users. Perform the following substeps to load the sample data:
- Create a foreign table for each table.
GaussDB(DWS) uses the Foreign Data Wrapper (FDW) provided by PostgreSQL to import data in parallel. To use FDW, you need to create FDW tables, also called foreign tables. This section only provides the syntax for creating the obs_from_store_sales_001 foreign table corresponding to the store_sales table. To create all foreign tables, copy the syntax in Creating a Foreign Table.
- The columns of a foreign table must be the same as those of the corresponding ordinary table. In this example, store_sales and obs_from_store_sales_001 must have the same columns.
- The foreign table syntax obtains the sample data used for this tutorial from the OBS bucket. To load other sample data, modify SERVER gsmpp_server OPTIONS as needed. For details, see About Parallel Data Import from OBS.
CREATE FOREIGN TABLE obs_from_store_sales_001
(
    ss_sold_date_sk           integer,
    ss_sold_time_sk           integer,
    ss_item_sk                integer not null,
    ss_customer_sk            integer,
    ss_cdemo_sk               integer,
    ss_hdemo_sk               integer,
    ss_addr_sk                integer,
    ss_store_sk               integer,
    ss_promo_sk               integer,
    ss_ticket_number          bigint not null,
    ss_quantity               integer,
    ss_wholesale_cost         decimal(7,2),
    ss_list_price             decimal(7,2),
    ss_sales_price            decimal(7,2),
    ss_ext_discount_amt       decimal(7,2),
    ss_ext_sales_price        decimal(7,2),
    ss_ext_wholesale_cost     decimal(7,2),
    ss_ext_list_price         decimal(7,2),
    ss_ext_tax                decimal(7,2),
    ss_coupon_amt             decimal(7,2),
    ss_net_paid               decimal(7,2),
    ss_net_paid_inc_tax       decimal(7,2),
    ss_net_profit             decimal(7,2)
)
-- Configure OBS server information and data format details.
SERVER gsmpp_server
OPTIONS (
    LOCATION 'obs://obs.cn-north-1.myhuaweicloud.com/store_sales/store_sales',
    FORMAT 'text',
    DELIMITER '|',
    ENCODING 'utf8',
    NOESCAPING 'true',
    ACCESS_KEY 'access_key_value_to_be_replaced',
    SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
    REJECT_LIMIT 'unlimited',
    CHUNKSIZE '64'
)
-- Record rows that fail to be imported in an error table.
WITH err_obs_from_store_sales_001;
- Set the ACCESS_KEY and SECRET_ACCESS_KEY parameters as needed in the foreign table creation statement, and run the statement in a client tool to create the foreign table.
For the values of ACCESS_KEY and SECRET_ACCESS_KEY, see Creating Access Keys (AK and SK) of this document.
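Before running the full import, you can sanity-check a foreign table with a small query. This is an optional sketch: reading a handful of rows through the foreign table confirms that the OBS location, credentials, and column definitions are correct without scanning the entire source file:

```sql
-- Read a few rows through the foreign table to verify the
-- OBS connection and column mapping before the bulk load.
SELECT ss_item_sk, ss_ticket_number, ss_sales_price
FROM obs_from_store_sales_001
LIMIT 5;
```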
- Import data. Create the insert.sql script containing the following statements and execute it:
\timing on
\parallel on 4
INSERT INTO store_sales SELECT * FROM obs_from_store_sales_001;
INSERT INTO date_dim SELECT * FROM obs_from_date_dim_001;
INSERT INTO store SELECT * FROM obs_from_store_001;
INSERT INTO item SELECT * FROM obs_from_item_001;
INSERT INTO time_dim SELECT * FROM obs_from_time_dim_001;
INSERT INTO promotion SELECT * FROM obs_from_promotion_001;
INSERT INTO customer_demographics SELECT * FROM obs_from_customer_demographics_001;
INSERT INTO customer_address SELECT * FROM obs_from_customer_address_001;
INSERT INTO household_demographics SELECT * FROM obs_from_household_demographics_001;
INSERT INTO customer SELECT * FROM obs_from_customer_001;
INSERT INTO income_band SELECT * FROM obs_from_income_band_001;
\parallel off
Information similar to the following is displayed:
SET
Timing is on.
SET
Time: 2.831 ms
Parallel is on with scale 4.
Parallel is off.
INSERT 0 402
Time: 1820.909 ms
INSERT 0 73049
Time: 2715.275 ms
INSERT 0 86400
Time: 2377.056 ms
INSERT 0 1000
Time: 4037.155 ms
INSERT 0 204000
Time: 7124.190 ms
INSERT 0 7200
Time: 2227.776 ms
INSERT 0 1920800
Time: 8672.647 ms
INSERT 0 20
Time: 2273.501 ms
INSERT 0 1000000
Time: 11430.991 ms
INSERT 0 1981703
Time: 20270.750 ms
INSERT 0 287997024
Time: 341395.680 ms
total time: 341584 ms
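Because each foreign table was created with an error table (the `WITH err_obs_from_store_sales_001` clause) and `REJECT_LIMIT 'unlimited'`, rows that fail to parse are recorded instead of aborting the load. A hedged post-import check, shown here for store_sales only:

```sql
-- Count rows rejected during the parallel import; zero means
-- every row from the source file was loaded successfully.
SELECT COUNT(*) FROM err_obs_from_store_sales_001;

-- If the count is not zero, inspect a few error records.
SELECT * FROM err_obs_from_store_sales_001 LIMIT 10;
```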
- Calculate the total time spent loading the 11 tables. The result will be recorded as the loading time in the benchmark table in the next section.
- Run the following commands to verify that each table was loaded correctly and to record its row count:
SELECT COUNT(*) FROM store_sales;
SELECT COUNT(*) FROM date_dim;
SELECT COUNT(*) FROM store;
SELECT COUNT(*) FROM item;
SELECT COUNT(*) FROM time_dim;
SELECT COUNT(*) FROM promotion;
SELECT COUNT(*) FROM customer_demographics;
SELECT COUNT(*) FROM customer_address;
SELECT COUNT(*) FROM household_demographics;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) FROM income_band;
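Alternatively, the eleven counts can be collected in a single result set, which is easier to compare against the expected row counts at a glance. A sketch using standard `UNION ALL`:

```sql
-- One result set with a labeled row count per table.
SELECT 'store_sales' AS table_name, COUNT(*) AS row_count FROM store_sales
UNION ALL SELECT 'date_dim', COUNT(*) FROM date_dim
UNION ALL SELECT 'store', COUNT(*) FROM store
UNION ALL SELECT 'item', COUNT(*) FROM item
UNION ALL SELECT 'time_dim', COUNT(*) FROM time_dim
UNION ALL SELECT 'promotion', COUNT(*) FROM promotion
UNION ALL SELECT 'customer_demographics', COUNT(*) FROM customer_demographics
UNION ALL SELECT 'customer_address', COUNT(*) FROM customer_address
UNION ALL SELECT 'household_demographics', COUNT(*) FROM household_demographics
UNION ALL SELECT 'customer', COUNT(*) FROM customer
UNION ALL SELECT 'income_band', COUNT(*) FROM income_band;
```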
The number of rows in each SS table is as follows:
| Table Name | Number of Rows |
|---|---|
| Store_Sales | 287997024 |
| Date_Dim | 73049 |
| Store | 402 |
| Item | 204000 |
| Time_Dim | 86400 |
| Promotion | 1000 |
| Customer_Demographics | 1920800 |
| Customer_Address | 1000000 |
| Household_Demographics | 7200 |
| Customer | 1981703 |
| Income_Band | 20 |
- Run the ANALYZE command to update statistics.
ANALYZE;
If ANALYZE is returned, the execution is successful.
The ANALYZE statement collects statistics about table contents in the database and stores them in the PG_STATISTIC system catalog. The query optimizer then uses these statistics to work out the most efficient execution plans.
After batch insertions or deletions, you are advised to run ANALYZE on the affected tables or on the entire database to update the statistics.
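ANALYZE can also be scoped to a single table, which is faster when only one table has changed. The collected statistics can then be inspected through the standard PostgreSQL `pg_stats` view; this is a sketch, assuming the cluster exposes the usual PostgreSQL-compatible system views:

```sql
-- Collect statistics for one table only.
ANALYZE store_sales;

-- Inspect per-column statistics the optimizer will use,
-- e.g. null fraction and estimated number of distinct values.
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'store_sales'
  AND attname IN ('ss_item_sk', 'ss_store_sk');
```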