Step 4: Creating Another Table and Loading Data
After selecting a storage mode, compression level, distribution mode, and distribution key for each table, use these attributes to create tables and reload data. Compare the system performance before and after the table recreation.
- Delete the tables created before.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
DROP TABLE store_sales; DROP TABLE date_dim; DROP TABLE store; DROP TABLE item; DROP TABLE time_dim; DROP TABLE promotion; DROP TABLE customer_demographics; DROP TABLE customer_address; DROP TABLE household_demographics; DROP TABLE customer; DROP TABLE income_band; DROP FOREIGN TABLE obs_from_store_sales_001; DROP FOREIGN TABLE obs_from_date_dim_001; DROP FOREIGN TABLE obs_from_store_001; DROP FOREIGN TABLE obs_from_item_001; DROP FOREIGN TABLE obs_from_time_dim_001; DROP FOREIGN TABLE obs_from_promotion_001; DROP FOREIGN TABLE obs_from_customer_demographics_001; DROP FOREIGN TABLE obs_from_customer_address_001; DROP FOREIGN TABLE obs_from_household_demographics_001; DROP FOREIGN TABLE obs_from_customer_001; DROP FOREIGN TABLE obs_from_income_band_001;
- Create tables and specify storage and distribution modes for them.
Only the syntax for recreating the store_sales table is provided for simplicity. To recreate all the other tables, copy the syntax in Creating a Another Table After Design Optimization.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
CREATE TABLE store_sales ( ss_sold_date_sk integer , ss_sold_time_sk integer , ss_item_sk integer not null, ss_customer_sk integer , ss_cdemo_sk integer , ss_hdemo_sk integer , ss_addr_sk integer , ss_store_sk integer , ss_promo_sk integer , ss_ticket_number bigint not null, ss_quantity integer , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) WITH (ORIENTATION = column,COMPRESSION=middle) DISTRIBUTE BY hash (ss_item_sk);
- Load sample data into these tables.
- Record the loading time in the benchmark tables.
Benchmark
Before
After
Loading time (11 tables)
341584 ms
257241 ms
Occupied storage space
Store_Sales
42 GB
-
Date_Dim
11 MB
-
Store
232 KB
-
Item
110 MB
-
Time_Dim
11 MB
-
Promotion
256 KB
-
Customer_Demographics
171 MB
-
Customer_Address
170 MB
-
Household_Demographics
504 KB
-
Customer
441 MB
-
Income_Band
88 KB
-
Total storage space
42 GB
-
Query execution time
Query 1
14552.05 ms
-
Query 2
27952.36 ms
-
Query 3
17721.15 ms
-
Total execution time
60225.56 ms
-
- Run the ANALYZE command to update statistics.
1
ANALYZE;
If ANALYZE is returned, the execution is successful.
1
ANALYZE
- Check for data skew.
For a hash table, an improper distribution key may cause data skew or poor I/O performance on certain DNs. Therefore, you need to check the table to ensure that data is evenly distributed on each DN. You can run the following SQL statements to check for data skew:
1
SELECT a.count,b.node_name FROM (SELECT count(*) AS count,xc_node_id FROM table_name GROUP BY xc_node_id) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY a.count desc;
xc_node_id corresponds to a DN. Generally, over 5% difference between the amount of data on different DNs is regarded as data skew. If the difference is over 10%, choose another distribution key. In GaussDB(DWS), you can select multiple distribution keys to distribute data evenly.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.