Step 3: Optimizing a Table
Selecting a Storage Mode
Sample tables used in this practice are typical multi-column TPC-DS tables where many statistical analysis queries are performed. Therefore, the column storage mode is recommended.
1
|
WITH (ORIENTATION = column) |
Selecting a Compression Level
No compression ratio is specified in Step 1: Creating an Initial Table and Loading Sample Data, and the low compression ratio is selected by GaussDB(DWS) by default. Specify COMPRESSION to MIDDLE, and compare the result to that when COMPRESSION is set to LOW.
The following is an example of selecting a storage mode and the MIDDLE compression ratio for a table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
CREATE TABLE store_sales ( ss_sold_date_sk integer , ss_sold_time_sk integer , ss_item_sk integer not null, ss_customer_sk integer , ss_cdemo_sk integer , ss_hdemo_sk integer , ss_addr_sk integer , ss_store_sk integer , ss_promo_sk integer , ss_ticket_number bigint not null, ss_quantity integer , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) WITH (ORIENTATION = column,COMPRESSION=middle); |
Selecting a Distribution Mode
Based on table sizes provided in Step 2: Testing System Performance of the Initial Table and Establishing a Baseline, set the distribution mode as follows.
Table Name |
Number of Rows |
Distribution Mode |
---|---|---|
Store_Sales |
287997024 |
Hash |
Date_Dim |
73049 |
Replication |
Store |
402 |
Replication |
Item |
204000 |
Replication |
Time_Dim |
86400 |
Replication |
Promotion |
1000 |
Replication |
Customer_Demographics |
1920800 |
Hash |
Customer_Address |
1000000 |
Hash |
Household_Demographics |
7200 |
Replication |
Customer |
1981703 |
Hash |
Income_Band |
20 |
Replication |
Selecting a Distribution Key
If your table is distributed using hash, choose a proper distribution key. You are advised to select a distribution key according to Selecting a Distribution Key.
Select the primary key of each table as the distribution key of the hash table.
Table Name |
Number of Records |
Distribution Mode |
Distribution Key |
---|---|---|---|
Store_Sales |
287997024 |
Hash |
ss_item_sk |
Date_Dim |
73049 |
Replication |
- |
Store |
402 |
Replication |
- |
Item |
204000 |
Replication |
- |
Time_Dim |
86400 |
Replication |
- |
Promotion |
1000 |
Replication |
- |
Customer_Demographics |
1920800 |
Hash |
cd_demo_sk |
Customer_Address |
1000000 |
Hash |
ca_address_sk |
Household_Demographics |
7200 |
Replication |
- |
Customer |
1981703 |
Hash |
c_customer_sk |
Income_Band |
20 |
Replication |
- |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.