Updated on 2024-06-07 GMT+08:00

Step 3: Optimizing a Table

Selecting a Storage Mode

Sample tables used in this practice are typical multi-column TPC-DS tables where many statistical analysis queries are performed. Therefore, the column storage mode is recommended.

1
WITH (ORIENTATION = column)

Selecting a Compression Level

No compression ratio is specified in Step 1: Creating an Initial Table and Loading Sample Data, and the low compression ratio is selected by GaussDB(DWS) by default. Specify COMPRESSION to MIDDLE, and compare the result to that when COMPRESSION is set to LOW.

The following is an example of selecting a storage mode and the MIDDLE compression ratio for a table.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
CREATE TABLE store_sales
(
    ss_sold_date_sk           integer                       ,
    ss_sold_time_sk           integer                       ,
    ss_item_sk                integer               not null,
    ss_customer_sk            integer                       ,
    ss_cdemo_sk               integer                       ,
    ss_hdemo_sk               integer                       ,
    ss_addr_sk                integer                       ,
    ss_store_sk               integer                       ,
    ss_promo_sk               integer                       ,
    ss_ticket_number          bigint               not null,
    ss_quantity               integer                       ,
    ss_wholesale_cost         decimal(7,2)                  ,
    ss_list_price             decimal(7,2)                  ,
    ss_sales_price            decimal(7,2)                  ,
    ss_ext_discount_amt       decimal(7,2)                  ,
    ss_ext_sales_price        decimal(7,2)                  ,
    ss_ext_wholesale_cost     decimal(7,2)                  ,
    ss_ext_list_price         decimal(7,2)                  ,
    ss_ext_tax                decimal(7,2)                  ,
    ss_coupon_amt             decimal(7,2)                  ,
    ss_net_paid               decimal(7,2)                  ,
    ss_net_paid_inc_tax       decimal(7,2)                  ,
    ss_net_profit             decimal(7,2)                  
) 
WITH (ORIENTATION = column,COMPRESSION=middle);

Selecting a Distribution Mode

Based on table sizes provided in Step 2: Testing System Performance of the Initial Table and Establishing a Baseline, set the distribution mode as follows.

Table Name

Number of Rows

Distribution Mode

Store_Sales

287997024

Hash

Date_Dim

73049

Replication

Store

402

Replication

Item

204000

Replication

Time_Dim

86400

Replication

Promotion

1000

Replication

Customer_Demographics

1920800

Hash

Customer_Address

1000000

Hash

Household_Demographics

7200

Replication

Customer

1981703

Hash

Income_Band

20

Replication

Selecting a Distribution Key

If your table is distributed using hash, choose a proper distribution key. You are advised to select a distribution key according to Selecting a Distribution Key.

Select the primary key of each table as the distribution key of the hash table.

Table Name

Number of Records

Distribution Mode

Distribution Key

Store_Sales

287997024

Hash

ss_item_sk

Date_Dim

73049

Replication

-

Store

402

Replication

-

Item

204000

Replication

-

Time_Dim

86400

Replication

-

Promotion

1000

Replication

-

Customer_Demographics

1920800

Hash

cd_demo_sk

Customer_Address

1000000

Hash

ca_address_sk

Household_Demographics

7200

Replication

-

Customer

1981703

Hash

c_customer_sk

Income_Band

20

Replication

-