步骤3:调优表操作具体步骤
选择存储方式
此实践中所使用的样例表为典型的TPC-DS表,是典型的多字段表,统计分析类查询场景多,因此选择列存存储方式。
1 | WITH (ORIENTATION = column) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | CREATE TABLE store_sales ( ss_sold_date_sk integer , ss_sold_time_sk integer , ss_item_sk integer not null, ss_customer_sk integer , ss_cdemo_sk integer , ss_hdemo_sk integer , ss_addr_sk integer , ss_store_sk integer , ss_promo_sk integer , ss_ticket_number bigint not null, ss_quantity integer , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) WITH (ORIENTATION = column,COMPRESSION=middle); |
选择分布方式
依据步骤2:测试初始表结构下的系统性能并建立基线中基线的各表大小,分布方式设置如下:
表名 | 行数 | 分布方式 |
|---|---|---|
Store_Sales | 287997024 | Hash |
Date_Dim | 73049 | Replication |
Store | 402 | Replication |
Item | 204000 | Replication |
Time_Dim | 86400 | Replication |
Promotion | 1000 | Replication |
Customer_Demographics | 1920800 | Hash |
Customer_Address | 1000000 | Hash |
Household_Demographics | 7200 | Replication |
Customer | 1981703 | Hash |
Income_Band | 20 | Replication |
选择分布列
当表的分布方式选择了Hash分布策略时,分布列选取至关重要。在这一步中,建议按照选择分布列选择分布键:
选择各表的主键作为Hash表分布键。
表名 | 记录数 | 分布方式 | 分布键 |
|---|---|---|---|
Store_Sales | 287997024 | Hash | ss_item_sk |
Date_Dim | 73049 | Replication | - |
Store | 402 | Replication | - |
Item | 204000 | Replication | - |
Time_Dim | 86400 | Replication | - |
Promotion | 1000 | Replication | - |
Customer_Demographics | 1920800 | Hash | cd_demo_sk |
Customer_Address | 1000000 | Hash | ca_address_sk |
Household_Demographics | 7200 | Replication | - |
Customer | 1981703 | Hash | c_customer_sk |
Income_Band | 20 | Replication | - |

