步骤3:调优表操作具体步骤
选择存储方式
此实践中所使用的样例表为典型的TPC-DS表,是典型的多字段表,统计分析类查询场景多,因此选择列存存储方式。
| 1 | WITH (ORIENTATION = column) | 
选择压缩级别
在步骤1:创建初始表并加装样例数据中没有指定压缩比,DWS默认为用户选择LOW级别压缩比。在这一步中把压缩比调整为MIDDLE级别,进行验证对比。
增加存储方式和压缩比后的建表样例如下:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | CREATE TABLE store_sales ( ss_sold_date_sk integer , ss_sold_time_sk integer , ss_item_sk integer not null, ss_customer_sk integer , ss_cdemo_sk integer , ss_hdemo_sk integer , ss_addr_sk integer , ss_store_sk integer , ss_promo_sk integer , ss_ticket_number bigint not null, ss_quantity integer , ss_wholesale_cost decimal(7,2) , ss_list_price decimal(7,2) , ss_sales_price decimal(7,2) , ss_ext_discount_amt decimal(7,2) , ss_ext_sales_price decimal(7,2) , ss_ext_wholesale_cost decimal(7,2) , ss_ext_list_price decimal(7,2) , ss_ext_tax decimal(7,2) , ss_coupon_amt decimal(7,2) , ss_net_paid decimal(7,2) , ss_net_paid_inc_tax decimal(7,2) , ss_net_profit decimal(7,2) ) WITH (ORIENTATION = column,COMPRESSION=middle); | 
选择分布方式
依据步骤2:测试初始表结构下的系统性能并建立基线中所基线的各表大小,分布方式设置如下:
| 表名 | 行数 | 分布方式 | 
|---|---|---|
| Store_Sales | 287997024 | Hash | 
| Date_Dim | 73049 | Replication | 
| Store | 402 | Replication | 
| Item | 204000 | Replication | 
| Time_Dim | 86400 | Replication | 
| Promotion | 1000 | Replication | 
| Customer_Demographics | 1920800 | Hash | 
| Customer_Address | 1000000 | Hash | 
| Household_Demographics | 7200 | Replication | 
| Customer | 1981703 | Hash | 
| Income_Band | 20 | Replication | 
选择分布列
当表的分布方式选择了Hash分布策略时,分布列选取至关重要。在这一步中,建议按照选择分布列选择分布键:
选择各表的主键作为Hash表分布键。
| 表名 | 记录数 | 分布方式 | 分布键 | 
|---|---|---|---|
| Store_Sales | 287997024 | Hash | ss_item_sk | 
| Date_Dim | 73049 | Replication | - | 
| Store | 402 | Replication | - | 
| Item | 204000 | Replication | - | 
| Time_Dim | 86400 | Replication | - | 
| Promotion | 1000 | Replication | - | 
| Customer_Demographics | 1920800 | Hash | cd_demo_sk | 
| Customer_Address | 1000000 | Hash | ca_address_sk | 
| Household_Demographics | 7200 | Replication | - | 
| Customer | 1981703 | Hash | c_customer_sk | 
| Income_Band | 20 | Replication | - | 
 
  