步骤2:测试初始表结构下的系统性能并建立基线
在优化表结构前后,请测试和记录以下详细信息以对比系统性能差异:
- 数据加载时间。
- 表占用的存储空间大小。
- 查询性能。
本次实践中的示例基于使用8节点的dws.d2.xlarge集群。因为系统性能受到许多因素的影响,即使您使用相同的集群配置,结果也会有所不同。
| 机器型号 | dws.d2.xlarge VM |
|---|---|
| CPU | 4*CPU E5-2680 v2 @ 2.80GHZ |
| 内存 | 32GB |
| 网络 | 1GB |
| 磁盘 | 1.63TB |
| 节点数目 | 8 |
请使用下面的基准表来记录结果。
| 基准 | 优化前 | 优化后 |
|---|---|---|
| 加载时间(11张表) | 341584 ms | - |
| 占用存储 | ||
| Store_Sales | - | - |
| Date_Dim | - | - |
| Store | - | - |
| Item | - | - |
| Time_Dim | - | - |
| Promotion | - | - |
| Customer_Demographics | - | - |
| Customer_Address | - | - |
| Household_Demographics | - | - |
| Customer | - | - |
| Income_Band | - | - |
| 总存储空间 | - | - |
| 查询执行时间 | ||
| 查询1 | - | - |
| 查询2 | - | - |
| 查询3 | - | - |
| 总执行时间 | - | - |
执行以下步骤测试优化前的系统性能,以建立基准。
- 将上一节记下的所有11张表的累计加载时间填入基准表的“优化前”一列。
- 记录各表的存储使用情况。
使用pg_size_pretty函数查询每张表使用的磁盘空间,并将结果记录到基准表中。
1SELECT T_NAME, PG_SIZE_PRETTY(PG_RELATION_SIZE(t_name)) FROM (VALUES('store_sales'),('date_dim'),('store'),('item'),('time_dim'),('promotion'),('customer_demographics'),('customer_address'),('household_demographics'),('customer'),('income_band')) AS names1(t_name);
显示结果如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
t_name | pg_size_pretty ------------------------+---------------- store_sales | 42 GB date_dim | 11 MB store | 232 kB item | 110 MB time_dim | 11 MB promotion | 256 kB customer_demographics | 171 MB customer_address | 170 MB household_demographics | 504 kB customer | 441 MB income_band | 88 kB (11 rows)
- 测试查询性能。
运行如下三个查询,并记录每个查询的耗费时间。考虑到操作系统缓存的影响,同一查询在每次执行时耗时会有不同属正常现象,建议多测试几次,取一组平均值。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
\timing on SELECT * FROM (SELECT COUNT(*) FROM store_sales ,household_demographics ,time_dim, store WHERE ss_sold_time_sk = time_dim.t_time_sk AND ss_hdemo_sk = household_demographics.hd_demo_sk AND ss_store_sk = s_store_sk AND time_dim.t_hour = 8 AND time_dim.t_minute >= 30 AND household_demographics.hd_dep_count = 5 AND store.s_store_name = 'ese' ORDER BY COUNT(*) ) LIMIT 100; SELECT * FROM (SELECT i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact, SUM(ss_ext_sales_price) ext_price FROM date_dim, store_sales, item,customer,customer_address,store WHERE d_date_sk = ss_sold_date_sk AND ss_item_sk = i_item_sk AND i_manager_id=8 AND d_moy=11 AND d_year=1999 AND ss_customer_sk = c_customer_sk AND c_current_addr_sk = ca_address_sk AND substr(ca_zip,1,5) <> substr(s_zip,1,5) AND ss_store_sk = s_store_sk GROUP BY i_brand ,i_brand_id ,i_manufact_id ,i_manufact ORDER BY ext_price desc ,i_brand ,i_brand_id ,i_manufact_id ,i_manufact ) LIMIT 100; SELECT * FROM (SELECT s_store_name, s_store_id, SUM(CASE WHEN (d_day_name='Sunday') THEN ss_sales_price ELSE null END) sun_sales, SUM(CASE WHEN (d_day_name='Monday') THEN ss_sales_price ELSE null END) mon_sales, SUM(CASE WHEN (d_day_name='Tuesday') THEN ss_sales_price ELSE null END) tue_sales, SUM(CASE WHEN (d_day_name='Wednesday') THEN ss_sales_price ELSE null END) wed_sales, SUM(CASE WHEN (d_day_name='Thursday') THEN ss_sales_price ELSE null END) thu_sales, SUM(CASE WHEN (d_day_name='Friday') THEN ss_sales_price ELSE null END) fri_sales, SUM(CASE WHEN (d_day_name='Saturday') THEN ss_sales_price ELSE null END) sat_sales FROM date_dim, store_sales, store WHERE d_date_sk = ss_sold_date_sk AND s_store_sk = ss_store_sk AND s_gmt_offset = -5 AND d_year = 2000 GROUP BY s_store_name, s_store_id ORDER BY s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales ) LIMIT 100;
经过上面的统计后,记录的基准表信息如下:
| 基准 | 优化前 | 优化后 |
|---|---|---|
| 加载时间(11张表) | 341584ms | - |
| 占用存储 | ||
| Store_Sales | 42GB | - |
| Date_Dim | 11MB | - |
| Store | 232kB | - |
| Item | 110MB | - |
| Time_Dim | 11MB | - |
| Promotion | 256kB | - |
| Customer_Demographics | 171MB | - |
| Customer_Address | 170MB | - |
| Household_Demographics | 504kB | - |
| Customer | 441MB | - |
| Income_Band | 88kB | - |
| 总存储空间 | 42GB | - |
| 查询执行时间 | ||
| 查询1 | 14552.05ms | - |
| 查询2 | 27952.36ms | - |
| 查询3 | 17721.15ms | - |
| 总执行时间 | 60225.56ms | - |