更新时间:2024-10-10 GMT+08:00

步骤5:测试新的表结构下的系统性能

重新创建了具有存储方式、压缩级别、分布方式和分布列的测试数据集后,重新测试系统性能。

  1. 记录各表的存储使用情况。

    使用pg_size_pretty函数查询每张表使用的磁盘空间,并将结果记录到基准表中。

    1
    SELECT T_NAME, PG_SIZE_PRETTY(PG_RELATION_SIZE(t_name)) FROM (VALUES('store_sales'),('date_dim'),('store'),('item'),('time_dim'),('promotion'),('customer_demographics'),('customer_address'),('household_demographics'),('customer'),('income_band')) AS names1(t_name);
    
             t_name         | pg_size_pretty
    ------------------------+----------------
     store_sales            | 14 GB
     date_dim               | 27 MB
     store                  | 4352 kB
     item                   | 259 MB
     time_dim               | 14 MB
     promotion              | 3200 kB
     customer_demographics  | 11 MB
     customer_address       | 27 MB
     household_demographics | 1280 kB
     customer               | 111 MB
     income_band            | 896 kB
    (11 rows)

  2. 测试查询性能,并将性能数据录入基准表中。

    再次运行如下三个查询,并记录每个查询的耗费时间。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    \timing on
    SELECT * FROM (SELECT  COUNT(*)  
    FROM store_sales 
        ,household_demographics  
        ,time_dim, store 
    WHERE ss_sold_time_sk = time_dim.t_time_sk    
        AND ss_hdemo_sk = household_demographics.hd_demo_sk  
        AND ss_store_sk = s_store_sk 
        AND time_dim.t_hour = 8 
        AND time_dim.t_minute >= 30 
        AND household_demographics.hd_dep_count = 5 
        AND store.s_store_name = 'ese' 
    ORDER BY COUNT(*) 
     ) LIMIT 100;
    
    SELECT * FROM (SELECT  i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact,
    SUM(ss_ext_sales_price) ext_price
     FROM date_dim, store_sales, item,customer,customer_address,store
     WHERE d_date_sk = ss_sold_date_sk
       AND ss_item_sk = i_item_sk
       AND i_manager_id=8
       AND d_moy=11
       AND d_year=1999
       AND ss_customer_sk = c_customer_sk 
       AND c_current_addr_sk = ca_address_sk
       AND substr(ca_zip,1,5) <> substr(s_zip,1,5) 
       AND ss_store_sk = s_store_sk 
     GROUP BY i_brand
          ,i_brand_id
          ,i_manufact_id
          ,i_manufact
     ORDER BY ext_price desc
             ,i_brand
             ,i_brand_id
             ,i_manufact_id
             ,i_manufact
     ) LIMIT 100;
    
    SELECT * FROM (SELECT  s_store_name, s_store_id,
            SUM(CASE WHEN (d_day_name='Sunday') THEN ss_sales_price ELSE null END) sun_sales,
            SUM(CASE WHEN (d_day_name='Monday') THEN ss_sales_price ELSE null END) mon_sales,
            SUM(CASE WHEN (d_day_name='Tuesday') THEN ss_sales_price ELSE  null END) tue_sales,
            SUM(CASE WHEN (d_day_name='Wednesday') THEN ss_sales_price ELSE null END) wed_sales,
            SUM(CASE WHEN (d_day_name='Thursday') THEN ss_sales_price ELSE null END) thu_sales,
            SUM(CASE WHEN (d_day_name='Friday') THEN ss_sales_price ELSE null END) fri_sales,
            SUM(CASE WHEN (d_day_name='Saturday') THEN ss_sales_price ELSE null END) sat_sales
     FROM date_dim, store_sales, store
     WHERE d_date_sk = ss_sold_date_sk AND
           s_store_sk = ss_store_sk AND
           s_gmt_offset = -5 AND
           d_year = 2000 
     GROUP BY s_store_name, s_store_id
     ORDER BY s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales
      ) LIMIT 100;
    

    下面的基准表显示了本次实践中所用集群的验证结果。您的结果可能会因多方面的原因而有所变化,但规律性应该相差不大。考虑到操作系统缓存的影响,相同表结构的同一查询在每次执行时耗时会有不同属正常现象,建议多测试几次,取一组平均值。

    基准

    优化前

    优化后

    加载时间(11张表)

    341584ms

    257241ms

    占用存储

    Store_Sales

    42GB

    14GB

    Date_Dim

    11MB

    27MB

    Store

    232kB

    4352kB

    Item

    110MB

    259MB

    Time_Dim

    11MB

    14MB

    Promotion

    256kB

    3200kB

    Customer_Demographics

    171MB

    11MB

    Customer_Address

    170MB

    27MB

    Household_Demographics

    504kB

    1280kB

    Customer

    441MB

    111MB

    Income_Band

    88kB

    896kB

    总存储空间

    42GB

    15GB

    查询执行时间

    查询1

    14552.05ms

    1783.353ms

    查询2

    27952.36ms

    14247.803ms

    查询3

    17721.15ms

    11441.659ms

    总执行时间

    60225.56ms

    27472.815ms

  3. 如果对表设计后的性能还有更高期望,可以运行EXPLAIN PERFORMANCE以查看执行计划进行调优。

    关于执行计划的更详细介绍及查询优化请参考SQL执行计划优化查询性能概述