Updated on 2024-10-29 GMT+08:00

Step 5: Testing System Performance in the New Table

After recreating the test data set with the selected storage modes, compression levels, distribution modes, and distribution keys, you will retest the system performance.

  1. Record the storage space usage of each table.

    Determine how much disk space is used for each table using the pg_size_pretty function and record the results in base tables.

    1
    SELECT T_NAME, PG_SIZE_PRETTY(PG_RELATION_SIZE(t_name)) FROM (VALUES('store_sales'),('date_dim'),('store'),('item'),('time_dim'),('promotion'),('customer_demographics'),('customer_address'),('household_demographics'),('customer'),('income_band')) AS names1(t_name);
    
             t_name         | pg_size_pretty
    ------------------------+----------------
     store_sales            | 14 GB
     date_dim               | 27 MB
     store                  | 4352 kB
     item                   | 259 MB
     time_dim               | 14 MB
     promotion              | 3200 kB
     customer_demographics  | 11 MB
     customer_address       | 27 MB
     household_demographics | 1280 kB
     customer               | 111 MB
     income_band            | 896 kB
    (11 rows)

  2. Test the query performance and record the performance data in the benchmark table.

    Execute the following queries again and record the time spent on each query.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    \timing on
    SELECT * FROM (SELECT  COUNT(*)  
    FROM store_sales 
        ,household_demographics  
        ,time_dim, store 
    WHERE ss_sold_time_sk = time_dim.t_time_sk    
        AND ss_hdemo_sk = household_demographics.hd_demo_sk  
        AND ss_store_sk = s_store_sk 
        AND time_dim.t_hour = 8 
        AND time_dim.t_minute >= 30 
        AND household_demographics.hd_dep_count = 5 
        AND store.s_store_name = 'ese' 
    ORDER BY COUNT(*) 
     ) LIMIT 100;
    
    SELECT * FROM (SELECT  i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact,
    SUM(ss_ext_sales_price) ext_price
     FROM date_dim, store_sales, item,customer,customer_address,store
     WHERE d_date_sk = ss_sold_date_sk
       AND ss_item_sk = i_item_sk
       AND i_manager_id=8
       AND d_moy=11
       AND d_year=1999
       AND ss_customer_sk = c_customer_sk 
       AND c_current_addr_sk = ca_address_sk
       AND substr(ca_zip,1,5) <> substr(s_zip,1,5) 
       AND ss_store_sk = s_store_sk 
     GROUP BY i_brand
          ,i_brand_id
          ,i_manufact_id
          ,i_manufact
     ORDER BY ext_price desc
             ,i_brand
             ,i_brand_id
             ,i_manufact_id
             ,i_manufact
     ) LIMIT 100;
    
    SELECT * FROM (SELECT  s_store_name, s_store_id,
            SUM(CASE WHEN (d_day_name='Sunday') THEN ss_sales_price ELSE null END) sun_sales,
            SUM(CASE WHEN (d_day_name='Monday') THEN ss_sales_price ELSE null END) mon_sales,
            SUM(CASE WHEN (d_day_name='Tuesday') THEN ss_sales_price ELSE  null END) tue_sales,
            SUM(CASE WHEN (d_day_name='Wednesday') THEN ss_sales_price ELSE null END) wed_sales,
            SUM(CASE WHEN (d_day_name='Thursday') THEN ss_sales_price ELSE null END) thu_sales,
            SUM(CASE WHEN (d_day_name='Friday') THEN ss_sales_price ELSE null END) fri_sales,
            SUM(CASE WHEN (d_day_name='Saturday') THEN ss_sales_price ELSE null END) sat_sales
     FROM date_dim, store_sales, store
     WHERE d_date_sk = ss_sold_date_sk AND
           s_store_sk = ss_store_sk AND
           s_gmt_offset = -5 AND
           d_year = 2000 
     GROUP BY s_store_name, s_store_id
     ORDER BY s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales
      ) LIMIT 100;
    

    The following benchmark table shows the validation results of the cluster used in this tutorial. Your results may vary based on a number of factors, but the relative results should be similar. The execution durations of queries having the same table structure can be different, depending on the OS cache during execution. You are advised to perform several rounds of tests and select a group with average values.

    Benchmark

    Before

    After

    Loading time (11 tables)

    341584 ms

    257241 ms

    Occupied storage space

    Store_Sales

    42 GB

    14 GB

    Date_Dim

    11 MB

    27 MB

    Store

    232 KB

    4352 KB

    Item

    110 MB

    259 MB

    Time_Dim

    11 MB

    14 MB

    Promotion

    256 KB

    3200 KB

    Customer_Demographics

    171 MB

    11 MB

    Customer_Address

    170 MB

    27 MB

    Household_Demographics

    504 KB

    1280 KB

    Customer

    441 MB

    111 MB

    Income_Band

    88 KB

    896 KB

    Total storage space

    42 GB

    15 GB

    Query execution time

    Query 1

    14552.05 ms

    1783.353 ms

    Query 2

    27952.36 ms

    14247.803 ms

    Query 3

    17721.15 ms

    11441.659 ms

    Total execution time

    60225.56 ms

    27472.815 ms

  3. If you have higher expectations for the performance after the table design, you can run the EXPLAIN PERFORMANCE command to view the execution plan for tuning.

    For more details about execution plans and query tuning, see SQL Execution Plan and Query Performance Tuning Overview.