Updated on 2024-10-29 GMT+08:00

Step 6: Evaluating the Performance of the Optimized Table

Compare the loading time, storage space usage, and query execution time before and after table tuning.

The following table shows example results from the cluster used in this tutorial. Your results will differ, but should show similar improvements.

| Benchmark | Before | After | Change | Percentage change |
|---|---|---|---|---|
| Loading time (11 tables) | 341584 ms | 257241 ms | -84343 ms | -24.7% |
| Occupied storage space | - | - | - | - |
| Store_Sales | 42 GB | 14 GB | -28 GB | -66.7% |
| Date_Dim | 11 MB | 27 MB | 16 MB | 145.5% |
| Store | 232 KB | 4352 KB | 4120 KB | 1775.9% |
| Item | 110 MB | 259 MB | 149 MB | 135.5% |
| Time_Dim | 11 MB | 14 MB | 3 MB | 27.3% |
| Promotion | 256 KB | 3200 KB | 2944 KB | 1150% |
| Customer_Demographics | 171 MB | 11 MB | -160 MB | -93.6% |
| Customer_Address | 170 MB | 27 MB | -143 MB | -84.1% |
| Household_Demographics | 504 KB | 1280 KB | 776 KB | 154.0% |
| Customer | 441 MB | 111 MB | -330 MB | -74.8% |
| Income_Band | 88 KB | 896 KB | 808 KB | 918.2% |
| Total storage space | 42 GB | 15 GB | -27 GB | -64.3% |
| Query execution time | - | - | - | - |
| Query 1 | 14552.05 ms | 1783.353 ms | -12768.697 ms | -87.7% |
| Query 2 | 27952.36 ms | 14247.803 ms | -13704.557 ms | -49.0% |
| Query 3 | 17721.15 ms | 11441.659 ms | -6279.491 ms | -35.4% |
| Total execution time | 60225.56 ms | 27472.815 ms | -32752.745 ms | -54.4% |
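The Change and Percentage columns follow from the Before and After values: the change is After minus Before, and the percentage is that change divided by Before. As a minimal sketch, the statement below recomputes three of the summary rows from the figures in the table. The column and alias names (`benchmark`, `before_value`, `after_value`, `delta`, `change_pct`) are chosen here for illustration only, and the statement assumes PostgreSQL-style `VALUES` lists and `round(numeric, integer)`.

```sql
-- Recompute the change and percentage change from the Before/After figures.
-- All names below are illustrative; no table in the cluster is read.
SELECT benchmark,
       after_value - before_value AS delta,
       round((after_value - before_value) * 100.0 / before_value, 1) AS change_pct
FROM (VALUES
        ('Loading time (ms)',         341584,   257241),
        ('Total storage space (GB)',  42,       15),
        ('Total execution time (ms)', 60225.56, 27472.815)
     ) AS t(benchmark, before_value, after_value);
```

With these inputs, `change_pct` should come out as -24.7, -64.3, and -54.4, matching the summary rows of the table. Substituting your own measurements gives the corresponding figures for your cluster.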

Evaluating the Table After Optimization

  • The loading time was reduced by 24.7%.

    The distribution mode has an obvious impact on data loading: hash distribution improves loading efficiency, whereas replication reduces it. When CPU and I/O resources are sufficient, the compression level has little impact on loading efficiency. Typically, loading a column-store table is more efficient than loading a row-store table.

  • Storage usage was reduced by 64.3%.

    Compression, column storage, and hash distribution all save storage space. A replication table uses more storage but reduces network overhead. Replicating small tables is therefore a good trade-off: a small amount of extra space buys better performance.

  • The total query execution time was reduced by 54.4%.

    Query performance improved because the storage modes, distribution modes, and distribution keys were optimized. For statistical analysis queries on tables with many columns, column storage improves performance because only the required columns are read. With hash distribution, reads and writes are spread across the I/O resources of every node, which speeds up table access.

    Often, query performance can be improved further by rewriting queries and configuring workload management (WLM). For more information, see Overview of Query Performance Optimization.
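To collect the storage figures for your own cluster, a query along the following lines can be run before and after tuning. This is a sketch rather than the tutorial's exact statement: it assumes the PostgreSQL-style functions `pg_relation_size` and `pg_size_pretty` are available in your GaussDB(DWS) version, that the 11 tables are visible in the current schema search path, and that an explicit cast to `regclass` is accepted. The aliases `names` and `t_name` are illustrative.

```sql
-- Report the on-disk size of each tutorial table in a readable unit.
-- Assumes the tables exist and are reachable through the current search path.
SELECT t_name,
       pg_size_pretty(pg_relation_size(t_name::regclass)) AS table_size
FROM (VALUES
        ('store_sales'),
        ('date_dim'),
        ('store'),
        ('item'),
        ('time_dim'),
        ('promotion'),
        ('customer_demographics'),
        ('customer_address'),
        ('household_demographics'),
        ('customer'),
        ('income_band')
     ) AS names(t_name);
```

Running the same query against the baseline and the optimized tables yields the Before and After columns of the storage section in the table above.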

You can adapt the operations in Optimizing Table Structure Design to Enhance GaussDB(DWS) Query Performance to further improve the distribution of tables and the performance of data loading, storage, and query.

Deleting Resources

After this practice is completed, delete the cluster.

To retain the cluster and delete only the SS tables, run the following statements:

DROP TABLE store_sales;
DROP TABLE date_dim;
DROP TABLE store;
DROP TABLE item;
DROP TABLE time_dim;
DROP TABLE promotion;
DROP TABLE customer_demographics;
DROP TABLE customer_address;
DROP TABLE household_demographics;
DROP TABLE customer;
DROP TABLE income_band;
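
To confirm that the tables were removed, one option is to look them up in the system catalog. The check below is a sketch that assumes the PostgreSQL-style `pg_tables` view is available in your GaussDB(DWS) version; it is not part of the original procedure.

```sql
-- List any of the 11 tutorial tables that still exist.
-- An empty result means all of them were dropped.
SELECT schemaname, tablename
FROM pg_tables
WHERE tablename IN (
        'store_sales', 'date_dim', 'store', 'item', 'time_dim',
        'promotion', 'customer_demographics', 'customer_address',
        'household_demographics', 'customer', 'income_band'
      );
```

If a table with the same name exists in another schema, it also appears in the result, so check the `schemaname` column before drawing conclusions.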