Step 6: Evaluating the Performance of the Optimized Table
Compare the loading time, storage space usage, and query execution time before and after the table tuning.
The following table shows example results from the cluster used in this tutorial. Your results will differ, but they should show a similar improvement.
| Benchmark | Before | After | Change | Percentage change |
|---|---|---|---|---|
| Loading time (11 tables) | 341584 ms | 257241 ms | -84343 ms | -24.7% |
| Occupied storage space | | | | |
| Store_Sales | 42 GB | 14 GB | -28 GB | -66.7% |
| Date_Dim | 11 MB | 27 MB | +16 MB | +145.5% |
| Store | 232 KB | 4352 KB | +4120 KB | +1775.9% |
| Item | 110 MB | 259 MB | +149 MB | +135.5% |
| Time_Dim | 11 MB | 14 MB | +13 MB | +118.2% |
| Promotion | 256 KB | 3200 KB | +2944 KB | +1150% |
| Customer_Demographics | 171 MB | 11 MB | -160 MB | -93.6% |
| Customer_Address | 170 MB | 27 MB | -143 MB | -84.1% |
| Household_Demographics | 504 KB | 1280 KB | +704 KB | +139.7% |
| Customer | 441 MB | 111 MB | -330 MB | -74.8% |
| Income_Band | 88 KB | 896 KB | +808 KB | +918.2% |
| Total storage space | 42 GB | 15 GB | -27 GB | -64.3% |
| Query execution time | | | | |
| Query 1 | 14552.05 ms | 1783.353 ms | -12768.697 ms | -87.7% |
| Query 2 | 27952.36 ms | 14247.803 ms | -13704.557 ms | -49.0% |
| Query 3 | 17721.15 ms | 11441.659 ms | -6279.491 ms | -35.4% |
| Total execution time | 60225.56 ms | 27472.815 ms | -32752.745 ms | -54.4% |
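The Change column is After minus Before, and the percentage is that change divided by Before, multiplied by 100. As a minimal sketch, the following query recomputes two rows from the literal values in the table. It reads no database tables, so it should run in gsql or any PostgreSQL-compatible client; the column aliases are illustrative names, not part of the tutorial.

```sql
-- Recompute Change = After - Before and
-- Percentage = (After - Before) / Before * 100, rounded to one decimal place.
-- The literals are copied from the benchmark table; no database tables are read.
SELECT benchmark,
       after_ms - before_ms                                 AS change_ms,
       round((after_ms - before_ms) * 100.0 / before_ms, 1) AS change_pct
FROM (VALUES ('Loading time (11 tables)', 341584.0, 257241.0),
             ('Total execution time',     60225.56, 27472.815))
     AS t (benchmark, before_ms, after_ms);
```

For these two rows, change_pct should be -24.7 and -54.4, matching the table. Note that a 54.4% reduction in total execution time means the queries ran about 2.19 times as fast (60225.56 / 27472.815), not 54.4% faster.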
Evaluating the Table After Optimization
- The loading time was reduced by 24.7%.
The distribution mode has a clear impact on loading. Hash distribution improves loading efficiency, because each row is written to only one DN. Replication reduces loading efficiency, because every row must be written to every DN. When CPU and I/O resources are sufficient, the compression level has little impact on loading efficiency. Typically, loading a column-store table is more efficient than loading a row-store table.
- The storage space used was reduced by 64.3%.
Compression, column storage, and hash distribution all save storage space. A replicated table uses more storage, because a full copy is kept on every DN, but it reduces network overhead. For small tables, replication is a worthwhile trade of a little extra space for better query performance.
- The total query execution time was reduced by 54.4%.
Optimizing storage modes, distribution modes, and distribution keys improves query performance. For statistical analysis queries on tables with many columns, column storage improves performance because only the columns referenced by the query are read. In a hash-distributed table, data is spread across all DNs, so the I/O resources of every node are used in parallel during reads and writes, which speeds up table access.
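The storage mode, compression level, and distribution mode discussed above are all set in the table DDL. The following sketch is illustrative only: the `_demo` table names, the reduced column lists, the distribution key `ss_item_sk`, and `COMPRESSION = MIDDLE` are assumptions, not the exact DDL used earlier in this tutorial. It defines a large fact table as a hash-distributed, compressed column-store table and a small dimension table as a replicated table, then checks the size of two tutorial tables and times a two-column aggregation in gsql.

```sql
-- Large fact table (hypothetical name and column subset): column storage with
-- compression, hash-distributed so that rows are spread across all DNs.
-- The choice of ss_item_sk as the distribution key is an assumption.
CREATE TABLE store_sales_demo
(
    ss_sold_date_sk   INTEGER,
    ss_item_sk        INTEGER      NOT NULL,
    ss_ticket_number  BIGINT       NOT NULL,
    ss_net_paid       DECIMAL(7,2)
)
WITH (ORIENTATION = COLUMN, COMPRESSION = MIDDLE)
DISTRIBUTE BY HASH (ss_item_sk);

-- Small dimension table (hypothetical name): a full copy is stored on every DN,
-- so joins against it need no data redistribution, at the cost of extra storage.
CREATE TABLE income_band_demo
(
    ib_income_band_sk  INTEGER NOT NULL,
    ib_lower_bound     INTEGER,
    ib_upper_bound     INTEGER
)
DISTRIBUTE BY REPLICATION;

-- Check the storage used by two of the tutorial tables.
SELECT pg_size_pretty(pg_total_relation_size('store_sales')) AS store_sales_size,
       pg_size_pretty(pg_total_relation_size('income_band')) AS income_band_size;

-- Time a statistical query that reads only two columns of store_sales.
\timing on
SELECT ss_item_sk, SUM(ss_net_paid) AS total_paid
FROM store_sales
GROUP BY ss_item_sk
ORDER BY total_paid DESC
LIMIT 10;
\timing off
```

The sizes reported by `pg_total_relation_size` may be measured differently from the figures in the table above. The `_demo` tables exist only for illustration; drop them afterwards with `DROP TABLE store_sales_demo, income_band_demo;`.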
Often, query performance can be improved further by rewriting queries and configuring workload management (WLM). For more information, see Overview of Query Performance Optimization.
You can adapt the operations in Table Optimization Practices to further improve table distribution and the performance of data loading, storage, and queries.
Deleting Resources
After the exercise is completed, delete the cluster by referring to Deleting a Cluster.
If you want to keep the cluster but free the storage space used by the SS tables, run the following commands:
```sql
DROP TABLE store_sales;
DROP TABLE date_dim;
DROP TABLE store;
DROP TABLE item;
DROP TABLE time_dim;
DROP TABLE promotion;
DROP TABLE customer_demographics;
DROP TABLE customer_address;
DROP TABLE household_demographics;
DROP TABLE customer;
DROP TABLE income_band;
```
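To confirm that the tables are gone, you can query the `pg_tables` catalog view. This sketch assumes the SS tables were created in the current database with the lowercase names used in the commands above; an empty result means all 11 tables were dropped.

```sql
-- List any of the 11 tutorial tables that still exist in the current database.
SELECT schemaname, tablename
FROM pg_tables
WHERE tablename IN ('store_sales', 'date_dim', 'store', 'item', 'time_dim',
                    'promotion', 'customer_demographics', 'customer_address',
                    'household_demographics', 'customer', 'income_band');
```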