Step 4: Selecting Distribution Modes
GaussDB(DWS) uses a massively parallel processing (MPP) system of the shared-nothing architecture. The MPP performs horizontal partitioning to store tuples in service data tables on all DNs using proper distribution policies. Currently, user tables can be distributed in Replication or Hash mode.
- Replication: stores a full table on each DN. Full data in a table stored on each DN avoids data redistribution during the join operation. This reduces network costs and plan segment (each having a thread), but generates much redundant data. Generally, replication is only used for small dimension tables.
- Hash: A distribute key must be specified for a user table. If a record is inserted, the system performs hash computing based on values in the distribute column and then stores data on the related DN. In a hash table, I/O resources on each node can be used during I/O read/write, which greatly improves the read/write speed of a table. Generally, a large table (containing over 1 million records) is defined as a hash table.
Based on table sizes provided in Step 2: Testing System Performance of the Initial Table and Establishing a Baseline, set the distribution mode as follows.
| Table Name | Number of Rows | Distribution Mode |
|---|---|---|
| Store_Sales | 287997024 | Hash |
| Date_Dim | 73049 | Replication |
| Store | 402 | Replication |
| Item | 204000 | Replication |
| Time_Dim | 86400 | Replication |
| Promotion | 1000 | Replication |
| Customer_Demographics | 1920800 | Hash |
| Customer_Address | 1000000 | Hash |
| Household_Demographics | 7200 | Replication |
| Customer | 1981703 | Hash |
| Income_Band | 20 | Replication |
Last Article: Step 3: Selecting Storage and Compression Modes
Next Article: Step 5: Selecting a Distribution Key
Did this article solve your problem?
Thank you for your score!Your feedback would help us improve the website.