Selecting a Distribution Mode

Updated at: Jul 15, 2020 GMT+08:00

In replication mode, a full copy of a table's data is stored on every DN in the cluster. This mode suits tables with small record sets. Because each DN holds the full table, no data redistribution is needed during join operations, which reduces network overhead and the number of plan segments (each of which starts a thread). The trade-off is a large amount of redundant data, so replication is generally used only for small dimension tables.

In hash mode, a hash value is computed from one or more columns of each tuple, and the mapping between hash values and DNs determines where the tuple is stored. Reads and writes on a hash table can use the I/O resources of every node, which greatly improves the table's read/write speed. Generally, a table containing a large amount of data is defined as a hash table.
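The placement rule described above can be sketched in a few lines of Python. This is only an illustration of the idea: the DN count, the `dn_for` and `distribute_by_hash` helpers, the `orders` sample data, and the use of CRC32 as the hash function are all assumptions made for the example, not the hash function or mapping the database actually uses.

```python
import zlib

DN_COUNT = 4  # hypothetical number of DNs in the cluster


def dn_for(key, dn_count=DN_COUNT):
    """Map a distribution-column value to a DN index (illustrative only)."""
    # CRC32 is used here because it is deterministic across runs;
    # the real system's hash function is not specified in this guide.
    digest = zlib.crc32(repr(key).encode("utf-8"))
    return digest % dn_count


def distribute_by_hash(rows, column, dn_count=DN_COUNT):
    """Place each row on the DN selected by the hash of its distribution column."""
    nodes = [[] for _ in range(dn_count)]
    for row in rows:
        nodes[dn_for(row[column], dn_count)].append(row)
    return nodes


# Hypothetical fact table with 1000 rows, distributed on order_id.
orders = [{"order_id": i, "amount": 10 * i} for i in range(1000)]
nodes = distribute_by_hash(orders, "order_id")

for index, rows_on_dn in enumerate(nodes):
    print(f"DN{index}: {len(rows_on_dn)} rows")
```

Each row lands on exactly one DN, and the same key always maps to the same DN, so a lookup by the distribution column only needs to touch one node while a full scan can use all of them in parallel.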

| Policy | Description | Application Scenario |
| --- | --- | --- |
| Hash | Table data is distributed on all DNs in the cluster in hash mode. | Fact tables containing a large amount of data. |
| Replication | Full data in a table is stored on each DN in the cluster. | Small tables and dimension tables. |

As shown in Figure 1, T1 is a replication table and T2 is a hash table.

Figure 1 Replication table and hash table
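To make the T1/T2 arrangement concrete, the following Python sketch simulates a replicated T1 and a hash-distributed T2 and joins them on each DN separately. The column names, row contents, DN count, and CRC32-based placement are assumptions chosen for illustration; they are not taken from the figure or from the database's implementation.

```python
import zlib

DN_COUNT = 3  # hypothetical number of DNs


def dn_for(key, dn_count=DN_COUNT):
    """Illustrative hash placement; the real hash function is not specified here."""
    return zlib.crc32(repr(key).encode("utf-8")) % dn_count


# T1: small dimension table, replicated (every DN holds a full copy).
t1 = [{"region_id": 1, "name": "north"}, {"region_id": 2, "name": "south"}]
t1_on_dn = [list(t1) for _ in range(DN_COUNT)]

# T2: larger fact table, hash-distributed on sale_id (each row lives on one DN).
t2 = [{"sale_id": i, "region_id": 1 + i % 2} for i in range(12)]
t2_on_dn = [[] for _ in range(DN_COUNT)]
for row in t2:
    t2_on_dn[dn_for(row["sale_id"])].append(row)


def local_join(dn):
    """Join T2 with T1 on region_id using only the data stored on one DN."""
    names = {r["region_id"]: r["name"] for r in t1_on_dn[dn]}
    return [(s["sale_id"], names[s["region_id"]]) for s in t2_on_dn[dn]]


# Every DN joins its own slice of T2 against its local copy of T1.
joined = [pair for dn in range(DN_COUNT) for pair in local_join(dn)]
print(sorted(joined))
```

The join column (`region_id`) is not T2's distribution column, yet no rows have to move between DNs because each DN already holds all of T1. The cost is that T1 is stored `DN_COUNT` times, which is why this mode is reserved for small tables.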
