Reviewing and Modifying a Table Definition

Updated at: Dec 30, 2020 GMT+08:00

In a distributed framework, data is distributed across DNs. Data on one or more DNs is stored on a physical storage device. To define a table properly, you must:

  1. Distribute data evenly across DNs. Otherwise, the storage device associated with one DN can run out of space and reduce the available capacity of the whole cluster. Specifically, select a proper distribution key to avoid data skew.
  2. Assign table scanning tasks evenly across DNs so that no single DN is overloaded. Specifically, do not use a column that appears in an equality filter on a base table as the distribution key, because all rows matching the filter would then reside on the same DN.
  3. Reduce the volume of data scanned by using the partition pruning mechanism.
  4. Avoid random I/O by using clustering or partial clustering.
  5. Avoid data shuffling, and the network pressure it causes, by selecting join condition columns or GROUP BY columns as the distribution key.
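
The effect of the distribution key on data skew (items 1 and 2) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the table, the column names `order_id` and `status`, the DN count, and the use of CRC32 modulo the DN count are all hypothetical stand-ins, not the hash function the database actually uses.

```python
import zlib

def rows_per_dn(rows, key, num_dns):
    """Count how many rows land on each DN when hashing on column `key`.

    CRC32 modulo num_dns is an illustrative stand-in for the real
    hash distribution function, which is not described here.
    """
    counts = [0] * num_dns
    for row in rows:
        dn = zlib.crc32(str(row[key]).encode()) % num_dns
        counts[dn] += 1
    return counts

def skew_ratio(counts):
    """Largest DN row count divided by the ideal even share (1.0 = no skew)."""
    return max(counts) / (sum(counts) / len(counts))

# Hypothetical table: 10,000 orders with a unique order_id
# and only three distinct status values.
statuses = ("new", "paid", "shipped")
rows = [{"order_id": i, "status": statuses[i % 3]} for i in range(10_000)]

by_id = rows_per_dn(rows, "order_id", num_dns=6)
by_status = rows_per_dn(rows, "status", num_dns=6)

print("order_id as key:", by_id, "skew", round(skew_ratio(by_id), 2))
print("status as key:  ", by_status, "skew", round(skew_ratio(by_status), 2))
```

With three distinct `status` values and six DNs, at most three DNs can hold any rows, so the fullest DN stores at least twice its even share. A high-cardinality column such as `order_id` spreads rows far more evenly. The same reasoning explains item 2: a query filtering on `status = 'paid'` would be served by a single DN if `status` were the distribution key.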

The distribution key is the core of a table definition. Figure 1 shows the procedure for defining a table. The table definition is created during database design and is reviewed and modified during SQL statement optimization.

Figure 1 Procedure of defining a table

For details about how to review and modify table definitions, see Learning the Tutorial: Tuning Table Design.
