Case: Selecting an Appropriate Distribution Key

Symptom

Tables are defined as follows:

    
    CREATE TABLE t1 (a int, b int);
    CREATE TABLE t2 (a int, b int);

The following query is executed:

    
    SELECT * FROM t1, t2 WHERE t1.a = t2.b;

Optimization Analysis

If column a is the distribution key of both t1 and t2:

    
    CREATE TABLE t1 (a int, b int) DISTRIBUTE BY HASH (a);
    CREATE TABLE t2 (a int, b int) DISTRIBUTE BY HASH (a);

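In this case, the execution plan contains a Streaming operator and a large volume of data is transferred among DNs, as shown in Figure 1. The join condition is t1.a = t2.b, but t2 is distributed by a, so the rows of t2 that match a given row of t1 are not guaranteed to be on the same DN and must be redistributed before the join.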

Figure 1 Selecting an appropriate distribution key (1)
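As a minimal sketch of how to check this yourself, the plan for the query can be displayed with EXPLAIN. The exact operator names and plan layout depend on the product version, so treat the wording of the output as something to confirm on your own cluster. The point is to look for a Streaming step that redistributes or broadcasts table data between DNs below the join.

```sql
-- Show the execution plan without running the query.
-- Look for a Streaming operator that moves t1 or t2 rows between DNs
-- before the join; its presence means the join keys are not co-located.
EXPLAIN SELECT * FROM t1, t2 WHERE t1.a = t2.b;
```

Running the same statement after changing a distribution key gives a direct before-and-after comparison of the two plans.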

If a is the distribution key of t1 and b is the distribution key of t2:

    
    CREATE TABLE t1 (a int, b int) DISTRIBUTE BY HASH (a);
    CREATE TABLE t2 (a int, b int) DISTRIBUTE BY HASH (b);

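In this case, the execution plan no longer needs a Streaming operator to redistribute data for the join. Each table is distributed on its own join column, so matching rows are already on the same DN. Less data is transferred among DNs and query performance improves, as shown in Figure 2.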

Figure 2 Selecting an appropriate distribution key (2)
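If t2 already exists with a as its distribution key and contains data, one way to move it to the new key is to rebuild it. The following is a sketch under stated assumptions, not a prescribed procedure: the names t2_new and t2_old are hypothetical, and indexes, constraints, and privileges on the original table are not carried over by these statements and would need to be re-created.

```sql
-- Create a copy of t2 that is distributed on the join column b.
-- t2_new is a hypothetical name used only for this example.
CREATE TABLE t2_new (a int, b int) DISTRIBUTE BY HASH (b);

-- Copy the existing rows; they are hashed on b as they are inserted.
INSERT INTO t2_new SELECT * FROM t2;

-- After verifying the copy, swap the tables by renaming them.
-- t2_old is likewise a hypothetical name for the retired table.
ALTER TABLE t2 RENAME TO t2_old;
ALTER TABLE t2_new RENAME TO t2;
```

Keeping the old table under a different name until the new plan has been checked makes the change easy to roll back.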
