
Case: Selecting an Appropriate Distribution Column

Symptom

Tables are defined as follows:

CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (a int, b int);

The following query is executed:

SELECT * FROM t1, t2 WHERE t1.a = t2.b;

Optimization Analysis

If a is the distribution column of t1 and t2:

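CREATE TABLE t1 (a int, b int) DISTRIBUTE BY HASH (a);
CREATE TABLE t2 (a int, b int) DISTRIBUTE BY HASH (a);

Then the execution plan contains a Streaming operator, and a large volume of data is transferred among DNs, as shown in Figure 1. The join condition is t1.a = t2.b, but t2 is distributed by a, so rows that satisfy the condition are not guaranteed to reside on the same DN and data must be moved among DNs before the join.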

Figure 1 Selecting an appropriate distribution column (1)
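To see which plan a given table layout produces, the query can be prefixed with EXPLAIN. This is a minimal sketch that uses only the tables and the query from this case. It assumes that data movement shows up in the output as Streaming operators, typically labelled with a type such as REDISTRIBUTE or BROADCAST; the exact wording may differ between versions.

```sql
-- Show the execution plan without running the query.
-- Look for Streaming operators below the join: they indicate that rows
-- are moved among DNs before the join can be evaluated.
EXPLAIN SELECT * FROM t1, t2 WHERE t1.a = t2.b;
```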

If a is the distribution column of t1 and b is the distribution column of t2:

CREATE TABLE t1 (a int, b int) DISTRIBUTE BY HASH (a);
CREATE TABLE t2 (a int, b int) DISTRIBUTE BY HASH (b);

Then the join column of each table is also its distribution column, so rows that satisfy t1.a = t2.b already reside on the same DN. The execution plan no longer needs a Streaming operator to move data for the join, less data is transferred among DNs, and the query runs faster, as shown in Figure 2.

Figure 2 Selecting an appropriate distribution column (2)
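If t2 already exists with a as its distribution column, one possible way to arrive at this layout is to rebuild the table. The following is a sketch under stated assumptions, not a prescribed procedure: t2_new is a hypothetical staging name, the table is assumed to have no indexes, constraints, or granted privileges that need to be recreated, and no other session is assumed to write to t2 during the copy.

```sql
-- t2_new is a hypothetical staging name used only for this sketch.
CREATE TABLE t2_new (a int, b int) DISTRIBUTE BY HASH (b);

-- Copy the rows; they are hashed on b as they are inserted.
INSERT INTO t2_new SELECT a, b FROM t2;

-- Swap the rebuilt table in place of the original.
DROP TABLE t2;
ALTER TABLE t2_new RENAME TO t2;
```

After the swap, running EXPLAIN on the query from this case is a quick way to confirm that the join no longer requires data to be moved among DNs.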
