Help Center/ GaussDB(for MySQL)/ Troubleshooting/ SQL Issues/ DISTINCT and GROUP BY Optimization
Updated on 2023-10-19 GMT+08:00

DISTINCT and GROUP BY Optimization

Scenario

The execution of the DISTINCT or GROUP BY statement is slow.

Possible Causes

In most cases, DISTINCT can be converted into an equivalent GROUP BY statement. DISTINCT is mainly used to remove duplicate records from database tables and fetch only the unique records.

The DISTINCT statement groups data first, and then fetches a piece of data from each group and returns the data to the client. There are two scenarios for grouping data:

  • All DISTINCT fields are included in the same index. In this scenario, GaussDB(for MySQL) directly uses the index to group data, obtains a piece of data from each group, and returns the data.
  • Not all DISTINCT fields are included in the index. In this scenario, qualified data is written to a temporary table and grouped in the temporary table. Using temporary tables causes extra overhead, deteriorating the performance.

In conclusion, when using DISTINCT or GROUP BY, set an index that contains all dependent fields. The following is an optimization example:

  • No proper index is available. As a result, temporary tables are used.

  • A proper index is available, and temporary tables are not required.

Solution

When using DISTINCT or GROUP BY, create an index that contains all dependent fields.