Updated on 2023-03-06 GMT+08:00

DISTINCT and GROUP BY Optimization

Scenario

The execution of the DISTINCT or GROUP BY statement is slow.

Possible Causes

In most cases, DISTINCT can be converted into an equivalent GROUP BY statement. In RDS for MySQL, DISTINCT is mainly used to remove duplicate records from database tables and fetch only the unique records.

The DISTINCT statement groups data first, and then fetches a piece of data from each group and returns the data to the client. There are two scenarios for grouping data:

  • All DISTINCT fields are included in the same index. In this scenario, RDS for MySQL directly uses the index to group data, obtains a piece of data from each group, and returns the data.
  • Not all DISTINCT fields are included in the index. In this scenario, qualified data is written to a temporary table and grouped in the temporary table. Using temporary tables causes extra overhead, deteriorating the performance.

In conclusion, when using DISTINCT or GROUP BY, set an index that contains all dependent fields. The following is an optimization example:

  • No suitable index is available, so temporary tables are used.

  • A suitable index is found, and temporary tables are not required.

Solution

When using DISTINCT or GROUP BY, create an index that contains all dependent fields.