DISTINCT and GROUP BY Optimization
Scenario
The execution of the DISTINCT or GROUP BY statement is slow.
Possible Causes
In most cases, DISTINCT can be converted into an equivalent GROUP BY statement. DISTINCT is mainly used to remove duplicate records from database tables and fetch only the unique records.
The DISTINCT statement groups data first, and then fetches a piece of data from each group and returns the data to the client. There are two scenarios for grouping data:
- All DISTINCT fields are included in the same index. In this scenario, GaussDB(for MySQL) directly uses the index to group data, obtains a piece of data from each group, and returns the data.
- Not all DISTINCT fields are included in the index. In this scenario, qualified data is written to a temporary table and grouped in the temporary table. Using temporary tables causes extra overhead, deteriorating the performance.
In conclusion, when using DISTINCT or GROUP BY, set an index that contains all dependent fields. The following is an optimization example:
Solution
When using DISTINCT or GROUP BY, create an index that contains all dependent fields.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.