Cross-Shard Complex SQL Tuning
This section describes how to optimize cross-shard complex SQL statements.
Suggestions on Tuning GROUP BY and ORDER BY
- If the GROUP BY field is not a data column in the table, use Generated Columns to create a column, add an index to it, and use the column as the GROUP BY condition to avoid using temporary tables for query. (Generated Columns is supported in MySQL 5.7 or later. In MySQL 5.6 and earlier versions, you can create common fields and add indexes.)
- Ensure that the GROUP BY and ORDER BY fields are the same. Data queries are faster when the GROUP BY field contains sharding keys.
- Use indexes in GROUP BY and ORDER BY.
Suggestions on Tuning JOINs
- Preferentially use sharding keys to perform EQUI-JOINs on conditional JOIN statements.
- Use the table with a smaller result set as the driving table to reduce loop JOINs. When LEFT JOINs or JOINs are used, the driving table is on the left and the driven table is on the right. When RIGHT JOINs are used, the driving table is on the right and the driven table is on the left.
- Add indexes to JOIN fields of the driven table. Do not join two large tables directly, especially unsharded tables without any constraint, which may use a large number of compute resources.
- Do not use JOINs in explicit transactions. To ensure transaction visibility, the optimal execution plan may not be used.
Suggestions on Tuning SQL UNION
- Do not perform UNION operations (excluding UNION ALL) between large tables, in case that the number of temporary tables exceeds the upper limit during deduplication.
- Do not use ORDER BY or GROUP BY in a UNION statement.
Suggestions on Tuning Temporary Tables
When complex SQL statements such as JOINs, UNION, GROUP BY, and ORDER BY are executed across shards, DDM may create a temporary table to store data. The maximum size of a temporary table is 1 million rows by default. If the number of rows in the temporary table exceeds this upper limit, an error "Temp table limit exceeded" will be displayed, indicating that the SQL statements need to be calculated again on the DDM instance, and the required temporary table size exceeds the limit allowed by the DDM instance. If such errors on temporary tables are reported, evaluate whether the SQL statements can be optimized by, for example, configuring broadcast tables, adding conditions for batch query, or modifying SQL statements based on sharding architecture. If there is no optimization potential, check data volume and instance resources. If there are no more than 3 million rows of data and instance resources are sufficient, set the instance parameter temp_table_size_limit to a large value on the console to remove the restriction on temporary table size.
Scenarios Where Temporary Tables Are Generated
- The GROUP BY and ORDER BY fields are different.
- The DISTINCT keyword is used.
- UNION queries are executed.
- Some cross-shard subqueries are executed.
- In a JOIN query, non-equivalent JOIN conditions are used, or a transaction is enabled on the connection between two subqueries.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot