Updated on 2022-09-15 GMT+08:00

SQL Standards

INSERT

  • Syntax rules
    • Do not insert data entries one at a time with separate INSERT statements. Multi-row inserts of the form INSERT INTO ... VALUES (),(),...,() are recommended.
    • By default, the MySQL JDBC driver ignores batching in executeBatch(). It splits the statements that should be executed as a batch into single statements and sends them to the database one by one, which greatly degrades database performance. To execute SQL statements in batches, set the connection parameter rewriteBatchedStatements to true and use MySQL JDBC driver 5.1.13 or later. The driver executes statements in batches only when rewriteBatchedStatements is true. This setting applies to INSERT, UPDATE, and DELETE operations.

      After setting rewriteBatchedStatements to true, set an appropriate batch size to control how many INSERT, UPDATE, or DELETE operations are sent in each batch. An excessively large batch size may degrade performance. Unless otherwise specified, keep the batch size at or below 1000.

    • Do not use a function, expression, or subquery as the value of the sharding key. Use a constant.
    • Do not use a subquery as the value of a non-sharding column. Use a constant, function, or expression instead.
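    The Java sketch below illustrates both recommendations: building a multi-row INSERT statement, and sending rows through a JDBC batch that is flushed every batchSize rows. The table orders, its columns order_id, user_id, and amount, and all helper names are hypothetical. It assumes MySQL Connector/J on the classpath; with rewriteBatchedStatements=true in the URL, the driver can rewrite the single-row batch into multi-row statements.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.StringJoiner;

/** Sketch only: table "orders" and its columns are hypothetical. */
final class BatchInsertSketch {

    // Connector/J batches statements only when rewriteBatchedStatements=true.
    static String jdbcUrl(String host, int port, String schema) {
        return "jdbc:mysql://" + host + ":" + port + "/" + schema
                + "?rewriteBatchedStatements=true";
    }

    // Builds "INSERT INTO t (c1, c2) VALUES (?, ?), (?, ?), ..." for rowCount rows.
    static String multiRowInsert(String table, List<String> columns, int rowCount) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        StringJoiner oneRow = new StringJoiner(", ", "(", ")");
        for (String c : columns) {
            cols.add(c);
            oneRow.add("?");
        }
        StringJoiner rows = new StringJoiner(", ");
        for (int i = 0; i < rowCount; i++) {
            rows.add(oneRow.toString());
        }
        return "INSERT INTO " + table + " " + cols + " VALUES " + rows;
    }

    // Adds rows to a JDBC batch and flushes every batchSize rows (kept at or below 1000).
    // Returns how many times executeBatch() was called.
    static int insertInBatches(Connection conn, List<Object[]> rows, int batchSize)
            throws SQLException {
        if (batchSize < 1 || batchSize > 1000) {
            throw new IllegalArgumentException("batchSize should be between 1 and 1000");
        }
        String sql = multiRowInsert("orders", List.of("order_id", "user_id", "amount"), 1);
        int flushes = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) {          // flush the final, partially filled batch
                ps.executeBatch();
                flushes++;
            }
        }
        return flushes;
    }
}
```

    For example, open the connection with DriverManager.getConnection(BatchInsertSketch.jdbcUrl(host, port, schema), user, password) and pass a batch size of 1000 or less.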
  • Batch data import

    LOAD DATA LOCAL INFILE is recommended for importing a large volume of data in batches.

    You only need to start a session and run the statement; DDM automatically imports the data.
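    A minimal sketch of how a client might assemble the statement, assuming a comma-separated file with one header row. The file path, the table name orders, and the CSV layout are hypothetical. It also assumes MySQL Connector/J, whose allowLoadLocalInfile connection property must be true (it defaults to false in recent versions) before the driver will send a local file to the server.

```java
/** Sketch only: file path, table name, and CSV layout are hypothetical. */
final class LoadDataSketch {

    // Connector/J refuses LOCAL INFILE requests unless allowLoadLocalInfile=true.
    static String jdbcUrl(String host, int port, String schema) {
        return "jdbc:mysql://" + host + ":" + port + "/" + schema
                + "?allowLoadLocalInfile=true";
    }

    // Escapes backslashes and single quotes so the path is a valid SQL string literal.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // CSV: comma-separated, optionally double-quoted fields, newline-terminated, one header row.
    static String loadCsv(String filePath, String table) {
        return "LOAD DATA LOCAL INFILE " + quote(filePath)
                + " INTO TABLE " + table
                + " FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'"
                + " LINES TERMINATED BY '\\n'"
                + " IGNORE 1 LINES";
    }
}
```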

  • Data migration

    Use mysqldump to export data to SQL files, and then import the files by running the source command in the MySQL client.
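    As a sketch, the two steps can be scripted from Java with ProcessBuilder. The host, port, user, database, and file names are placeholders, and the mysqldump and mysql client binaries are assumed to be on the PATH. The -p option makes each tool prompt for the password instead of exposing it on the command line, and --execute runs the client's source command against the target database.

```java
import java.util.List;

/** Sketch only: all connection values and file names are placeholders. */
final class MigrationCommandsSketch {

    // Export: mysqldump writes the SQL file given by --result-file.
    static List<String> dumpCommand(String host, int port, String user,
                                    String database, String outFile) {
        return List.of("mysqldump",
                "--host=" + host, "--port=" + port, "--user=" + user, "-p",
                "--result-file=" + outFile,
                database);
    }

    // Import: run "source <file>" inside the mysql client against the target database.
    static List<String> importCommand(String host, int port, String user,
                                      String database, String sqlFile) {
        return List.of("mysql",
                "--host=" + host, "--port=" + port, "--user=" + user, "-p",
                "--database=" + database,
                "--execute=source " + sqlFile);
    }
}
```

    For example, new ProcessBuilder(MigrationCommandsSketch.dumpCommand(...)).inheritIO().start().waitFor() runs the export and keeps the password prompt on the terminal.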

  • Field auto_increment
    • DDM can create a globally unique sequence of numbers using the AUTO_INCREMENT attribute.
    • If a field uses auto-increment, do not assign its value in the VALUES clause. Otherwise, a primary key conflict may occur. If values have already been assigned in the VALUES clause, use ALTER SEQUENCE to adjust the sequence so that newly generated values do not conflict with them.
    • To ensure stable performance, do not set the auto-increment step to 1. The default step is 1000.
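    One way to see why a step of 1 hurts performance: with a larger step, a node can reserve a whole block of values from the global sequence in one round trip and hand them out locally. The Java model below is a simplified illustration of that idea, not DDM's actual implementation; SequenceStepModel and its methods are hypothetical.

```java
/**
 * Simplified model (not DDM's implementation): a node reserves `step` values
 * from a shared global counter at once and hands them out locally, so the
 * shared counter is touched once per block instead of once per row.
 */
final class SequenceStepModel {
    private final long step;
    private long globalNext;   // next unreserved value in the shared sequence
    private long localNext;    // next value to hand out from the current block
    private long localEnd;     // exclusive end of the current block
    private int globalFetches; // how many times the shared counter was accessed

    SequenceStepModel(long start, long step) {
        if (step < 1) throw new IllegalArgumentException("step must be >= 1");
        this.step = step;
        this.globalNext = start;
        this.localNext = start;
        this.localEnd = start; // empty block: the first call reserves one
    }

    long nextValue() {
        if (localNext == localEnd) {
            // Reserve [globalNext, globalNext + step) in a single round trip.
            localNext = globalNext;
            localEnd = globalNext + step;
            globalNext = localEnd;
            globalFetches++;
        }
        return localNext++;
    }

    int globalFetches() {
        return globalFetches;
    }
}
```

    In this model, 2,500 inserts access the shared counter 3 times with a step of 1000, but 2,500 times with a step of 1. A side effect is that values handed out by different nodes are unique but not necessarily consecutive.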

UPDATE and DELETE

  • Common updates
    • Include the sharding field in the WHERE condition when you perform an UPDATE or DELETE operation.
    • If the sharding field cannot be included, reduce concurrency and limit the number of data entries each update or deletion affects. First use SELECT to find the data you want to update or delete, double-check the data scope, and then perform the update or deletion.
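    The SELECT, double-check, then delete sequence might look like the Java sketch below. It assumes the rows were first found with a query such as SELECT order_id FROM orders WHERE status = 'expired' and that the operator knows how many rows to expect; the table, columns, and helper names are hypothetical. Deleting the verified IDs in small chunks keeps the scope of each statement bounded.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: table "orders" and column "order_id" are hypothetical. */
final class ScopedDeleteSketch {

    // Double check: stop if the selected scope differs from what the operator expected.
    static void doubleCheck(List<Long> selectedIds, int expectedCount) {
        if (selectedIds.size() != expectedCount) {
            throw new IllegalStateException("expected " + expectedCount
                    + " rows but SELECT returned " + selectedIds.size());
        }
    }

    // Split the verified IDs into chunks so that each DELETE touches few rows.
    static List<List<Long>> chunks(List<Long> ids, int chunkSize) {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        List<List<Long>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            out.add(ids.subList(i, Math.min(i + chunkSize, ids.size())));
        }
        return out;
    }

    // One parameterized DELETE per chunk; bind the chunk's IDs in order.
    static String deleteSql(int idCount) {
        StringBuilder sb = new StringBuilder("DELETE FROM orders WHERE order_id IN (");
        for (int i = 0; i < idCount; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")").toString();
    }
}
```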
  • Sharding field update
    • When you update a sharding field, make sure that the target table contains no more than 10,000 data entries. If it contains more, either create a new table that uses the required sharding field, or split the update of the original sharding field into multiple smaller, equivalent update operations.
    • Update sharding fields during off-peak hours.
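    Splitting one sharding-field update into smaller, equivalent operations can be sketched as cutting the primary-key range into consecutive slices. Because order_id is assumed to be a unique integer key, a slice of 10,000 consecutive IDs contains at most 10,000 rows. The table orders, the sharding column user_id, the key column order_id, and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: table "orders", sharding column "user_id", key "order_id" are hypothetical. */
final class RangeSplitSketch {

    record Range(long fromInclusive, long toInclusive) {}

    // Cuts [minId, maxId] into consecutive ranges of at most maxIdsPerRange IDs.
    static List<Range> split(long minId, long maxId, long maxIdsPerRange) {
        if (maxIdsPerRange < 1) throw new IllegalArgumentException("maxIdsPerRange must be >= 1");
        List<Range> ranges = new ArrayList<>();
        for (long from = minId; from <= maxId; from += maxIdsPerRange) {
            long to = Math.min(from + maxIdsPerRange - 1, maxId);
            ranges.add(new Range(from, to));
        }
        return ranges;
    }

    // Run once per range, bound with (newUserId, oldUserId, range.fromInclusive, range.toInclusive).
    static final String UPDATE_SQL =
            "UPDATE orders SET user_id = ? WHERE user_id = ? AND order_id BETWEEN ? AND ?";
}
```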
  • Association

    Do not update or delete data in multiple tables with a single statement.

  • Subquery and LIMIT

    Do not use subqueries in an UPDATE or DELETE statement. Do not use LIMIT or ORDER BY LIMIT in UPDATE or DELETE statements.

SELECT

  • ORDER BY LIMIT function
    • When you use ORDER BY ... LIMIT offset, count or ORDER BY ... LIMIT count OFFSET offset, do not specify a large offset.
    • If the error Temp table limit exceeded is returned, the temporary table that holds intermediate data contains more entries than its upper limit allows. Contact DDM technical support for SQL tuning.
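    A large offset is costly because the rows before the offset are still read and then discarded; in a sharded table, a middleware typically has to fetch up to offset + count rows from each shard before merging them. Keyset (seek) pagination, a general technique rather than a DDM-specific feature, avoids this by continuing from the last key seen. The sketch below assumes a hypothetical orders table with a unique, indexed order_id column.

```java
/** Sketch only: table "orders" and unique indexed column "order_id" are hypothetical. */
final class KeysetPaginationSketch {

    // Large-offset form to avoid: rows before the offset are read and thrown away.
    static String offsetPage(long offset, int pageSize) {
        return "SELECT order_id, amount FROM orders ORDER BY order_id LIMIT "
                + offset + ", " + pageSize;
    }

    // Keyset form: bind the last order_id of the previous page to the placeholder.
    static String nextPage(int pageSize) {
        return "SELECT order_id, amount FROM orders WHERE order_id > ? ORDER BY order_id LIMIT "
                + pageSize;
    }
}
```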
  • GROUP BY function
    • Include only GROUP BY fields in SELECT_LIST.
    • Do not use ORDER BY inside the aggregate function GROUP_CONCAT when the aggregation cannot be pushed down to the shards.
    • Use no more than three DISTINCT or GROUP BY fields.
    • Do not perform a GROUP BY operation after a JOIN or subquery operation.
    • Do not use COUNT(DISTINCT ...) or SUM(DISTINCT ...).
    • If the error Temp table limit exceeded is returned, the temporary table that holds intermediate data during aggregation contains more entries than its upper limit allows. Contact DDM technical support for SQL tuning.
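    The following Java illustration shows why per-shard results cannot simply be added for COUNT(DISTINCT ...): a value stored on two shards is counted twice. To answer correctly, the middleware has to collect the raw values, which is one way an intermediate temporary table can grow. The shard contents are made up for illustration, and this is not DDM's execution plan.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustration only: why COUNT(DISTINCT col) cannot be merged by summing shards. */
final class DistinctAcrossShards {

    // What naive merging would return: the sum of each shard's own distinct count.
    static int sumOfPerShardDistinct(List<List<String>> shards) {
        int total = 0;
        for (List<String> shard : shards) {
            total += new HashSet<>(shard).size();
        }
        return total;
    }

    // The correct answer: distinct values across all shards together.
    static int globalDistinct(List<List<String>> shards) {
        Set<String> all = new HashSet<>();
        for (List<String> shard : shards) {
            all.addAll(shard);
        }
        return all.size();
    }
}
```

    With shards [a, b, b] and [b, c], summing per-shard counts gives 4, while the correct COUNT(DISTINCT) is 3.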
  • JOIN function
    • In a SELECT statement, include the sharding field of each table, or a broadcast table, in the JOIN condition. Alternatively, use INNER JOIN, LEFT JOIN, or RIGHT JOIN and make sure that the driving table is the smaller one.
    • Do not join two large tables directly.
    • Do not use OUTER JOIN in the JOIN ON condition.
    • If the error Temp table limit exceeded is returned, the temporary table that holds intermediate data during the JOIN operation contains more entries than its upper limit allows. Contact DDM technical support for SQL tuning.
    • Do not join more than five tables directly.
    • Do not perform a JOIN query within a transaction. Enabling transactions affects DDM's selection of the most efficient JOIN algorithm.

      Table size in these rules refers to the volume of data selected by the WHERE condition.
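    Why a JOIN on the sharding field is efficient can be illustrated with a toy routing rule. If both tables are sharded by the join key with the same rule, matching rows always land on the same shard, so joining each shard locally gives the same result as a global join. The modulo rule, the key lists, and the helper names below are assumptions for illustration; DDM's actual sharding algorithms may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: toy routing rule (key mod shardCount) shared by both tables. */
final class ColocatedJoinSketch {

    static int route(long shardingKey, int shardCount) {
        return (int) Math.floorMod(shardingKey, (long) shardCount);
    }

    // Counts matching (left, right) key pairs with a nested loop.
    static int joinCount(List<Long> left, List<Long> right) {
        int n = 0;
        for (long l : left) {
            for (long r : right) {
                if (l == r) n++;
            }
        }
        return n;
    }

    // Routes both tables by the join key, joins each shard locally, and adds up the results.
    static int shardLocalJoinCount(List<Long> left, List<Long> right, int shardCount) {
        int total = 0;
        for (int s = 0; s < shardCount; s++) {
            List<Long> l = new ArrayList<>();
            List<Long> r = new ArrayList<>();
            for (long k : left) if (route(k, shardCount) == s) l.add(k);
            for (long k : right) if (route(k, shardCount) == s) r.add(k);
            total += joinCount(l, r);
        }
        return total;
    }
}
```

    For example, joining keys [1, 2, 3, 4, 5, 6] with [2, 4, 4, 7] over three shards yields 3 matches both globally and shard by shard.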

  • Subqueries
    • Do not use a subquery or its JOIN condition in an OR expression.
    • Do not use scalar subqueries that contain LIMIT, for example, SELECT (SELECT x FROM t2 WHERE t2.id = t.id LIMIT 1), a, b FROM t.
    • If the subquery and the main table are routed to the same shard, add the hint /*+db=xxx*/ before the SQL statement to improve routing accuracy.
    • Do not use JOIN clauses in subqueries.
    • Do not use nested subqueries.
    • Do not compare a ROW expression with a subquery, for example, SELECT * FROM t WHERE (a,b,c) = (SELECT x,y,z FROM t2 WHERE …).
    • Do not use more than two subqueries in SELECT_LIST.

DDL

  • Execution of DDL statements

    Perform DDL operations on existing tables during off-peak hours.

  • Number of shards

    Before creating a sharded table, estimate the total data volume to determine the number of table shards. More shards are not necessarily better, so do not configure more shards than required.
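    A back-of-the-envelope estimate can be sketched as a ceiling division of the expected row count by a target row count per shard. The per-shard target is an assumption to be chosen for the workload, and the helper name is hypothetical. Include expected growth in expectedRows.

```java
/** Sketch only: targetRowsPerShard is a caller-chosen assumption, not a DDM setting. */
final class ShardEstimate {

    static long tableShards(long expectedRows, long targetRowsPerShard) {
        if (expectedRows < 0 || targetRowsPerShard < 1) {
            throw new IllegalArgumentException("invalid sizing input");
        }
        // Ceiling division without floating point.
        long shards = (expectedRows + targetRowsPerShard - 1) / targetRowsPerShard;
        return Math.max(1, shards);
    }
}
```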

  • High-risk DDL statements

    Carefully check the SQL statement before performing a high-risk DDL operation such as DROP TABLE or TRUNCATE TABLE.

  • Rectification of DDL failures

    If an error occurs while a DDL statement is being executed, run CHECK TABLE table_name to verify the structure of each table shard, locate the failed table shard, and rectify the fault. For example, if ALTER TABLE fails, add the hint /*+allow_alter_rerun=true*/ before the statement to enable idempotent execution, and execute the statement again until the output shows that the structures of all table shards are consistent.
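    The rerun procedure can be sketched as a bounded retry loop. executeDdl and structuresConsistent are hypothetical hooks: in practice the first would send the hinted statement over JDBC and the second would run CHECK TABLE and interpret its output, whose format is not described here.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Sketch only: executeDdl and structuresConsistent are hypothetical hooks. */
final class DdlRerunSketch {

    static final String RERUN_HINT = "/*+allow_alter_rerun=true*/";

    static String withRerunHint(String ddl) {
        return RERUN_HINT + ddl;
    }

    // Reruns the hinted DDL until the consistency check passes.
    // Returns the number of attempts used; throws if still inconsistent after maxAttempts.
    static int rerunUntilConsistent(String ddl, Consumer<String> executeDdl,
                                    BooleanSupplier structuresConsistent, int maxAttempts) {
        String hinted = withRerunHint(ddl);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            executeDdl.accept(hinted);
            if (structuresConsistent.getAsBoolean()) {
                return attempt;
            }
        }
        throw new IllegalStateException("table shards still inconsistent after "
                + maxAttempts + " attempts");
    }
}
```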

  • DDL execution error caused by MDL locks
    • Background: Before a DDL statement is executed, DDM checks whether any MDL locks are held on the involved tables in the associated RDS databases, to make sure the DDL statement can be executed. If such locks exist, the DDL statement reports the following error and exits:
      metadata lock exists, one of MDL is [%s],DDL operation can not proceed, please use 'show metadata lock' to check current mdl, and use 'kill physical threadId@host:port' to clean it
    • Possible issue: Slow SQL statements that have been running for several minutes may hold MDL locks. In this case, DDL statements cannot be executed.
    • Solution 1: On the DDM console, set ddl_precheck_mdl_threshold_time to a larger value, for example, 1800 seconds (30 minutes).

      ddl_precheck_mdl_threshold_time specifies the maximum duration, in seconds, for which an existing MDL lock can be held before the DDL precheck fails. DDL reports an error only when the lock duration exceeds this threshold. The default value is 120 seconds.

    • Solution 2: Run SHOW METADATA LOCK to check whether DDL execution is blocked by an MDL lock held by a slow transaction. If it is, run kill physical threadId@host:port to terminate the underlying slow transaction. Then rerun the DDL statement with the hint /*+allow_alter_rerun=true*/ and use CHECK TABLE to confirm that execution has completed.

      threadId is the thread ID on the underlying RDS instance node, and host and port are the IP address and port of that node.
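    The precheck rule and the kill command format can be modeled in a few lines of Java. This is a simplified model, not DDM's implementation; the helper names are hypothetical, and the thread ID, host, and port are assumed to come from the SHOW METADATA LOCK output for the blocking session.

```java
import java.util.List;

/** Simplified model of the MDL precheck (not DDM's code); helper names are hypothetical. */
final class MdlPrecheckSketch {

    static final long DEFAULT_THRESHOLD_SECONDS = 120;

    // lockHeldSeconds: how long each currently held MDL lock has been held.
    // The DDL is rejected only if some lock exceeds the threshold.
    static boolean ddlAllowed(List<Long> lockHeldSeconds, long thresholdSeconds) {
        for (long held : lockHeldSeconds) {
            if (held > thresholdSeconds) {
                return false;
            }
        }
        return true;
    }

    // Formats the command that terminates the blocking physical thread.
    static String killCommand(long threadId, String host, int port) {
        return "kill physical " + threadId + "@" + host + ":" + port;
    }
}
```

    With the default threshold of 120 seconds, a lock held for 150 seconds blocks the DDL; raising the threshold to 1800 seconds, as in Solution 1, lets it proceed.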

  • Long execution time of DDL statements

    If a DDL statement remains suspended for a long time even during off-peak hours, open another session and run XA RECOVER to check for slow transactions. If there are any, contact technical support.