Updated on 2024-10-14 GMT+08:00

Database Usage Suggestions

Database Naming

  • The names of database objects like databases, tables, and columns should be in lowercase. Different words in the name are separated with underscores (_).
  • Reserved words and keywords cannot be used to name database objects in RDS for MySQL.
  • In addition to those of MySQL 8.0 Community Edition, some other keywords and reserved words are added to RDS for MySQL. Do not use such words to name objects.
    Table 1 New keywords and reserved words of RDS for MySQL

    Reserved word

    Scenario

    RECYCLE_BIN

    Recycle bin

  • Each database object name must be explainable and contain a maximum of 32 characters.
  • Each temporary table in databases is prefixed with tmp and suffixed with a date.
  • Each backup table in databases is prefixed with bak and suffixed with a date.
  • All columns storing the same data in different databases or tables must have the same name and be of the same type.

Database Design

  • All tables use the InnoDB storage engine unless otherwise specified. InnoDB supports transactions and row locks. It delivers excellent performance, making it easy to recover data.
  • Databases and tables all use the UTF8 character set to avoid characters getting garbled by character set conversion.
  • All tables and fields require comments that can be added using the COMMENT clause to maintain the data dictionary from the beginning of the design.
  • The length of a single row in the table cannot exceed 1024 bytes.
  • To avoid cross-partition queries, RDS for MySQL partitioned tables are not recommended. Cross-partition queries will decrease the query efficiency. A partitioned table is logically a single table, but the data is actually stored in multiple different files.
  • Do not create too many columns in one table. Store cold and warm data separately to reduce the width of a table. In doing so, more rows of data can be stored in each memory page, decreasing disk I/O and making more efficient use of the cache.
  • Columns that are frequently used together should be in the same table to avoid JOIN operations.
  • Do not create reserved fields in a table. Otherwise, modifying the column type will lock the table, which has a greater impact than adding a field.
  • Do not store binary data such as images and files in databases.
  • Full-text indexes are not recommended because there are many limitations on full-text indexes for MySQL Community Edition.
  • Create no more than 20,000 tables in an instance.
  • Do not maintain your client connection to an instance for more than 8 hours.
  • To prevent out of memory (OOM) exceptions from occurring when your instance handles a large number of concurrent requests, set tmp_table_size, innodb_buffer_pool_size, max_connections, sort_buffer_size, read_buffer_size, read_rnd_buffer_size, join_buffer_size, thread_stack, and binlog_cache_size to the values not exceeding their default values.

Field Design

  • Ensure that each table contains no more than 50 fields.
  • Select a small data type for each column as much as possible. Numeric data is preferred, followed by dates or binary data, and the least preferred is characters. The larger the column data type, the more the space required for creating indexes. As a result, there are fewer indexes on a page and more I/O operations required, so database performance deteriorates.
  • If the integer type is used as the database field type, select the shortest column type. If the value is a non-negative number, it must be the unsigned type.
  • Each field should have the NOT NULL attribute. The default value for the numeric type such as INT is recommended to be 0, and that for the character type such as VARCHAR is recommended to be an empty string.
  • Do not use the ENUM type. Instead, use the TINYINT type.

    Change ENUM values using ALTER. The ORDER BY operations on ENUM values are inefficient and require extra operations.

    If you have specified that ENUM values cannot be numeric, other data types (such as char) can be used.

  • If the numeric data type is required, use DECIMAL instead of FLOAT or DOUBLE.

    FLOAT and DOUBLE data cannot be stored precisely, and value comparison results may be incorrect.

  • When you want to record a date or specific time, use the DATETIME or TIMESTAMP type instead of the string type.
  • Store IP addresses using the INT UNSIGNED type. You can convert IP addresses into numeric data using function inet_aton or inet_ntoa.
  • The VARCHAR data should be as short as possible. Although the VARCHAR data varies in length dynamically on disks, it occupies the maximum length in memory.
  • Use VARBINARY to store variable-length character strings that are case-sensitive. VARBINARY is case-sensitive by default and quick to process because no character sets are involved.

Index Design

  • Create a primary key for each InnoDB table. Neither use a frequently-updated column as the primary key nor a multi-column primary key. Do not use the UUID, MD5, or character string column as the primary key. Use a column whose values can increment continuously as the primary key. So, the auto-increment ID column is recommended.
  • Use no more than 5 indexes in a single table. Indexes speed up queries, but too many indexes may slow down writes. Inappropriate indexes sometimes reduce query efficiency.
  • Do not create an independent index for each column in a table. A well-designed composite index is much more efficient than a separate index on each column.
  • Create an index on the following columns:
    • Columns specified in the WHERE clause of SELECT, UPDATE, or DELETE statements
    • Columns specified in ORDER BY, GROUP BY, or DISTINCT
    • Columns associated for joining multiple tables.
  • The index column order is as follows:
    • Put the column with the highest selectivity on the far left when creating a composite index. Selectivity = Different values in a column/Total rows in the column
    • Put the column with the smallest field length on the far left of the composite index. The smaller length a field has, the more data one page stores, and the better the I/O performance is.
    • Put the most frequently used column on the left of the composite index, so you can create fewer indexes.
  • Avoid using redundant indexes, such as primary key (id), index (id), and unique index (id).
  • Avoid using duplicate indexes, such as index(a,b,c), index(a,b), and index(a). Duplicate and redundant indexes may slow down queries because the RDS for MySQL query optimizer does not know which index it should use.
  • When creating an index on the VARCHAR field, specify the index length based on selectivity. Do not index the entire field.

    If an index with the length of 20 bytes is the string type, its selectivity can reach 90% or above. In this case, use count(distinct left(column name, index length))/count(*) to check index selectivity.

  • Use covering indexes for frequent queries.

    A covering index is a special type of index where all required fields for a query are included in the index. The index itself contains columns specified in WHERE and GROUP BY clauses, but also column combinations queried in SELECT, without having to execute additional queries.

  • Constraints on foreign keys are as follows:

    The character sets of the columns for which a foreign key relationship is established must be the same, or the character sets of the parent and child tables for which a foreign key relationship is established must be the same.

SQL Statement Development

  • Use prepared statements to perform database operations in programs. Prepared statements can be executed multiple times in a program once they are written, more efficient than SQL statements.
  • Avoid implicit conversions because they may cause index to become invalid.

    Do not perform function conversions or math calculations on columns in the WHERE clause. Otherwise, the index becomes invalid.

  • Do not use double percent signs (%%) or place % before a query condition, or the index cannot be used.
  • Do not use select * for queries because using select *:
    • Consumes more CPUs, IP addresses, and bandwidth.
    • Causes covering indexes to become unavailable.
    • Increases the impact of table structure changes on code.
  • Do not use subqueries. Subqueries generate temporary tables that do not have any indexes. If there is a lot of data, the query efficiency is severely affected. Convert subqueries into associated queries.
  • Minimize the use of JOIN operations for more than 5 tables. Use the same data type for the fields that require JOIN operations.

    Each JOIN operation on a table occupies extra memory (controlled by join_buffer_size) and requires temporary table operations, affecting query efficiency. Do not use NATURAL JOIN.

  • Reduce interactions with the same database as much as possible. The database is more suitable for processing batch operations.
  • Replace OR clauses with IN clauses because IN clauses can effectively use indexes. Specify no more than 500 values for an IN clause.
  • Do not perform reverse queries, for example, NOT IN and NOT LIKE.
  • Do not use ORDER BY RAND() for random sorting.

    This operation loads all data that meets the conditions from the table to the memory for sorting, consuming more CPUs, I/O, and memory resources.

    Obtain a random value from the program and retrieve data from the involved database based on the value.

  • If deduplication is not required, use UNION ALL instead of UNION.

    UNION ALL does not sort out result sets.

  • Combine multiple operations and perform them in batches. The database is good for batch processing.

    This reduces interactions with the same database.

  • If there are more than 1 million rows of write operations, perform them in multiple batches.

    A large number of batch writes may result in excessive primary/standby latency.

  • If ORDER BY is used, use the order of indexes.
    • The last field of ORDER BY is a part of a composite index and is placed at the end of the composite index order.
    • Avoid file_sort to speed up queries.

    Correct example: in where a=? and b=? order by c;, index: a_b_c

    Wrong example: If an index supports range search, the index order cannot be used. For example, WHERE a>10 ORDER BY b;, index: a_b (sorting is not allowed)

  • Use ANSI-standard SQL statements instead of MySQL extended SQL statements for DML operations. Common MySQL extended SQL statements include:
    • REPLACE INTO
    • INSERT ... ON DUPLICATE KEY UPDATE
  • Stored procedures are not recommended because they are difficult to debug, extend, and transplant.
  • To avoid logical dependency on the database, do not use triggers, event schedulers, or views for service logic.
  • Large transactions are not recommended. If possible, a transaction should contain no more than five SQL statements because large transactions have problems such as long data lock time, too many caches, and connection consumption.
  • TRUNCATE TABLE is faster than DELETE and uses fewer system and log resources. If the table to be deleted does not have a trigger and the entire table needs to be deleted, TRUNCATE TABLE is recommended.
  • Do not run the flush logs command frequently to prevent automatic binlog deletion failures.
  • Keep the time that a transaction can run no more than 180 seconds. There should be no more than 10 concurrent transactions whose duration is longer than 30 seconds.
  • Do not modify more than 1 million rows in a single transaction.
  • Do not run large SQL statements that are generated by the system. For example, if you run an SELECT statement of 9 MB, the memory consumption increases by about 37 MB during the execution, which is about 4 times the size of the statement.

Session-level Parameters That Are Recommended Not to Be Modified

  • foreign_key_checks

    Description: The bool type. The default global value is ON. If this parameter is set to ON when a foreign key is created, the system checks the standardization of the foreign key, for example, the foreign key cannot reference other keys in the same table.

    Reasons for Not Changing Its Value:

    • If foreign_key_checks is set to ON, the foreign keys that cannot be checked using this parameter are not used in a standard manner. You are advised to optimize the SQL statements.
    • If this parameter is set to OFF at the thread level, and some non-standard foreign keys are created on the primary instance, the DDL statements used for creating these foreign keys will fail to be executed during replication to the standby instance because the value of foreign_key_checks is still ON on the standby instance, thus causing replication exceptions. For details, see Instance Reboot Failure or ERROR 1146: Table 'xxx' doesn't exist Reported During Table Operations.
    • You are advised to set foreign_key_checks to ON also for single-node instances.
  • innodb_strict_mode

    Description: The bool type. The default global value is ON. If this parameter is set to ON, an error instead of a warning is reported for some non-standard InnoDB table operations. For example, an error will be reported when the InnoDB page size is 16 KB but the size of a single row exceeds 8 KB.

    Reasons for Not Changing Its Value:

    • If this parameter is set to OFF at the thread level on the primary instance, the DDL statement ALTER TABLE can be executed successfully on the primary instance when there are many columns and the length of a single row exceeds 8 KB. However, because this parameter is still set to ON on the standby instance, an error will be reported when this statement is executed on the standby instance, causing a replication exception.
    • If you do need to change the value of this parameter, change it globally, i.e., both on the primary and standby instances.
  • default_storage_engine

    Description: The enumerated type. The value can only be an available storage engine name. The default global value is InnoDB. The default storage engine will be used if no storage engine is explicitly specified in the DDL statement CREATE TABLE or ALTER TABLE.

    Reasons for Not Changing Its Value:

    • RDS for MySQL primary/standby replication uses binlog-based replication of the open source community. If no storage engine is explicitly specified in the DDL statement CREATE TABLE or ALTER TABLE, the default storage engine will not be recorded in binlogs (consistent with the community). But when this statement is executed on the standby instance during primary/standby replication, the default value (InnoDB) of default_storage_engine of the replication thread is used.
    • If this parameter is changed to a storage engine other than InnoDB at the session level on the primary instance, running CREATE TABLE or ALTER TABLE will cause inconsistent storage engines used by tables created on the primary instance and standby instance.
  • unique_checks

    Description: The bool type. The default global value is ON, which indicates that a uniqueness check will be performed on the unique keys of secondary indexes in InnoDB tables.

    Reasons for Not Changing Its Value:

    If this parameter is set to OFF at the session level on the primary instance, the uniqueness of the unique keys of secondary indexes will not be checked, and a DML statement with duplicate unique keys of secondary indexes can be executed successfully. However, on the standby instance, this DML statement will fail to be executed due to the uniqueness check, causing a replication exception.

  • sql_log_bin

    Description: This parameter controls whether SQL statements of the current session are recorded in binlogs. The default value is ON.

    Reasons for Not Changing Its Value:

    RDS for MySQL primary/standby replication uses binlog-based replication of the open-source community. If this parameter is set to OFF for a single session on the primary instance, modifications to tables are not recorded in binlogs. The modifications cannot be synchronized to the standby instance, causing data inconsistency between the primary and standby instances.

  • old_alter_table

    Description: The bool type. The default value is OFF. If the value is set to ON, the algorithm implemented by copying temporary tables used in the ALTER TABLE statement affects database performance.

    Reasons for Not Changing Its Value:

    • The ALTER IGNORE option may fail to be replayed on the standby instance.
    • The performance of ALTER TABLE based on COPY is poor.