Updated on 2024-02-22 GMT+08:00

Database Usage Suggestions

Database Naming

  • The names of database objects like databases, tables, and columns should be in lowercase. Different words in the name are separated with underscores (_).
  • Reserved words and keywords cannot be used to name database objects in RDS for MySQL.
  • Each database object name must be explainable and contain a maximum of 32 characters.
  • Each temporary table in databases is prefixed with tmp and suffixed with a date.
  • Each backup table in databases is prefixed with bak and suffixed with a date.
  • All columns storing the same data in different databases or tables must have the same name and be of the same type.

Database Design

  • All tables use the InnoDB storage engine unless otherwise specified. InnoDB supports transactions and row locks. It delivers excellent performance, making it easy to recover data.
  • Databases and tables all use the UTF8 character set to avoid characters getting garbled by character set conversion.
  • All tables and fields require comments that can be added using the COMMENT clause to maintain the data dictionary from the beginning of the design.
  • The length of a single row in the table cannot exceed 1024 bytes.
  • To avoid cross-partition queries, RDS for MySQL partitioned tables are not recommended. Cross-partition queries will decrease the query efficiency. A partitioned table is logically a single table, but the data is actually stored in multiple different files.
  • Do not create too many columns in one table. Store cold and warm data separately to reduce the width of a table. In doing so, more rows of data can be stored in each memory page, decreasing disk I/O and making more efficient use of the cache.
  • Columns that are frequently used together should be in the same table to avoid JOIN operations.
  • Do not create reserved fields in a table. Otherwise, modifying the column type will lock the table, which has a greater impact than adding a field.
  • Do not store binary data such as images and files in databases.
  • Full-text indexes are not recommended because there are many limitations on full-text indexes for MySQL Community Edition.

Field Design

  • Ensure that each table contains no more than 50 fields.
  • Select a small data type for each column as much as possible. Numeric data is preferred, followed by dates or binary data, and the least preferred is characters. The larger the column data type, the more the space required for creating indexes. As a result, there are fewer indexes on a page and more I/O operations required, so database performance deteriorates.
  • If the integer type is used as the database field type, select the shortest column type. If the value is a non-negative number, it must be the unsigned type.
  • Each field should have the NOT NULL attribute. The default value for the numeric type such as INT is recommended to be 0, and that for the character type such as VARCHAR is recommended to be an empty string.
  • Do not use the ENUM type. Instead, use the TINYINT type.

    Change ENUM values using ALTER. The ORDER BY operations on ENUM values are inefficient and require extra operations.

    If you have specified that ENUM values cannot be numeric, other data types (such as char) can be used.

  • If the numeric data type is required, use DECIMAL instead of FLOAT or DOUBLE.

    FLOAT and DOUBLE data cannot be stored precisely, and value comparison results may be incorrect.

  • When you want to record a date or specific time, use the DATETIME or TIMESTAMP type instead of the string type.
  • Store IP addresses using the INT UNSIGNED type. You can convert IP addresses into numeric data using function inet_aton or inet_ntoa.
  • The VARCHAR data should be as short as possible. Although the VARCHAR data varies in length dynamically on disks, it occupies the maximum length in memory.
  • Use VARBINARY to store variable-length character strings that are case-sensitive. VARBINARY is case-sensitive by default and quick to process because no character sets are involved.

Index Design

  • Create a primary key for each InnoDB table. Neither use a frequently-updated column as the primary key nor a multi-column primary key. Do not use the UUID, MD5, or character string column as the primary key. Use a column whose values can increment continuously as the primary key. So, the auto-increment ID column is recommended.
  • Use no more than 5 indexes in a single table. Indexes speed up queries, but too many indexes may slow down writes. Inappropriate indexes sometimes reduce query efficiency.
  • Do not create an independent index for each column in a table. A well-designed composite index is much more efficient than a separate index on each column.
  • Create an index on the following columns:
    • Columns specified in the WHERE clause of SELECT, UPDATE, or DELETE statements
    • Columns specified in ORDER BY, GROUP BY, or DISTINCT
    • Columns associated for joining multiple tables.
  • The index column order is as follows:
    • Put the column with the highest selectivity on the far left when creating a composite index. Selectivity = Different values in a column/Total rows in the column
    • Put the column with the smallest field length on the far left of the composite index. The smaller length a field has, the more data one page stores, and the better the I/O performance is.
    • Put the most frequently used column on the left of the composite index, so you can create fewer indexes.
  • Avoid using redundant indexes, such as primary key (id), index (id), and unique index (id).
  • Avoid using duplicate indexes, such as index(a,b,c), index(a,b), and index(a). Duplicate and redundant indexes may slow down queries because the RDS for MySQL query optimizer does not know which index it should use.
  • When creating an index on the VARCHAR field, specify the index length based on selectivity. Do not index the entire field.

    If an index with the length of 20 bytes is the string type, its selectivity can reach 90% or above. In this case, use count(distinct left(column name, index length))/count(*) to check index selectivity.

  • Use covering indexes for frequent queries.

    A covering index is a special type of index where all required fields for a query are included in the index. The index itself contains columns specified in WHERE and GROUP BY clauses, but also column combinations queried in SELECT, without having to execute additional queries.

  • Constraints on foreign keys are as follows:

    The character sets of the columns for which a foreign key relationship is established must be the same, or the character sets of the parent and child tables for which a foreign key relationship is established must be the same.

SQL Statement Development

  • Use prepared statements to perform database operations in programs. Prepared statements can be executed multiple times in a program once they are written, more efficient than SQL statements.
  • Avoid implicit conversions because they may cause index to become invalid.

    Do not perform function conversions or math calculations on columns in the WHERE clause. Otherwise, the index becomes invalid.

  • Do not use double percent signs (%%) or place % before a query condition, or the index cannot be used.
  • Do not use select * for queries because using select *:
    • Consumes more CPUs, IP addresses, and bandwidth.
    • Causes covering indexes to become unavailable.
    • Increases the impact of table structure changes on code.
  • Do not use subqueries. Subqueries generate temporary tables that do not have any indexes. If there is a lot of data, the query efficiency is severely affected. Convert subqueries into associated queries.
  • Minimize the use of JOIN operations for more than 5 tables. Use the same data type for the fields that require JOIN operations.

    Each JOIN operation on a table occupies extra memory (controlled by join_buffer_size) and requires temporary table operations, affecting query efficiency. Do not use NATURAL JOIN.

  • Reduce interactions with the same database as much as possible. The database is more suitable for processing batch operations.
  • Replace OR operations with IN operations. IN operations can effectively use indexes. The number of IN values cannot exceed 500.
  • Do not perform reverse queries, for example, NOT IN and NOT LIKE.
  • Do not use ORDER BY RAND() for random sorting.

    This operation loads all data that meets the conditions from the table to the memory for sorting, consuming more CPUs, I/O, and memory resources.

    Obtain a random value from the program and retrieve data from the involved database based on the value.

  • If deduplication is not required, use UNION ALL instead of UNION.

    UNION ALL does not sort out result sets.

  • Combine multiple operations and perform them in batches. The database is good for batch processing.

    This reduces interactions with the same database.

  • If there are more than 1 million rows of write operations, perform them in multiple batches.

    A large number of batch writes may result in excessive primary/standby latency.

  • If ORDER BY is used, use the order of indexes.
    • The last field of ORDER BY is a part of a composite index and is placed at the end of the composite index order.
    • Avoid file_sort to speed up queries.

    Correct example: in where a=? and b=? order by c;, index: a_b_c

    Wrong example: If an index supports range search, the index order cannot be used. For example, WHERE a>10 ORDER BY b;, index: a_b (sorting is not allowed)

  • Use ANSI-standard SQL statements instead of MySQL extended SQL statements for DML operations. Common MySQL extended SQL statements include:
    • REPLACE INTO
    • INSERT ... ON DUPLICATE KEY UPDATE
  • Stored procedures are not recommended because they are difficult to debug, extend, and transplant.
  • To avoid logical dependency on the database, do not use triggers, event schedulers, or views for service logic.
  • Large transactions are not recommended. If possible, a transaction should contain no more than five SQL statements because large transactions have problems such as long data lock time, too many caches, and connection consumption.
  • TRUNCATE TABLE is faster than DELETE and uses fewer system and log resources. If the table to be deleted does not have a trigger and the entire table needs to be deleted, TRUNCATE TABLE is recommended.
  • Do not run the flush logs command frequently to prevent automatic binlog deletion failures.