Updated on 2024-09-06 GMT+08:00

Index Design

  • Use the same field type to prevent implicit conversion from causing invalid indexes.
  • Create unique indexes on all minimum sets of fields or combinations of fields with uniqueness.
    For example, there is a table containing the fields a, b, c, d, e, and f. If the combinations of fields ab and ef have uniqueness, you are advised to create unique indexes for ab and ef, respectively.

    Even if complete verification control is implemented at the application layer, dirty data is generated as long as there is no unique index according to Murphy's Law.

    Before creating a unique index, consider whether it is helpful for queries. Useless indexes can be deleted.

    Evaluate the impact of extra indexes on INSERT operations. Determine whether to create unique indexes based on the requirements for the correctness and performance of data with uniqueness.

  • Create indexes on fixed-length fields (for example, INT). When creating an index on a VARCHAR field, the index length must be specified. It is not necessary to create an index on the whole field. The index length is determined according to the actual text distinction.

    The index length and distinction are a pair of contradictions. Generally, for string type data, the distinction of an index with a length of 20 bytes will be higher than 90%. The distinction formula is COUNT(DISTINCT LEFT(Column_name, Index_length))/COUNT(*). Place the column names with a high distinction on the left.

  • If possible, do not use left fuzzy search (for example, SELECT * FROM users WHERE u_name LIKE ' %hk') or full fuzzy search on the page to avoid degradation from index scan to full table scan. Solve the problem at the application layer.

    An index file has the leftmost prefix matching feature of B-tree. If the value on the left is not determined, the index cannot be used.

  • Use a covering index to query data and avoid returning to the table. However, do not add too many fields to the covering index, or the write performance will be compromised.

    Types of indexes that can be created include primary key indexes, unique indexes, and normal indexes. A covering index indicates that if you execute EXPLAIN statements, "using index" will be displayed in the Extra column.

  • Optimize the SQL performance as follows: range (minimum requirement), ref (basic requirement), and consts (maximum requirement).
  • When creating a composite index, place the column with the highest distinction on the left.
  • Ensure that the number of indexes in a single table is at most 5, or does not exceed 20% of the number of table fields.
  • Avoid the following misunderstandings when creating indexes:
    • Indexes should be frequently used. An index needs to be created for a query.
    • Indexes should be as few as possible. Indexes consume space and slow down updates and insertions.
    • Unique indexes cannot be used. Unique features must be resolved at the application layer using the "query first and then insert" method.
  • Reduce the use of ORDER BY that cannot be used with indexes based on the actual service requirements. The statements such as ORDER BY, GROUP BY, and DISTINCT consume many CPU resources.
  • If a complex SQL statement is involved, use the existing index design and add EXPLAIN before the SQL statement. EXPLAIN can help you optimize the index by adding some query restrictions.
  • Execute new SELECT, UPDATE, or DELETE statements with EXPLAIN to check the index usage and ensure no Using filesort and Using temporary are displayed in the Extra column. If the number of scanned rows exceeds 1,000, exercise caution when executing these statements. Analyze slow query logs and delete unused slow query statements every day.
    EXPLAIN:
    • type: ALL, index, range, ref, eq_ref, const, system, NULL (The performance is sorted from poor to good from left to right.)
    • possible_keys: indicates the indexes from which MySQL can choose to find rows in this table. If there is an index on a field, the index is listed but may not be used by the query.
    • key: indicates the key (index) that MySQL actually decided to use. If key is NULL, MySQL found no index to use for executing the query more efficiently. To force MySQL to use or ignore an index listed in the possible_keys column, use FORCE INDEX, USE INDEX, or IGNORE INDEX in your query.
    • ref: shows which columns or constants are compared to the index named in the key column to select rows from the table.
    • rows: indicates the estimated number of rows to be read for required records based on table statistics and index selection.
    • Extra:
      • Using temporary: To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.
      • Using filesort: MySQL must do an extra pass to find out how to retrieve rows in sorted order.
      • Using index: The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. If Using where is displayed at the same time, it indicates that desired information needs to be obtained by using the index tree and reading rows of the table.
      • Using where: In WHERE clause, Using where is displayed when the desire data is obtained without reading all the data in the table or the desire data cannot be obtained by only using indexes. Unless you specifically intend to fetch or examine all rows from the table, you may have something wrong in your query if the Extra value is not Using where and the table join type is ALL or index.
  • If a function is used on a WHERE statement, the index becomes invalid.

    For example, in WHERE left(name, 5) = 'zhang', the left function invalidates the index on name.

    You can modify the condition on the service side and delete the function. When the returned result set is small, the service side filters the rows that meet the condition.

  • For ultra-large tables, you also need to comply with the following rules when using indexes:
    • Create indexes for columns involved in the WHERE and ORDER BY statements. You can use EXPLAIN to check whether indexes or full table scans are used.
    • Fields with sparse value distribution, such as gender with only two or three values, cannot be indexed.
    • Do not use string fields as primary keys.
    • Do not use foreign keys. The program enforces the constraint.
    • When using multi-column indexes, arrange them in the same order as the query conditions and remove unnecessary single-column indexes (if any).
    • Before removing an index, conduct a thorough analysis and back up the data.