Updated on 2024-01-12 GMT+08:00

SELECT

  • Do not use the wildcard character (*) in the SELECT statement.

    If the table structure is changed due to service or database upgrade when a wildcard character is used to query a table, the table structure may be incompatible with service statements.

    Therefore, the service must specify the name of the table column to be queried. Do not use the wildcard character.

  • SELECT LIMIT statements must contain ORDER BY for query.

    GaussDB is a distributed database. Table data is distributed on multiple DNs.

    If the SQL statement contains only LIMIT but not ORDER BY, the database sends the result (meeting the query requirements) sent by the DN with fast network transmission to the client as the final result.

    The network transmission efficiency may vary at different time points. As a result, the returned results are inconsistent when the SQL statement is executed for multiple times.

  • Do not perform operations that may cause sorting, such as ORDER BY, DISTINCT, GROUP BY, and UNION, on large columns (such as VARCHAR(2000)).

    These operations consume a large number of CPU and memory resources, resulting in low execution efficiency.

  • Do not use the LOCK TABLE statement to lock a table. Instead, use the SELECT .. FOR UPDATE statement.

    LOCK TABLE provides multiple levels of locks. However, if you do not fully understand the database principles and services, misuse of table locks may trigger deadlocks, causing the instance to be unavailable.

  • Do not use subqueries in the SELECT target columns. Otherwise, the plan may fail to be pushed down to DNs for execution, affecting the execution performance.
  • Use UNION ALL instead of UNION, and consider data deduplication if necessary.

    UNION ALL does not deduplicate data and does not require sorting operations. Therefore, UNION ALL is faster than UNION.

    If deduplication is not required, UNION ALL is preferred.

  • Do not use count(col) instead of count(*) to count the total number of records in a table. count(*) counts the NULL value (actual rows) while count(col) does not.
  • While executing count(col), the number of NULL record rows is counted as 0. While executing sum(col), NULL is returned if all records are NULL. If not all the records are NULL, the number of NULL record rows is counted as 0.
  • To count multiple columns using count(), column names must be enclosed in parentheses. For example, count ((col1, col2, col3)). When multiple columns are used to count the number of NULL record rows, a row is counted even if all the selected columns are NULL. The result is the same as that when count(*) is executed.
  • Null records are not counted when count(distinct col) is used to calculate the number of non-NULL columns that are not repeated.
  • If all statistical columns are NULL when count(distinct (col1,col2,...)) is used to count the number of unique values in multiple columns, Null records are also counted, and the records are considered the same.
  • Use the connection operator || to replace the concat function for string connection because the execution plan generated by the concat function cannot be pushed down to disks. As a result, the query performance severely deteriorates.
  • If an expression in(val1, val2, val3...) contains a large number of columns, you are advised to replace it with in (values(val1), (val2), (val3).... The optimizer will automatically convert the IN constraint into a non-correlated subquery to improve the query performance.
  • Do not frequently use count() to obtain the number of rows in a large table. This operation consumes a large number of resources and affects the execution efficiency of parallel jobs.

    If you do not need the real-time row statistics, run the following statement to obtain the number of rows in the table:

    SELECT reltuples FROM pg_class WHERE relname = 'tablename';

    The number of rows recorded in pg_class is updated only after ANALYZE is executed on the table.

    Currently, ANALYZE is triggered in either of the following conditions:

    • The service sends an ANALYZE statement. For example:

      -- Analyze all tables in the connection library.

      ANALYZE;

      -- Analyze the specified table.

      ANALYZE tablename;

    • This event is triggered when the number of rows added or deleted at a specified interval or in a table reaches a specified value by using AUTO VACUUM. The interval and addition/deletion ratio can be configured through the GUC parameters.