Updated on 2024-11-05 GMT+08:00

GaussDB(DWS) Overall Development and Design Specifications

To improve data development efficiency, reduce resource consumption, and minimize business disruptions, developers should follow the development and design specifications listed in Table 1 before using GaussDB(DWS) for data development.

These design specifications are divided into three levels of importance: Caution, Warning, and Danger. Each level has the following meanings:

  • Caution: Recommended design standards that users should follow. Failure to comply may affect business performance and increase O&M difficulties.
  • Warning: Compliance with these rules ensures efficient business operations. Violation of these rules may result in business disruptions and errors.
  • Danger: Violation of these rules may result in data deletion, significant system changes, or even system failures.
Table 1 Overall development design specifications

Scenario

No.

Specifications

Level

Impact of Non-Compliance

User and connection management

1

Avoid having all business operations run under a single database user. Instead, plan different database users according to the business modules.

Caution

When abnormal business or user operations cause problems in the entire cluster, it is difficult to quickly isolate and control them.

2

For better access control of different business modules, it is better to use multiple users and permissions instead of depending on the system administrator user to run business operations. For details, see GaussDB(DWS) User and Permissions Management.

Caution

Administrators with excessive user privileges are difficult to manage.

3

You are not advised to connect services to a single CN. Instead, configure load balancing to ensure that connections to each CN are balanced.

Warning

Connections to CNs are unbalanced, load skew slows down the overall performance, and services cannot be automatically recovered when CNs are faulty.

4

After connecting to the database and completing required operations, close the database connection in a timely manner to prevent idle connections from continuously occupying resources and consuming connections and public resources.

Warning

Having too many idle connections consumes a lot of public resources, making it impossible for normal business operations to establish connections and run.

5

In the scenario where the database connection pool is used, after the database GUC parameters are set using the SET statement in the service, the parameters must be restored using the RESET statement before the connection pool is returned.

Warning

Jobs that use connection pools interfere with each other.

Object design

6

You are not advised to specify a user-defined tablespace when creating an ordinary table.

Warning

Custom table spaces occupy storage space that is not controlled, which can lead to space and performance issues.

7

Do not specify the COMPRESS compression attribute when creating a row-store table.

Warning

The row-store compression feature is unstable and has been deprecated.

8

You are not advised to frequently perform small batch real-time data insertion for ordinary column-store tables.

Warning

Real-time data insertion for column-store causes severe small CU expansion, triggering continuous space, resource, and performance issues.

9

When creating a hash-distributed table object, ensure that data is evenly distributed. (For a table with more than 10 GB data, the skew rate must be less than 10%.)

Warning

Data distribution skewness leads to calculation skewness, resulting in space and performance issues.

10

When creating a table object for REPLICATION distribution, ensure that the number of rows in the table is less than 1 million.

Caution

Replicating large amounts of table data can increase storage space risks.

11

When creating an HStore table, ensure that the database GUC parameter settings meet the following requirements:

Warning

The delta table of HStore expands rapidly, causing sustained performance degradation.

12

For a large table (with more than 50 million rows of data) that contains the time field, the table must be designed as a partition table and the partition interval must be properly designed based on the query characteristics.

Warning

Querying and cleaning specific time periods for partitioned tables is much more efficient than for non-partitioned tables.

13

For a table where a large amount of data needs to be added, deleted, or modified, it is recommended that the number of indexes be less than or equal to three. The maximum number of indexes is five.

Suggestion

Excessive indexing can slow down data management tasks like adding, deleting, and modifying data. It can even cause xlog buildup and cluster malfunctions in extreme cases.

Operation regulations

14

In scenarios where the client explicitly starts a transaction or manually disables "autocommit", you must manually execute a commit command to commit the transaction.

Warning

Long transactions caused by uncommitted transactions block other locked businesses and recovery operations.

15

You are advised to optimize the statements that take longer than 30 minutes to execute.

Caution

Issues with business efficiency and performance.

16

Avoid business operations that take longer than 2 hours to execute to prevent issues such as long transactions and long locks.

Warning

Long transactions block other locked businesses and recovery operations, affecting business performance.

17

Be cautious when using DROP OBJECT (e.g., DATABASE, USER/ROLE, SCHEMA, TABLE, VIEW) as it may cause data loss, especially with CASCADE deletions. Always back up data before proceeding.

Danger

The data is deleted abnormally and cannot be recovered quickly.

18

Avoid executing ALTER (add, delete, modify columns, DROP PARTITION, etc.) and TRUNCATE operations during business peak hours to prevent long SQL blocking ALTER, TRUNCATE operations, or SQL business being blocked by ALTER or TRUNCATE.

Warning

ALTER and TRUNCATE have high lock levels, which can block other concurrent businesses and be blocked by other businesses, causing business performance issues.

19

Do not perform the CREATE INDEX and REINDEX operations on large tables during peak hours.

Warning

CREATE INDEX and REINDEX operations can block data import and take a long time to execute on large tables, causing prolonged data import.

20

Avoid SQL statements that cannot be pushed down (indicated by REMOTE_XXX_QUERY in the EXPLAIN output).

Caution

CN bottleneck, causing business performance issues.

21

Refrain from frequent COUNT queries on large row-store tables (over 10 million rows).

Caution

High I/O usage, causing business performance issues.

22

Limit the number of UNION/UNION ALL branches in a single SQL statement to 50, and associate no more than 25 non-replicated tables.

Caution

High resource consumption, causing business performance issues.

23

Prevent Cartesian products by ensuring proper association conditions during large table joins.

Caution

High resource consumption, causing business performance issues.

24

In multi-table association queries, keep the number of streams in a single statement below 100.

Caution

High CPU usage, causing business performance issues.

25

Use the WITH RECURSIVE statement cautiously. Define data repetition limits and termination conditions to ensure recursion ends as expected.

Caution

Infinite loops, causing business performance issues.

26

Avoid using UPDATE or DELETE to modify or delete a large volume of data and use TRUNCATE PARTITION or DROP PARTITION instead.

Caution

Dirty pages, causing business performance issues.

27

Do not concurrently update the same column-store table using the UPDATE and UPSERT statements.

Caution

Lock issues, causing business performance issues.

28

Avoid using stored procedures, especially those with complex structures and multiple layers of nesting.

Caution

Low maintenance efficiency and high fault locating costs.

O&M management

29

If you need to adjust database parameters at the cluster level, contact Huawei for a risk assessment before making any changes.

Danger

Adjusting parameters at the cluster level can impact customer services, so it is important to conduct a risk assessment before making any changes.

30

If more than 10% of a table's records are added or modified at once, explicitly execute the ANALYZE operation.

Warning

Business performance stability issues.

31

Execute VACUUM FULL on tables with a high dirty page rate and small CU ratio exceeding 25% at regular intervals. Perform VACUUM FULL on common tables during low-traffic hours and on system tables when they are offline.

Warning

Object expansion, causing business performance issues.

32

Alarms of "major" level or above in the database must be promptly handled and cleared.

Warning

Long-term cluster stability.

33

Before batch business rollout, it is necessary to conduct large-scale testing and validation on the test cluster and notify technical support personnel to ensure business security.

Caution

Business launch evaluation and assurance.