Updated on 2024-11-05 GMT+08:00

Tuning Database Parameters

To ensure high performance of the database, you are advised to configure GUC parameters based on available resources and the actual workloads. This section describes some of the common parameters and the recommended configurations for them. For more details, see Configuring GUC Parameters.

Parameters Related to Database Memory

Table 1 Parameters related to database memory

GUC Parameter

Description

Configuration Suggestion

max_process_memory

Specifies the maximum physical memory available to a single CN/DN.

  • On DNs, the value of this parameter is determined based on the server's physical memory and the number of DNs deployed on a single node. Parameter value = (Physical memory – vm.min_free_kbytes) x 0.8/(n + Number of primary DNs). This parameter aims to ensure system reliability, preventing node OOM caused by increasing memory usage. vm.min_free_kbytes indicates OS memory reserved for kernels to receive and send data. Its value is at least 5% of the total memory. That is, max_process_memory = Physical memory x 0.8/ (n + Number of primary DNs). If the cluster scale (number of nodes in the cluster) is smaller than 256, n=1; if the cluster scale is larger than 256 and smaller than 512, n=2; if the cluster scale is larger than 512, n=3.
  • Set this parameter on CNs to the same value as that on DNs.
  • RAM is the maximum memory allocated to the cluster.

shared_buffers

Specifies the size of the shared memory used by GaussDB(DWS). If the value of this parameter is increased, GaussDB(DWS) requires more System V shared memory than the default system setting.

It is recommended that shared_buffers be set to a value less than 40% of the memory. Set it to a large value for row-store tables and a small value for column-store tables. Set this parameter to a large value for row storage and a small value for column storage. For column-store tables: shared_buffers = (Memory of a single server/Number of DNs on the single server) x 0.4 x 0.25

If you want to increase the value of shared_buffers, you also need to increase the value of checkpoint_segments, because a longer period of time is required to write a large amount of new or changed data.

cstore_buffers

Specifies the size of the shared buffer used by column-store tables and column-store tables (ORC, Parquet, and CarbonData) of OBS and HDFS foreign tables.

Column-store tables use the shared buffer specified by cstore_buffers instead of that specified by shared_buffers. When column-store tables are mainly used, reduce the value of shared_buffers and increase that of cstore_buffers.

Use cstore_buffers to specify the cache of ORC, Parquet, or CarbonData metadata and data for OBS or HDFS foreign tables. The metadata cache size should be 1/4 of cstore_buffers and not exceed 2 GB. The remaining cache is shared by column-store data and foreign table column-store data.

work_mem

Specifies the size of the memory used by internal sequential operations and the Hash table before data is written into temporary disk files.

The default value is 512 MB for small-scale memory (max_process_memory is less than 30 GB) and 2 GB for large-scale memory (max_process_memory is greater than or equal to 30 GB).

When the specified physical memory is insufficient, work_mem determines whether to write additional operator calculation data into temporary tables based on query characteristics and concurrency. This reduces performance by five to ten times and increases query response times from seconds to minutes.

  • In complex serial query scenarios, each query requires five to ten associated operations. Set work_mem using the following formula: work_mem = 50% of the memory/10.
  • In simple serial query scenarios, each query requires two to five associated operations. Set work_mem using the following formula: work_mem = 50% of the memory/5.
  • For concurrent queries, use the formula: work_mem = work_mem in serialized scenario/Number of concurrent SQL statements.

maintenance_work_mem

Specifies the maximum size of memory used for maintenance operations, involving VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY.

If you set this parameter to the value of work_mem, database dump files can be cleaned up and restored more efficiently. In a database session, only one maintenance operation can be performed at a time. Maintenance is usually performed when there are not much sessions.

When the automatic cleanup process is running, up to autovacuum_max_workers times of the memory will be allocated. In this case, set maintenance_work_mem to a value greater than or equal to that of work_mem.

Parameters Related to Queue Concurrency in Databases

GUC Parameter

Description

Configuration Suggestion

max_active_statements (global concurrent queue)

Controls the maximum number of concurrent jobs on a single CN.

All common users' jobs are subject to this threshold, regardless of their complexity. When the number of concurrent jobs reaches the specified threshold, the excess jobs have to wait in a queue. Administrator's jobs are exempt from this limit.

Set the value of this parameter based on system resources, such as CPU, I/O, and memory resources, to ensure that the system resources can be fully utilized and the system will not be crashed due to excessive concurrent jobs.

parctl_min_cost (local concurrent queue)

Controls the maximum number of concurrent jobs within the same resource pool on a single CN.

The number of concurrent complex jobs are controlled based on their cost.

When tuning the max_active_statements parameter (global concurrent queue), pay attention to the following:

  • If max_active_statements is set to -1, which indicates that global concurrency is not limited, users may be disconnected in a high concurrency scenario.
  • In a point query scenario, set max_active_statements to 100.
  • In an analytical query scenario, set max_active_statements to the number of CPU cores divided by the number of DNs. Generally, its value ranges from 5 to 8.

Database Communication Parameters

By default, nodes in a database cluster communicate using the TCP proxy communication library.

Table 2 Database communication parameters

GUC Parameter

Description

Configuration Suggestion

comm_quota_size

comm_quota_size controls the size of data transmitted every time in each flow channel. Its default value is 1M.

In a high concurrency scenario, you can increase its value to improve communication performance, but doing so consumes more memory. Optimize this parameter as needed. If you query the pg_total_memory_detail view of a DN and find that the memory used by the communication layer has reached the threshold of comm_usable_memory, set comm_quota_size to a small value, such as 512K.

comm_usable_memory

comm_usable_memory controls the memory on a DN that can be used for database communication.

The value of this parameter is only used for memory flow control. The default flow control value is 1 MB. If the memory usage exceeds half of the parameter value, the flow control value will be automatically changed to 0.5 MB. If only 20% of the memory specified by the parameter is available, the flow control value will be changed to the allowed minimum, 8 KB.

Database Connection Parameters

Table 3 Database connection parameters

GUC Parameter

Description

Configuration Suggestion

max_connections

Specifies the maximum number of concurrent connections to the database. This parameter affects the concurrent processing capability of the cluster.

Retain the default value of this parameter on CNs. Set this parameter on DNs to a value calculated using this formula: Number of CNs x Value of this parameter on a CN.

If the value of this parameter is increased, GaussDB(DWS) may require more System V shared memory or semaphore, which may exceed the default maximum value of the OS. In this case, modify the value as needed.

max_prepared_transactions

Specifies the maximum number of transactions that can stay in the prepared state simultaneously. If the value of this parameter is increased, GaussDB(DWS) requires more System V shared memory than the default system setting.

The value of max_connections is related to max_prepared_transactions. Before configuring max_connections, ensure that the value of max_prepared_transactions is greater than or equal to that of max_connections. In this way, each session has a prepared transaction in the waiting state.

session_timeout

Specifies the maximum duration a database connection can stay idle before it is automatically disconnected.

The value can be an integer in the range 0 to 86400. The minimum unit is second (s). The value 0 disables this timeout mechanism. Generally, you are advised not to set this parameter to 0.

Other Performance-related Parameters

Table 4 Other performance-related parameters

GUC Parameter

Description

Configuration Suggestion

enable_dynamic_workload

Specifies whether to enable dynamic load management.

Dynamic load management refers to the automatic queue control of complex queries based on user loads in a database. This fine-tunes system parameters without manual adjustment.

This parameter is enabled by default. Notes:

  • Simple query jobs (which are estimated to require less than 32 MB memory) and non-DML statements (statements other than INSERT, UPDATE, DELETE, and SELECT) have no adaptive load restrictions. Control the upper memory limits for them on a single CN using max_active_statements.
  • In adaptive load scenarios, the value cannot be increased. If you increase it, memory cannot be controlled for certain statements, such as statements that have not been analyzed.
  • Reduce concurrency in the following scenarios, because high concurrency may lead to uncontrollable memory usage.
    • A single tuple occupies excessive memory, for example, a base table contains a column more than 1 MB wide.
    • A query is fully pushed down.
    • A statement occupies a large amount of memory on the CN, for example, a statement that cannot be pushed down or a cursor withholding statement.
    • An execution plan creates a hash table based on the hash join operator, and the table has many duplicate values and occupies a large amount of memory.
    • UDFs are used, which occupy a large amount of memory.
  • When configuring this parameter, you can set query_dop to 0 (adaptive). In this case, the system dynamically selects the optimal degree of parallelism (DOP) for each query based on resource usage and the execution plan. The enable_dynamic_workload parameter supports the dynamic memory allocation.

bulk_write_ring_size

Specifies the size of a ring buffer used for parallel data import.

This parameter affects the database import performance. You are advised to increase the value of this parameter on DNs when a large amount of data is to be imported.

The default value is 2GB.

data_replicate_buffer_size

Specifies the memory used by queues when the sender sends data pages to the receiver.

The value of this parameter affects the buffer size for data replication between the primary and standby servers.

The default value is 16 MB for a CN and 128 MB for a DN. If the server memory is 256 GB, you can increase the value to 512 MB.