Optimizing Database Memory Parameters

The performance of complex query statements is highly dependent on the database memory parameters. The database memory parameters include the control parameter for logical memory management and the parameter that determines whether execution operators are spilled to disks.

Parameter for Logical Memory Management

max_process_memory is a parameter used for logical memory management. It specifies the maximum available memory on each DN.

Use the following formula to calculate the available memory for job execution:

max_process_memory – Shared memory (including shared_buffers) – cstore_buffers

Therefore, the available memory for job execution depends on shared_buffers and cstore_buffers.

For HDFS tables, set these two parameters to a small value, such as 1 GB.

Logical memory management contains views that are used for displaying the used memory and peak information in each database block of a DN. You can connect to a DN and query information about the memory usage on that DN using pg_total_memory_detail. Alternatively, you can connect to a CN and query information about the memory usage on all the DNs using pgxc_total_memory_detail. (In a large cluster, much output information needs to be processed, and the query is slow.)

When the specified physical memory is insufficient, work_mem determines whether to write additional operator calculation data into temporary tables based on query characteristics and concurrency. This reduces performance by five to ten times and prolongs the query response time from seconds to minutes.

For non-concurrent complex serial queries, each query requires five to ten associated operations. Configure work_mem using the following formula: work_mem = 50% of the memory/10.
For non-concurrent simple serial queries, each query requires two to five associated operations. Configure work_mem using the following formula: work_mem = 50% of the memory/5.
For concurrent queries, configure work_mem using the following formula: work_mem = work_mem for serial queries/Number of concurrent SQL statements.

Parameter Determining Whether to Spill Execution Operators to Disks

work_mem is used to set the used memory threshold. Execution operators that can be spilled to disks will be spilled when the used memory exceeds the threshold. Such execution operators include Hash(VecHashJoin), Agg(VecAgg), Sort(VecSort), Material(VecMaterial), SetOp(VecSetOp), and WindowAgg(VecWindowAgg). They can be vectorized or non-vectorized. This parameter ensures concurrent throughput and the performance of a single query job. Therefore, you need to optimize the parameter based on the output of Explain Performance.