Help Center/ DataArts Fabric/ Developer Guide/ GUC Parameter/ Query Planning/ Other Optimizer Options

Updated on 2025-09-18 GMT+08:00

View PDF

Other Optimizer Options

default_statistics_target

Definition: Sets a default statistics target for tables whose column targets are not specified using ALTER TABLE SET STATISTICS. A positive value represents the sample count for statistical information, while a negative value sets the statistics target as a percentage (for example, -5 represents 5%). During sampling, default_statistics_target x 300 is used as the random sampling size. For example, with the default value of 100, 30,000 pages are randomly read, and 30,000 data records are sampled for analysis.

Range: Floating-point number from –100 to 10000.

Positive values larger than the default increase the time required for ANALYZE, but may enhance the optimizer's estimation accuracy.
Adjusting this parameter may risk performance degradation. If a query degrades, consider:
- Restoring the default statistics.
- Using plan hints to revert to the previous query plan.
When this GUC parameter is set to a negative value, if the computed sample size is greater than or equal to 2% of the total data volume and the table contains fewer than 1,600,000 rows, the ANALYZE duration may exceed that of the default GUC parameter setting.
AUTOANALYZE does not support customizing the sampling size for temporary tables. The default value is used instead.
If statistics are forcibly calculated in-memory, the sampling size is constrained by the maintenance_work_mem parameter.

Default Value: 100

from_collapse_limit

Definition: Determines whether the optimizer merges subqueries into upper-layer queries based on the number of items in the generated FROM list. If the number of items in the FROM list is less than or equal to the value of this parameter, the optimizer merges subqueries into upper-layer queries.

Range: an integer ranging from 1 to INT_MAX

If the value is smaller than the default value, the planning time is reduced, but a poor execution plan may be generated.

Default Value: 8

join_collapse_limit

Definition: Specifies whether the optimizer rewrites JOIN constructs (except FULL JOINS) into lists of FROM items based on the number of items in the result list.

Range: an integer ranging from 1 to INT_MAX

If this bit is set to 1, any JOIN rearrangement will be avoided. In this way, the join order specified in the query is the actual join order. The query optimizer does not always select the optimal join order. Advanced users can temporarily set this variable to 1 and then specify the required join order.
Smaller values reduce planning time but degrade the quality of execution plans.

Default Value: 8

enable_bloom_filter

Definition: Specifies whether the BloomFilter optimization can be used.

Range: Boolean

on: indicates that the BloomFilter optimization can be used.
off: indicates that the BloomFilter optimization is not allowed.

Default Value: on

Application scenario: Bloom Filter is triggered by HASH JOIN of the same thread on the foreign table that contains HDFS internal and foreign tables or column-store tables.

Constraints

The JOIN type can only be INNER JOIN, SEMI JOIN, RIGHT JOIN, RIGHT SEMI JOIN, RIGHT ANTI JOIN, or RIGHT ANTI FULL JOIN.
Join condition on the inner table side of JOIN: The join condition cannot be an expression for HDFS inner and foreign tables. The join condition can be an expression for column-store tables, but only for non-JOIN layer calculation.
The association condition of the JOIN foreign table must be simple column association.
If the association conditions of both the internal and external tables of JOIN are simple column association, more than 1/3 of the data can be removed from the estimation at the plan layer (only for HDFS internal and external tables).
JOIN cannot contain null value associations.
Data type:
- The HDFS internal and external table field types support SMALLINT, INTEGER, BIGINT, REAL/FLOAT4, DOUBLE PRECISION/FLOAT8, CHAR(n)/CHARACTER(n)/NCHAR(n), VARCHAR(n)/CHARACTER VARYING(n), CLOB, and TEXT.
- The column-store table supports the SMALLINT, INTEGER, BIGINT, OID, "char", CHAR(n)/CHARACTER(n)/NCHAR(n), VARCHAR(n)/CHARACTER VARYING(n), NVARCHAR2(n), CLOB, TEXT, DATE, TIME, TIMESTAMP, and TIMESTAMPTZ field types. The collation of the character type must be C.

enable_extrapolation_stats

Definition: Specifies whether to use the extrapolation logic based on historical statistics. This logic can increase the probability of accurate estimation for tables whose statistics are not collected in time. However, there is a possibility that the estimation is too large due to incorrect inference.

Range: Boolean

on: indicates that the extrapolation logic can be used based on historical statistics.
off: indicates that the extrapolation logic cannot be used based on historical statistics.

Default Value: off

query_dop

Definition: Specifies the user-defined DOP for queries.

Range: an integer ranging from –64 to 64.

[1, 64]: enables the fixed SMP function. The system uses the fixed DOP.

0: SMP adaptation is enabled. The system dynamically selects the optimal parallelism between [1, 8]s (x86 platform) and between [1, 64]s (Kunpeng platform) for each query based on the resource usage and query plans.

[–64, –1]: SMP adaptation is enabled and the maximum DOP is limited.

Default Value: 1

smp_thread_cost

Definition: Specifies the minimum estimated cost of each SMP thread when planning the thread DOP in SMP adaptive scenarios. If the estimated cost of an operator is low but the DOP is high, the optimizer automatically reduces the DOP of the operator when the estimated cost of each SMP thread is less than the threshold. This prevents system resources from being wasted by enabling a high DOP for low-cost operators.

Range: a floating point number ranging from 0 to 10000

0: The estimated cost threshold of the SMP thread is disabled. In this case, the DOP is not affected by the operator cost.

(0, 10000]: minimum estimated cost threshold of SMP threads. A larger value indicates that the DOP of low-cost operators tends to decrease.

Default Value: 1000

If this parameter is not set to 0, the estimated cost of the operator is too small, resulting in a low SMP DOP. When the preceding situation occurs, set this parameter to 0 to increase the DOP.

plan_mode_seed

Definition: Specifies the optimal execution plan used by the optimizer to estimate costs using the dynamic planning algorithm or generate random plans. This parameter is a commissioning parameter.

Type: USERSET

Range: an integer ranging from –1 to 2147483647

0: optimal execution plan whose cost is estimated using the dynamic planning algorithm.
–1: The optimizer randomly generates a random integer ranging from 1 to 2147483647 and generates a random execution plan based on the random number.
[1, 2147483647]: The optimizer generates a random execution plan based on the specified random number.

Default Value: 0

If this parameter is set to the random execution plan mode, the optimizer generates different random execution plans. The execution plan may not be the optimal one. Therefore, in random plan mode, the query performance is affected. You are advised to retain the default value 0 during normal service operations or O&M.
If this parameter is not set to 0, the specified plan hint does not take effect.

fq_query_vdn

Definition: Specifies the number of actors started by a task.

Range: an integer in the range [0, 256]

0: number of adaptive actors. The number of actors is dynamically selected based on the scanning and writing costs.
1: A single actor is used, and the execution mode is degraded to the single CN mode.
2–256: Use a specified number of actors for execution.

Default Value: 2

fq_dynamic_actor_cost_threshold

Definition: Specifies the single-CN cost threshold when fq_query_vdn is set to 0. An actor is started when the threshold is exceeded.

Range: a floating point number ranging from 1 to 1e10

Default Value: 200000

During adaptation, the number of selected actors and the cost are not in a simple linear relationship. Instead, they are in a process of gradient descent. As the cost increases, the number of selected actors tends to 256.

Parent topic: Query Planning

Previous topic: Optimizer Method Configuration

Next topic: Default Settings of Client Connection