Help Center/ DataArts Fabric/ Developer Guide/ GUC Parameter/ Query Planning/ Optimizer Method Configuration

Updated on 2025-12-19 GMT+08:00

View PDF

Optimizer Method Configuration

The following configuration parameters provide a basic way to influence the query optimizer's choice of execution plans. If the default plan chosen by the optimizer for a specific query is not optimal, these configuration parameters can be used to force the optimizer to select a different plan as a temporary solution. Better methods include tuning the optimizer cost constants, manually running ANALYZE, and increasing the value of the default_statistics_target parameter.

enable_bitmapscan

Definition: Controls whether the optimizer uses bitmap scan planning.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_hashagg

Definition: Determines if the optimizer uses hash aggregation planning.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_mixedagg

Definition: Controls whether the optimizer uses mixed aggregation planning. (This parameter is only supported by clusters of version 8.2.0 or later.)

Range: Boolean

on indicates that a Mixed Agg query plan is generated for the Grouping Sets statement (including Rollup or Cube) that meets certain conditions.
off: disabled.

Default Value: on

In large-scale scenarios (for example, single DN table data exceeding 100 GB), using Mixed Agg can enhance query execution performance.

However, it is not supported under the following conditions:

The GROUP BY clause contains columns with unsupported hashable data types.
Aggregate functions include DISTINCT or ORDER BY operations.
The GROUPING SETS clause does not contain empty groupings.

enable_hashjoin

Definition: Controls whether the query optimizer uses the Bitmap Scan plan type.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_indexscan

Definition: Controls whether the optimizer uses index scan plans.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_indexonlyscan

Definition: Determines if the optimizer uses index-only scan plans.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_material

Definition: Controls the optimizer's use of materialization. While complete elimination of materialization is impossible, disabling this parameter prevents the optimizer from inserting materialization nodes.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_mergejoin

Definition: Controls whether the optimizer uses merge join plans.

Range: Boolean

on: enabled.
off: disabled.

Default Value: off

enable_nestloop

Definition: Controls whether the query optimizer uses the nested-loop join plan type to fully scan inner tables. It is impossible to eliminate nested loop joins. However, disabling this variable will make the optimizer prefer other methods when they exist.

Range: Boolean

on: enabled.
off: disabled.

Default Value: off

left_join_estimation_enhancement

Definition: Controls whether optimized row count estimates are used for left joins. This parameter is only supported by clusters of version 8.3.0.100 or later.

Range: Boolean

on: enabled.
off: disabled.

Default Value: off

enable_seqscan

Definition: Controls whether the optimizer uses sequential scan plans. While it is not possible to eliminate sequential scans completely, disabling this parameter prompts the optimizer to favor other approaches when applicable.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_sort

Definition: Controls the optimizer's sorting steps. Explicit sorts cannot be fully removed, but disabling this parameter allows the optimizer to prefer alternatives when they exist.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

max_opt_sort_rows

Definition: Controls the maximum number of rows (limit + offset) in ORDER BY clauses that can be optimized. This parameter is only supported by clusters of version 8.3.0 or later.

Range: an integer ranging from 0 to INT_MAX

A value of 0 disables this parameter.
Other values enable optimization when limit + offset is below the threshold. Otherwise, optimization is disabled. Optimization reduces execution time but may increase memory usage.

Default Value: 0

enable_stream_concurrent_update

Definition: Controls whether the optimizer generates stream plans for concurrent updates, contingent on the enable_stream_operator parameter.

Range: Boolean

on: allows the optimizer to generate stream plans for UPDATE statements.
off: allows the optimizer to generate only non-stream plans for UPDATE statements.

Default Value: on

enable_ctescan

Definition: Controls whether plans support common table expression (CTE) scans.

Range: Boolean

on: enables CTE scan support.
off: disables CTE scan support.

Default Value: on

enable_stream_operator

Definition: Controls the optimizer's use of stream operators.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_stream_recursive

Definition: Determines if WITH RECURSIVE queries are pushed down for distributed execution on DNs.

Range: Boolean

on: enables pushdown for WITH RECURSIVE queries.
off: disables pushdown for WITH RECURSIVE queries.

Default Value: on

enable_value_redistribute

Definition: Controls whether value redistribution optimization is applied to RANK, DENSE_RANK, and ROW_NUMBER functions without PARTITION BY clauses in cluster versions 8.2.0 or later.

Range: Boolean

on: enables value redistribution optimization.
off: disables value redistribution optimization.

Default Value: off

max_recursive_times

Definition: Sets the maximum iteration count for WITH RECURSIVE queries.

Range: an integer ranging from 0 to INT_MAX

Default Value: 200

enable_vector_engine

Definition: Controls the optimizer's use of vectorized execution engines.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_broadcast

Definition: Controls whether the optimizer's consideration of broadcast distribution during stream cost estimation.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_redistribute

Definition: Determines if the optimizer's evaluation of local and split redistribute distributions during stream cost estimation. This parameter is only supported by clusters of version 8.2.1.300 or later.

Range: Boolean

on: enabled.
off: disabled.

Default Value: on

enable_change_hjcost

Definition: Adjusts Hash Join cost estimation by excluding inner table runtime costs from the Hash Join node's runtime cost calculation. This favors selecting smaller tables with higher runtime costs as inner tables.

Range: Boolean

on: enabled.
off: disabled.

Default Value: off

enable_fstream

Definition: Controls the optimizer's use of stream operators when delivering statements. This parameter is exclusively applicable to HDFS foreign tables.

This parameter has now been deprecated. You can set this parameter to on for forward compatibility, but the setting will have no effect.

Range: Boolean

on: enabled.
off: disabled.

Default Value: off

enable_hashfilter

Definition: Controls whether hash filters can be generated for plans involving replicated tables (including dual and constant tables). This parameter is only supported by clusters of version 8.2.0 or later.

Range: Boolean

on: allows hash filter generation.
off: prohibits hash filter generation under all circumstances.

Default Value: on

best_agg_plan

Definition: For Agg operations under stream, the optimizer generates three plans:

hashagg+gather(redistribute)+hashagg
redistribute+hashagg(+gather)
hashagg+redistribute+hashagg(+gather)

This parameter controls which hashagg plan the optimizer generates.

Range: 0, 1, 2, and 3.

The value 1 forces the generation of the first plan.
The value 2 forces the generation of the second plan if the group by columns can be redistributed. Otherwise, it generates the first plan.
The value 3 forces the generation of the third plan if the group by columns can be redistributed. Otherwise, it generates the first plan.
The value 0 allows the optimizer to select the best plan based on the estimated costs of the above three plans.

Default Value: 0

turbo_engine_version

Definition: For tables created with the turbo storage format (with the enable_turbo_store parameter set to on in the table attributes), and when the query does not involve merge join or sort agg operators, the executor can utilize the turbo execution engine, significantly improving performance.

Range: 0, 1, 2, and 3.

The value 0 disables the turbo execution engine.
The value 1 enables the turbo execution engine only for single-table aggregate queries.
The value 2 enables the turbo execution engine only for single-table aggregate or multi-table join queries.
The value 3 accelerates most common operators using the turbo execution engine, excluding operators like merge join and sort agg. When the data volume is large and turbo_engine_version is set to 3, the occurrence of merge join and sort agg operators is relatively rare, so turbo execution engine acceleration can be achieved for almost SQL statements.

Default Value: 3

You are advised not to enable the turbo execution engine in cross-VW scenarios.

spill_compression

Definition: Controls the compression algorithm used when executor operators spill data to disk due to insufficient memory. This parameter is only supported by clusters of version 9.1.0.100 or later.

Range: enumerated values

'lz4' indicates that the lz4 compression algorithm is used, which provides better performance for scenarios with smaller spill volumes, but requires more storage space.
'zstd' indicates that the zstd compression algorithm is used, which provides better performance for scenarios with larger spill volumes where I/O is the main bottleneck, and requires approximately 2/3 of the storage space used by lz4.

Default Value: 'lz4'

index_selectivity_cost

Definition: Controls the cost calculation for cbtree during columnar table index scans (when selectivity > 0.001). This parameter is only supported by clusters of version 8.2.1.1008.2.1.1 or later.

Range: a floating point number, which can be –1 or ranges from 0 to 1000.

A value of 0 ignores the index selectivity threshold of 0.001.
A value of -1 applies disable_cost.
Any other value serves as a coefficient for cbtree cost calculations.

Default Value: –1

index_cost_limit

Definition: Sets the threshold at which cbtree cost calculations become inactive during columnar table index scans. This parameter is only supported by clusters of version 8.2.1.1008.2.1.1 or later.

Range: an integer ranging from 0 to 2147483647

A value of 0 disables this parameter.
Any other value ensures that tables with fewer rows than this limit ignore the index selectivity threshold of 0.001.

Default Value: 0

volatile_shipping_version

Definition: Controls the scope of volatile function pushdown execution.

Range: 0, 1, 2, and 3.

Value 3 extends support from value 2 by enabling pushdown for InlineCTEs referenced once and UPSERT operations on replicated tables involving volatile functions.
Value 2 builds on value 1 by allowing pushdown for target columns containing volatile functions in replicated CTE results.
Value 1 enables full pushdown for nextval, uuid_generate_v1, sys_guid, and uuid functions appearing in statement target columns.
Value 0 supports full pushdown for random-like functions and partial pushdown for nextval and uuid_generate_v1 in INSERT statements with simple queries.

Default Value: 3

agg_redistribute_enhancement

Definition: During the Agg operation, if multiple group by columns are contained and none of them is a distribution column, a group by column is selected for redistribution. This parameter controls the policy for selecting a redistribution column.

Range: Boolean

on indicates that the redistribution column with the largest estimated distinct value is selected as the redistribution column.
off indicates that the first column that can be redistributed is selected as the redistribution column.

Default Value: off

enable_valuepartition_pruning

Definition: Determines whether static or dynamic optimization is applied to DFS partitioned tables.

Range: Boolean

on: enables static/dynamic optimization for DFS partitioned tables.
off: disables static/dynamic optimization for DFS partitioned tables.

Default Value: on

stream_multiple

Definition: Sets a weighting factor used by the optimizer when calculating the cost of stream operators.

The final stream cost is multiplied by this weight based on the existing cost model.

Range: a floating point number ranging from 0 to 10000

Default Value: 1

This parameter only applies to Redistribute and Broadcast types of stream operators.

qrw_inlist2join_optmode

Definition: Controls whether inlist-to-join query rewriting is enabled.

Range: String

disable: disables inlist-to-join query rewriting.
cost_base: enables cost-based inlist-to-join query rewriting.
rule_base: forces rule-based inlist-to-join query rewriting.
Any positive integer: Sets a threshold for inlist-to-join query rewriting; if the number of elements in the list exceeds this value, the rewriting is performed.

Default Value: disable

enable_inlist_hashing

Definition: Controls whether inlist hashing optimization is enabled. This parameter is only supported by clusters of version 9.1.0 or later.

Range: Boolean

on: enables inlist hashing optimization.
off: disables inlist hashing optimization.

Default Value: on

setop_optmode

Definition: Controls whether deduplication is performed for individual query branches in set operations (UNION/EXCEPT/INTERSECT) without the ALL option. This parameter is only supported by clusters of version 8.2.0 or later.

Range: enumerated values

disable: Query branches do not perform deduplication.
force: Forces query branches to perform deduplication.
cost: The optimizer selects the execution method with the lower cost between deduplication and non-deduplication.

Default Value: cost

The parameter configurations take effect only when the SQL execution plan meets these conditions:

The UNION/EXCEPT/INTERSECT operation in the SQL statement does not include the ALL option.
Each query branch involved in the set operation performs data redistribution before executing the set operation.

skew_option

Definition: Controls whether to use the optimization policy.

Range: String

off: disables the policy.
normal: uses the aggressive policy. For scenarios where skew occurrence is uncertain, assumes skew exists and performs corresponding optimizations.
lazy: uses the conservative policy. For scenarios where skew occurrence is uncertain, assumes no skew exists and performs no optimizations.

Default Value: normal

enable_expr_skew_optimization

Definition: Controls whether expression statistics are used in skew optimization strategies. This parameter is only supported by clusters of version 9.1.0.100 or later.

Range: Boolean

on: enables the use of expression statistics to detect skew during optimization.
off: disables the use of expression statistics for skew detection.

Default Value: on

prefer_hashjoin_path

Definition: Determines whether hash join paths are prioritized during query planning. By preemptively pruning higher-cost alternative paths, it improves overall plan generation efficiency. This parameter is only supported by clusters of version 8.2.1 or later.

Range: Boolean

on: enables early generation of hash join paths.
off: disables this optimization.

Default Value: on

enable_hashfilter_test

Definition: Controls whether a hash filter on distribution columns is added during base table scans to verify result accuracy and whether DN correctness verification is performed during data insertion to ensure data is inserted into the correct node.

Range: Boolean

on: adds hash filters during base table scans and enables DN correctness checks during data insertion.
off: disables both functions.

Default Value: off

This parameter only applies to tables with hash distribution.
Setting this parameter to on may impact data insertion performance due to the additional DN correctness verification step.

enable_topk_optimization

Definition: Controls whether to enable top K sorting optimization. This parameter is only supported by clusters of version 9.1.0.200 or later.

Range: Boolean

on: enables top K sorting optimization.
off: disables top K sorting optimization.

Default Value: on

rewrite_rule

Definition: Identifies the optional query rewriting rules to be enabled. In specific scenarios, this GUC parameter can be used to configure the query rewriting rules to optimize query performance.

This parameter controls the combination of query rewriting rules. For example, with multiple rewriting rules: rule1, rule2, rule3, and rule4, you can set:

set rewrite_rule='on(rule1)';          -- Enable query rewriting rule rule1.
set rewrite_rule='off(rule1)';          -- Disable query rewriting rule rule1.
set rewrite_rule='rule2,rule3';    -- Enable query rewriting rules rule2 and rule3.
set rewrite_rule=none;           -- Disable all optional query rewriting rules.

Range: String

**Table 1** Value descriptions
Value	Description
none	Does not use any optional query rewriting rules.
lazyagg	Uses the Lazy Agg query rewriting rules (removes aggregation operations in subqueries).
magicset	Uses the Magic Set query rewriting rules (pushes conditions from the main query to the subquery).
uniquecheck	Uses the Unique Check rewriting rule (applicable when the target column does not include expression sublinks of aggregate functions. Requires the target column values to be unique after aggregating the sublinks based on related columns. Recommended for professional tuning experts).
disablerep	Uses the rule prohibiting sublink promotion for replicated tables (prohibits sublink promotion for replicated tables).
notinopt	Uses the Not In query rewriting rule (converts Not In to more efficient Hash Join).
or_conversion	Uses OR query rewriting rules (splits or rewrites complex predicates containing OR).
projection_pushdown	Uses Projection Pushdown query rewriting rules.
plain_lazyagg	Uses simplified Lazy Agg query rewriting rules (applied to individual subqueries and distinct operations).
eager_magicset	Uses the Eager Magic Set query rewriting rule (allows pushing down qual conditions in subqueries with group by or partition by).
outer_join_quality_imply	Uses outer join query rewriting rules (enables transferring attributes from outer to inner tables in outer joins).
inlist_merge	Uses the Inlist Merge query rewriting rule (merges multiple IN clauses on the same column into a single IN(List)).
casewhen_simplification	Uses CASE WHEN query rewriting optimizations (performs constant folding on case when...then...else...end constructs, removes unreachable branches, and simplifies calculations).
subquery_qual_pull_up	Uses subquery pull-up query rewriting optimizations (moves filtering conditions from subqueries up to the outer queries).
join_elimination	Uses Join Elimination query rewriting rules (removes joins that do not impact the final results).
not_distinct_from_opt	Uses the Not Distinct From query rewriting rule (transforms "is not distinct from" into = format to support hashjoin).
foreign_agg_opt	Uses partial aggregate function query rewriting optimization rules (retrieves result sets via file footer for count(1), max, and min functions).
selfjoin_elimination	Uses self-join query optimization rewriting rules (reduces self-joins between two tables in EXISTS and ANY sublinks to single-table aggregations).

Default Value: magicset, or_conversion, projection_pushdown, plain_lazyagg, subquery_qual_pull_up, not_distinct_from_opt, and foreign_agg_opt.

cte_options

Definition: Configures prerequisites for a plan to use CTE scans (ctescan).

Range:

**Table 2** Parameters and values
Parameter	Value	Description
pred_threshold	Integer [0, 50]. Default: 2.	Minimum number of predicates required for single-table CTE scan usage.
max_mem	Integer [32, 2097151]. (Do not include units. The unit is MB.) Default: 256.	Maximum estimated memory (in MB) for CTE scan when the reference count is less than ref_count_threshold.
ref_count_threshold	Integer [0, 50]. Default: 4.	Minimum reference count for shared scan if CTE memory exceeds max_mem.

Example:

set cte_options = "(pred_threshold:2),(max_mem:256),(ref_count_threshold:4)";

analyze_mode

Definition: Configures ANALYZE options.

Range: string

none: no special configuration.

auto: enables auto-analyze.
file_sample: enables file-level sampling.

Default Value: none

Parent topic: Query Planning

Previous topic: Query Planning

Next topic: Other Optimizer Options

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.

The system is busy. Please try again later.

Which of the following issues have you encountered?

Content is inconsistent with the product UI

Unclear descriptions

Lack of examples or code

Incorrect steps

Can't find what I need

Lack of best practices

Feedback (optional)

0/500

Select at least one type of issue, and enter your comments or suggestions.

Enter a maximum of 500 characters.

Submit Cancel

For any further questions, feel free to contact us through the chatbot.

Chatbot