Optimizer Method Configuration
The following configuration parameters provide a basic way to influence the query optimizer's choice of execution plans. If the default plan chosen by the optimizer for a specific query is not optimal, these configuration parameters can be used to force the optimizer to select a different plan as a temporary solution. Better methods include tuning the optimizer cost constants, manually running ANALYZE, and increasing the value of the default_statistics_target parameter.
enable_bitmapscan
Definition: Controls whether the optimizer uses bitmap scan planning.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_hashagg
Definition: Determines if the optimizer uses hash aggregation planning.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_mixedagg
Definition: Controls whether the optimizer uses mixed aggregation planning. (This parameter is only supported by clusters of version 8.2.0 or later.)
Range: Boolean
- on indicates that a Mixed Agg query plan is generated for the Grouping Sets statement (including Rollup or Cube) that meets certain conditions.
- off: disabled.
Default Value: on
In large-scale scenarios (for example, single DN table data exceeding 100 GB), using Mixed Agg can enhance query execution performance.
However, it is not supported under the following conditions:
- The GROUP BY clause contains columns with unsupported hashable data types.
- Aggregate functions include DISTINCT or ORDER BY operations.
- The GROUPING SETS clause does not contain empty groupings.
enable_hashjoin
Definition: Controls whether the query optimizer uses the Bitmap Scan plan type.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_indexscan
Definition: Controls whether the optimizer uses index scan plans.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_indexonlyscan
Definition: Determines if the optimizer uses index-only scan plans.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_material
Definition: Controls the optimizer's use of materialization. While complete elimination of materialization is impossible, disabling this parameter prevents the optimizer from inserting materialization nodes.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_mergejoin
Definition: Controls whether the optimizer uses merge join plans.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: off
enable_nestloop
Definition: Controls whether the query optimizer uses the nested-loop join plan type to fully scan inner tables. It is impossible to eliminate nested loop joins. However, disabling this variable will make the optimizer prefer other methods when they exist.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: off
left_join_estimation_enhancement
Definition: Controls whether optimized row count estimates are used for left joins. This parameter is only supported by clusters of version 8.3.0.100 or later.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: off
enable_seqscan
Definition: Controls whether the optimizer uses sequential scan plans. While it is not possible to eliminate sequential scans completely, disabling this parameter prompts the optimizer to favor other approaches when applicable.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_sort
Definition: Controls the optimizer's sorting steps. Explicit sorts cannot be fully removed, but disabling this parameter allows the optimizer to prefer alternatives when they exist.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
max_opt_sort_rows
Definition: Controls the maximum number of rows (limit + offset) in ORDER BY clauses that can be optimized. This parameter is only supported by clusters of version 8.3.0 or later.
Range: an integer ranging from 0 to INT_MAX
- A value of 0 disables this parameter.
- Other values enable optimization when limit + offset is below the threshold. Otherwise, optimization is disabled. Optimization reduces execution time but may increase memory usage.
Default Value: 0
enable_stream_concurrent_update
Definition: Controls whether the optimizer generates stream plans for concurrent updates, contingent on the enable_stream_operator parameter.
Range: Boolean
- on: allows the optimizer to generate stream plans for UPDATE statements.
- off: allows the optimizer to generate only non-stream plans for UPDATE statements.
Default Value: on
enable_ctescan
Definition: Controls whether plans support common table expression (CTE) scans.
Range: Boolean
- on: enables CTE scan support.
- off: disables CTE scan support.
Default Value: on
enable_stream_operator
Definition: Controls the optimizer's use of stream operators.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_stream_recursive
Definition: Determines if WITH RECURSIVE queries are pushed down for distributed execution on DNs.
Range: Boolean
- on: enables pushdown for WITH RECURSIVE queries.
- off: disables pushdown for WITH RECURSIVE queries.
Default Value: on
enable_value_redistribute
Definition: Controls whether value redistribution optimization is applied to RANK, DENSE_RANK, and ROW_NUMBER functions without PARTITION BY clauses in cluster versions 8.2.0 or later.
Range: Boolean
- on: enables value redistribution optimization.
- off: disables value redistribution optimization.
Default Value: off
max_recursive_times
Definition: Sets the maximum iteration count for WITH RECURSIVE queries.
Range: an integer ranging from 0 to INT_MAX
Default Value: 200
enable_vector_engine
Definition: Controls the optimizer's use of vectorized execution engines.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_broadcast
Definition: Controls whether the optimizer's consideration of broadcast distribution during stream cost estimation.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_redistribute
Definition: Determines if the optimizer's evaluation of local and split redistribute distributions during stream cost estimation. This parameter is only supported by clusters of version 8.2.1.300 or later.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: on
enable_change_hjcost
Definition: Adjusts Hash Join cost estimation by excluding inner table runtime costs from the Hash Join node's runtime cost calculation. This favors selecting smaller tables with higher runtime costs as inner tables.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: off
enable_fstream
Definition: Controls the optimizer's use of stream operators when delivering statements. This parameter is exclusively applicable to HDFS foreign tables.
This parameter has now been deprecated. You can set this parameter to on for forward compatibility, but the setting will have no effect.
Range: Boolean
- on: enabled.
- off: disabled.
Default Value: off
enable_hashfilter
Definition: Controls whether hash filters can be generated for plans involving replicated tables (including dual and constant tables). This parameter is only supported by clusters of version 8.2.0 or later.
Range: Boolean
- on: allows hash filter generation.
- off: prohibits hash filter generation under all circumstances.
Default Value: on
best_agg_plan
- hashagg+gather(redistribute)+hashagg
- redistribute+hashagg(+gather)
- hashagg+redistribute+hashagg(+gather)
This parameter controls which hashagg plan the optimizer generates.
Range: 0, 1, 2, and 3.
- The value 1 forces the generation of the first plan.
- The value 2 forces the generation of the second plan if the group by columns can be redistributed. Otherwise, it generates the first plan.
- The value 3 forces the generation of the third plan if the group by columns can be redistributed. Otherwise, it generates the first plan.
- The value 0 allows the optimizer to select the best plan based on the estimated costs of the above three plans.
Default Value: 0
turbo_engine_version
Definition: For tables created with the turbo storage format (with the enable_turbo_store parameter set to on in the table attributes), and when the query does not involve merge join or sort agg operators, the executor can utilize the turbo execution engine, significantly improving performance.
Range: 0, 1, 2, and 3.
- The value 0 disables the turbo execution engine.
- The value 1 enables the turbo execution engine only for single-table aggregate queries.
- The value 2 enables the turbo execution engine only for single-table aggregate or multi-table join queries.
- The value 3 accelerates most common operators using the turbo execution engine, excluding operators like merge join and sort agg. When the data volume is large and turbo_engine_version is set to 3, the occurrence of merge join and sort agg operators is relatively rare, so turbo execution engine acceleration can be achieved for almost SQL statements.
Default Value: 3
You are advised not to enable the turbo execution engine in cross-VW scenarios.
spill_compression
Definition: Controls the compression algorithm used when executor operators spill data to disk due to insufficient memory. This parameter is only supported by clusters of version 9.1.0.100 or later.
Range: enumerated values
- 'lz4' indicates that the lz4 compression algorithm is used, which provides better performance for scenarios with smaller spill volumes, but requires more storage space.
- 'zstd' indicates that the zstd compression algorithm is used, which provides better performance for scenarios with larger spill volumes where I/O is the main bottleneck, and requires approximately 2/3 of the storage space used by lz4.
Default Value: 'lz4'
index_selectivity_cost
Definition: Controls the cost calculation for cbtree during columnar table index scans (when selectivity > 0.001). This parameter is only supported by clusters of version 8.2.1.1008.2.1.1 or later.
Range: a floating point number, which can be –1 or ranges from 0 to 1000.
- A value of 0 ignores the index selectivity threshold of 0.001.
- A value of -1 applies disable_cost.
- Any other value serves as a coefficient for cbtree cost calculations.
Default Value: –1
index_cost_limit
Definition: Sets the threshold at which cbtree cost calculations become inactive during columnar table index scans. This parameter is only supported by clusters of version 8.2.1.1008.2.1.1 or later.
Range: an integer ranging from 0 to 2147483647
- A value of 0 disables this parameter.
- Any other value ensures that tables with fewer rows than this limit ignore the index selectivity threshold of 0.001.
Default Value: 0
volatile_shipping_version
Definition: Controls the scope of volatile function pushdown execution.
Range: 0, 1, 2, and 3.
- Value 3 extends support from value 2 by enabling pushdown for InlineCTEs referenced once and UPSERT operations on replicated tables involving volatile functions.
- Value 2 builds on value 1 by allowing pushdown for target columns containing volatile functions in replicated CTE results.
- Value 1 enables full pushdown for nextval, uuid_generate_v1, sys_guid, and uuid functions appearing in statement target columns.
- Value 0 supports full pushdown for random-like functions and partial pushdown for nextval and uuid_generate_v1 in INSERT statements with simple queries.
Default Value: 3
agg_redistribute_enhancement
Definition: During the Agg operation, if multiple group by columns are contained and none of them is a distribution column, a group by column is selected for redistribution. This parameter controls the policy for selecting a redistribution column.
Range: Boolean
- on indicates that the redistribution column with the largest estimated distinct value is selected as the redistribution column.
- off indicates that the first column that can be redistributed is selected as the redistribution column.
Default Value: off
enable_valuepartition_pruning
Definition: Determines whether static or dynamic optimization is applied to DFS partitioned tables.
Range: Boolean
- on: enables static/dynamic optimization for DFS partitioned tables.
- off: disables static/dynamic optimization for DFS partitioned tables.
Default Value: on
stream_multiple
Definition: Sets a weighting factor used by the optimizer when calculating the cost of stream operators.
The final stream cost is multiplied by this weight based on the existing cost model.
Range: a floating point number ranging from 0 to 10000
Default Value: 1
This parameter only applies to Redistribute and Broadcast types of stream operators.
qrw_inlist2join_optmode
Definition: Controls whether inlist-to-join query rewriting is enabled.
Range: String
- disable: disables inlist-to-join query rewriting.
- cost_base: enables cost-based inlist-to-join query rewriting.
- rule_base: forces rule-based inlist-to-join query rewriting.
- Any positive integer: Sets a threshold for inlist-to-join query rewriting; if the number of elements in the list exceeds this value, the rewriting is performed.
Default Value: disable
enable_inlist_hashing
Definition: Controls whether inlist hashing optimization is enabled. This parameter is only supported by clusters of version 9.1.0 or later.
Range: Boolean
- on: enables inlist hashing optimization.
- off: disables inlist hashing optimization.
Default Value: on
setop_optmode
Definition: Controls whether deduplication is performed for individual query branches in set operations (UNION/EXCEPT/INTERSECT) without the ALL option. This parameter is only supported by clusters of version 8.2.0 or later.
Range: enumerated values
- disable: Query branches do not perform deduplication.
- force: Forces query branches to perform deduplication.
- cost: The optimizer selects the execution method with the lower cost between deduplication and non-deduplication.
Default Value: cost
The parameter configurations take effect only when the SQL execution plan meets these conditions:
- The UNION/EXCEPT/INTERSECT operation in the SQL statement does not include the ALL option.
- Each query branch involved in the set operation performs data redistribution before executing the set operation.
skew_option
Definition: Controls whether to use the optimization policy.
Range: String
- off: disables the policy.
- normal: uses the aggressive policy. For scenarios where skew occurrence is uncertain, assumes skew exists and performs corresponding optimizations.
- lazy: uses the conservative policy. For scenarios where skew occurrence is uncertain, assumes no skew exists and performs no optimizations.
Default Value: normal
enable_expr_skew_optimization
Definition: Controls whether expression statistics are used in skew optimization strategies. This parameter is only supported by clusters of version 9.1.0.100 or later.
Range: Boolean
- on: enables the use of expression statistics to detect skew during optimization.
- off: disables the use of expression statistics for skew detection.
Default Value: on
prefer_hashjoin_path
Definition: Determines whether hash join paths are prioritized during query planning. By preemptively pruning higher-cost alternative paths, it improves overall plan generation efficiency. This parameter is only supported by clusters of version 8.2.1 or later.
Range: Boolean
- on: enables early generation of hash join paths.
- off: disables this optimization.
Default Value: on
enable_hashfilter_test
Definition: Controls whether a hash filter on distribution columns is added during base table scans to verify result accuracy and whether DN correctness verification is performed during data insertion to ensure data is inserted into the correct node.
Range: Boolean
- on: adds hash filters during base table scans and enables DN correctness checks during data insertion.
- off: disables both functions.
Default Value: off
- This parameter only applies to tables with hash distribution.
- Setting this parameter to on may impact data insertion performance due to the additional DN correctness verification step.
enable_topk_optimization
Definition: Controls whether to enable top K sorting optimization. This parameter is only supported by clusters of version 9.1.0.200 or later.
Range: Boolean
- on: enables top K sorting optimization.
- off: disables top K sorting optimization.
Default Value: on
rewrite_rule
Definition: Identifies the optional query rewriting rules to be enabled. In specific scenarios, this GUC parameter can be used to configure the query rewriting rules to optimize query performance.
This parameter controls the combination of query rewriting rules. For example, with multiple rewriting rules: rule1, rule2, rule3, and rule4, you can set:
set rewrite_rule='on(rule1)'; -- Enable query rewriting rule rule1. set rewrite_rule='off(rule1)'; -- Disable query rewriting rule rule1. set rewrite_rule='rule2,rule3'; -- Enable query rewriting rules rule2 and rule3. set rewrite_rule=none; -- Disable all optional query rewriting rules.
Range: String
|
Value |
Description |
|---|---|
|
none |
Does not use any optional query rewriting rules. |
|
lazyagg |
Uses the Lazy Agg query rewriting rules (removes aggregation operations in subqueries). |
|
magicset |
Uses the Magic Set query rewriting rules (pushes conditions from the main query to the subquery). |
|
uniquecheck |
Uses the Unique Check rewriting rule (applicable when the target column does not include expression sublinks of aggregate functions. Requires the target column values to be unique after aggregating the sublinks based on related columns. Recommended for professional tuning experts). |
|
disablerep |
Uses the rule prohibiting sublink promotion for replicated tables (prohibits sublink promotion for replicated tables). |
|
notinopt |
Uses the Not In query rewriting rule (converts Not In to more efficient Hash Join). |
|
or_conversion |
Uses OR query rewriting rules (splits or rewrites complex predicates containing OR). |
|
projection_pushdown |
Uses Projection Pushdown query rewriting rules. |
|
plain_lazyagg |
Uses simplified Lazy Agg query rewriting rules (applied to individual subqueries and distinct operations). |
|
eager_magicset |
Uses the Eager Magic Set query rewriting rule (allows pushing down qual conditions in subqueries with group by or partition by). |
|
outer_join_quality_imply |
Uses outer join query rewriting rules (enables transferring attributes from outer to inner tables in outer joins). |
|
inlist_merge |
Uses the Inlist Merge query rewriting rule (merges multiple IN clauses on the same column into a single IN(List)). |
|
casewhen_simplification |
Uses CASE WHEN query rewriting optimizations (performs constant folding on case when...then...else...end constructs, removes unreachable branches, and simplifies calculations). |
|
subquery_qual_pull_up |
Uses subquery pull-up query rewriting optimizations (moves filtering conditions from subqueries up to the outer queries). |
|
join_elimination |
Uses Join Elimination query rewriting rules (removes joins that do not impact the final results). |
|
not_distinct_from_opt |
Uses the Not Distinct From query rewriting rule (transforms "is not distinct from" into = format to support hashjoin). |
|
foreign_agg_opt |
Uses partial aggregate function query rewriting optimization rules (retrieves result sets via file footer for count(1), max, and min functions). |
|
selfjoin_elimination |
Uses self-join query optimization rewriting rules (reduces self-joins between two tables in EXISTS and ANY sublinks to single-table aggregations). |
Default Value: magicset, or_conversion, projection_pushdown, plain_lazyagg, subquery_qual_pull_up, not_distinct_from_opt, and foreign_agg_opt.
cte_options
Definition: Configures prerequisites for a plan to use CTE scans (ctescan).
Range:
|
Parameter |
Value |
Description |
|---|---|---|
|
pred_threshold |
Integer [0, 50]. Default: 2. |
Minimum number of predicates required for single-table CTE scan usage. |
|
max_mem |
Integer [32, 2097151]. (Do not include units. The unit is MB.) Default: 256. |
Maximum estimated memory (in MB) for CTE scan when the reference count is less than ref_count_threshold. |
|
ref_count_threshold |
Integer [0, 50]. Default: 4. |
Minimum reference count for shared scan if CTE memory exceeds max_mem. |
Example:
set cte_options = "(pred_threshold:2),(max_mem:256),(ref_count_threshold:4)";
analyze_mode
Definition: Configures ANALYZE options.
Range: string
- none: no special configuration.
- auto: enables auto-analyze.
- file_sample: enables file-level sampling.
Default Value: none
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot