Updated on 2024-11-30 GMT+08:00

Custom Parameters

The following table lists the custom parameters supported when you create a DLI verification task.

Table 1 Custom parameters supported for DLI verification tasks


Parameter: mgc.mc2dli.table.partition.enable

Default Value: true

Description: Whether to query DLI metadata when a DLI table partition is empty or missing.

  • If this parameter is set to true, the verification status of empty DLI table partitions is reported as succeeded, and that of missing partitions as failed.
  • If this parameter is set to false, the verification status of both empty and missing DLI table partitions is reported as succeeded.

Parameter: spark.sql.files.maxRecordsPerFile

Default Value: 0

Description: The maximum number of records to write into a single file. If the value is zero or negative, there is no limit.
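
For illustration, here is a minimal PySpark sketch of the effect. The session setup and output path are assumptions, not part of the DLI task configuration:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()
  # Cap each output file at 1,000 records; the default of 0 means no limit.
  spark.conf.set("spark.sql.files.maxRecordsPerFile", "1000")

  # Even when the data is coalesced into one partition, the writer rolls
  # over to a new file every 1,000 records, producing about 10 files.
  spark.range(10000).coalesce(1).write.mode("overwrite").parquet("/tmp/demo_output")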

Parameter: spark.sql.autoBroadcastJoinThreshold

Default Value: 209715200

Description: The maximum size, in bytes, of a table that can be broadcast to all worker nodes when a join is executed. Set this parameter to -1 to disable broadcasting.

NOTE:

Currently, statistics are supported only for Hive Metastore tables where the ANALYZE TABLE COMPUTE STATISTICS noscan command has been run, and for file-based data source tables where statistics are computed directly on the data files.
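
A quick way to see whether the threshold takes effect is to inspect the physical plan. The following PySpark sketch is illustrative only; the session and data are assumptions:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()
  # Broadcast tables up to 200 MB (the documented default) during joins.
  spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "209715200")

  small = spark.range(100)
  large = spark.range(1000000)
  # The plan shows BroadcastHashJoin while the small side is under the
  # threshold; setting the threshold to -1 forces a shuffle-based join.
  large.join(small, "id").explain()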

Parameter: spark.sql.shuffle.partitions

Default Value: 200

Description: The default number of partitions to use when shuffling data for joins or aggregations.
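
As a minimal PySpark sketch (session and data are assumptions), the setting determines how many partitions a shuffling operation such as an aggregation produces:

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.getOrCreate()
  spark.conf.set("spark.sql.shuffle.partitions", "200")

  # groupBy shuffles its input, so the result is spread across 200
  # partitions (adaptive query execution, if enabled, may coalesce them).
  agg = spark.range(100000).groupBy(F.col("id") % 10).count()
  print(agg.rdd.getNumPartitions())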

Parameter: spark.sql.dynamicPartitionOverwrite.enabled

Default Value: false

Description: Whether DLI overwrites only the partitions that data is written into at runtime. If this parameter is set to false, all partitions that match the specified condition are deleted before the overwrite starts. For example, if you set this parameter to false and use INSERT OVERWRITE to write the 2021-02 partition to a partitioned table that already contains the 2021-01 partition, the existing 2021-01 partition is deleted as well.

If this parameter is set to true, DLI does not delete existing partitions before the overwrite starts.
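
The difference between the two modes can be sketched in Spark SQL. Note that this parameter name is DLI-specific (open-source Spark exposes similar behavior through spark.sql.sources.partitionOverwriteMode); the table and values below are assumptions for illustration:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()
  # DLI-specific switch: overwrite only the partitions actually written.
  spark.conf.set("spark.sql.dynamicPartitionOverwrite.enabled", "true")

  spark.sql("CREATE TABLE IF NOT EXISTS demo (v INT, mon STRING) USING parquet PARTITIONED BY (mon)")
  spark.sql("INSERT OVERWRITE TABLE demo PARTITION (mon='2021-01') VALUES (1)")
  # With the parameter set to true, this statement replaces only mon='2021-02';
  # the mon='2021-01' partition written above is preserved.
  spark.sql("INSERT OVERWRITE TABLE demo VALUES (2, '2021-02')")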

Parameter: spark.sql.files.maxPartitionBytes

Default Value: 134217728

Description: The maximum number of bytes to pack into a single partition when reading files.
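
A brief PySpark sketch of how the value shapes read parallelism (the input path is an assumption):

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()
  # Pack at most 128 MB (the default) into each read partition.
  spark.conf.set("spark.sql.files.maxPartitionBytes", "134217728")

  # Reading roughly 1 GB of Parquet files then yields about
  # 1 GB / 128 MB = 8 input partitions.
  df = spark.read.parquet("/tmp/demo_output")
  print(df.rdd.getNumPartitions())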

Parameter: spark.sql.badRecordsPath

Default Value: -

Description: The path for storing bad records.

Parameter: spark.sql.legacy.correlated.scalar.query.enabled

Default Value: false

Description:

  • If this parameter is set to true:
    • When a subquery contains no duplicate data, executing a correlated subquery does not require deduplicating the subquery's result.
    • When a subquery contains duplicate data, executing a correlated subquery results in an error. To resolve this, deduplicate the subquery's result using a function such as max() or min().
  • If this parameter is set to false:

    Regardless of whether a subquery contains duplicate data, executing a correlated subquery requires deduplicating the subquery's result using a function such as max() or min(). Otherwise, an error occurs.
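
The default (false) behavior can be sketched in Spark SQL; the views and data below are assumptions for illustration:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()
  spark.range(5).createOrReplaceTempView("t1")
  spark.range(5).createOrReplaceTempView("t2")

  # With the parameter at its default (false), the correlated scalar
  # subquery is wrapped in max() so it returns at most one row per outer row.
  spark.sql("""
      SELECT id,
             (SELECT max(id) FROM t2 WHERE t2.id = t1.id) AS matched
      FROM t1
  """).show()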