Updated on 2023-06-19 GMT+08:00

Common Configuration Items of Batch SQL Jobs

This section describes the common configuration items of the SQL syntax for DLI batch jobs.

Table 1 Common configuration items

Item

Default Value

Description

spark.sql.files.maxRecordsPerFile

0

Maximum number of records to be written into a single file. If the value is zero or negative, there is no limit.

spark.sql.autoBroadcastJoinThreshold

209715200

Maximum size of the table that displays all working nodes when a connection is executed. You can set this parameter to -1 to disable the display.

NOTE:

Currently, only the configuration unit metastore table that runs the ANALYZE TABLE COMPUTE statistics noscan command and the file-based data source table that directly calculates statistics based on data files are supported.

spark.sql.shuffle.partitions

200

Default number of partitions used to filter data for join or aggregation.

spark.sql.dynamicPartitionOverwrite.enabled

false

Whether DLI overwrites the partitions where data will be written into during runtime. If you set this parameter to false, all partitions that meet the specified condition will be deleted before data overwrite starts. For example, if you set false and use INSERT OVERWRITE to write partition 2021-02 to a partitioned table that has the 2021-01 partition, this partition will be deleted.

If you set this parameter to true, DLI does not delete partitions before overwrite starts.

spark.sql.files.maxPartitionBytes

134217728

Maximum number of bytes to be packed into a single partition when a file is read.

spark.sql.badRecordsPath

-

Path of bad records.