Updated on 2024-04-24 GMT+08:00

What's New in Spark 3.3.1

DLI stays consistent with the releases of the open-source Spark compute engine. This section describes the updates in Spark 3.3.1.

For more information about Spark 3.3.1, see Spark Release Notes.

Spark 3.3.1 Release Date

| Version | Release Date | Status | EOM Date | EOS Date |
|---|---|---|---|---|
| DLI Spark 3.3.1 | June 2023 | Released | June 30, 2025 | June 30, 2026 |

For more version support information, see Lifecycle of DLI Compute Engine Versions.

Spark 3.3.1 Description

The following lists the main features of Spark 3.3.1.

For more information on new features and performance optimizations, see Release Notes - Spark 3.3.1.

Table 1 Advantages of Spark 3.3.1

| Feature | Description |
|---|---|
| Native performance acceleration | Improved the performance of Spark query statements. |
| Metadata access performance improvement | Improved Spark's metadata access performance when handling big data, increasing data processing efficiency. |
| Improving the performance of the OBS Committer when writing small files | Improved the performance of the OBS Committer when writing small files to Object Storage Service (OBS), increasing data transfer efficiency. |
| Dynamic executor shuffle data optimization | Improved the stability of dynamic resource scaling. Executors are cleaned up once the shuffle files they hold are no longer needed. |
| Merging small files | If a large number of small files are generated during SQL execution, job execution and table queries take a long time. In this case, merge the small files. For details, see How Do I Merge Small Files? |
| Modifying column comments of non-partitioned or partitioned tables | You can modify the column comments of both non-partitioned and partitioned tables. |
| Collecting statistics on the CPU usage of SQL jobs | You can view the total CPU usage of SQL jobs on the console. |
| Viewing Spark logs of container clusters | You can view the Spark logs in the container. |
| Dynamic UDF loading (OBT) | UDFs take effect without restarting the queue. |
| Supporting flame graphs on the Spark UI | Flame graphs can be generated on the Spark UI. |
| Optimizing the query performance of the NOT IN statement for SQL jobs | Improved the query performance of SQL jobs that use NOT IN. |
| Optimizing the query performance of the Multi-INSERT statement | Improved the query performance of Multi-INSERT statements. |
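NOT IN subqueries are costly to evaluate naively because SQL's three-valued logic makes NULLs significant: if the subquery returns a NULL, `x NOT IN (...)` can never be TRUE, and a NULL `x` qualifies only when the subquery is empty. These release notes do not say how DLI optimizes NOT IN, so the following is only a Python sketch of one common technique, not DLI's implementation: replacing a per-row nested-loop comparison with a null-aware hash anti join that handles the NULL and empty cases once and then probes a hash set. The function names `sql_not_in`, `not_in_nested_loop`, and `not_in_hash_anti_join` are hypothetical, and Python's `None` stands in for SQL NULL.

```python
from typing import List, Optional

Value = Optional[int]  # None stands in for SQL NULL (illustrative assumption)


def sql_not_in(x: Value, values: List[Value]) -> Optional[bool]:
    """Three-valued `x NOT IN (values)`: True, False, or None (UNKNOWN)."""
    if not values:
        return True  # NOT IN over an empty set is TRUE, even for a NULL x
    if x is None:
        return None  # comparing NULL with any value is UNKNOWN
    saw_null = False
    for v in values:
        if v is None:
            saw_null = True
        elif v == x:
            return False  # x is definitely in the set
    return None if saw_null else True


def not_in_nested_loop(outer: List[Value], inner: List[Value]) -> List[Value]:
    """Naive plan: scan the whole inner list for every outer row (O(n * m))."""
    # A WHERE clause keeps a row only when the predicate is TRUE, not UNKNOWN.
    return [x for x in outer if sql_not_in(x, inner) is True]


def not_in_hash_anti_join(outer: List[Value], inner: List[Value]) -> List[Value]:
    """Null-aware hash anti join: same result, O(n + m) on average."""
    if not inner:
        return list(outer)  # empty subquery: every outer row qualifies
    if any(v is None for v in inner):
        return []  # a NULL in the subquery makes every non-match UNKNOWN
    build_side = set(inner)  # hash table built once from the subquery
    return [x for x in outer if x is not None and x not in build_side]


if __name__ == "__main__":
    outer_rows = [1, 2, 3, None]
    for inner_rows in ([2], [2, None], []):
        naive = not_in_nested_loop(outer_rows, inner_rows)
        fast = not_in_hash_anti_join(outer_rows, inner_rows)
        assert naive == fast  # both plans must agree on every case
        print(inner_rows, "->", fast)
```

Running the script prints `[2] -> [1, 3]`, `[2, None] -> []`, and `[] -> [1, 2, 3, None]`. The hash version gets its speed by checking the NULL and empty-subquery cases once up front instead of re-discovering them on every outer row.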