Updated on 2025-01-09 GMT+08:00

OPTIMIZE

Function

This command optimizes the data layout in storage to improve query speed.

Precautions

  • Since optimization is time-consuming, you need to determine the frequency of running OPTIMIZE based on the trade-off between better end-user query performance and the optimization computation time.
  • To optimize partitioned tables, set spark.sql.forcePartitionPredicatesOnPartitionedTable.enabled to false.

Syntax

OPTIMIZE [database_name.]table_name

[ WHERE boolExpression]

Z-ordering:

OPTIMIZE [database_name.]table_name

[ WHERE boolExpression]

ZORDER BY (columnList);

Parameter Description

Table 1 Parameter descriptions

Parameter

Description

database_name

Name of the database, consisting of letters, numbers, and underscores (_)

table_name

Name of the table in the database, consisting of letters, numbers, and underscores (_)

boolExpression

Filter condition expression

columnList

List of fields specified for z-ordering, and the Z-order columns should be different from the partition columns.

Required Permissions

  • SQL permissions
Table 2 Permissions required for executing OPTIMIZE

Permission Description

UPDATE permission on a table

  • Fine-grained permission: dli:table:update
  • Metadata services provided by LakeFormation. Refer to the LakeFormation documentation for details on permission configuration.

Examples

OPTIMIZE delta_table0;

optimize delta_table0 where dt >= '2020-01-01';

OPTIMIZE delta_table0 WHERE dt >= current_timestamp() - INTERVAL 1 day ZORDER BY (price);

System Response

Display the success or failure of the command execution in the driver logs and the client.