Updated on 2025-12-19 GMT+08:00

OPTIMIZE

Function

Rewrites specified data in a target table.

Currently, this syntax is only supported for Iceberg tables.

Precautions

  • If a sort key is specified, data within files will be ordered after rewriting, but global ordering is not guaranteed.
  • When defining rewrite conditions, you are advised to use partition-based criteria for efficiency.

Syntax

OPTIMIZE table_name REWRITE DATA 
    [ WITH OPTIONS (option_key = option_value [, ...]) ]
    [ WHERE condition ]
    [ ORDER BY target_list [ ASC | DESC ] ]

Parameter Description

  • table_name: name of the target table. Range: an existing table name.
  • condition: an expression that returns a boolean value, used to determine which files need to be rewritten.
  • target_list: sort keys for data rewriting.
  • option_key, option_value: configuration options for data rewriting.
Table 1 option_key, option_value descriptions

Parameter

Value

Description

Default Value

target-file-size-bytes

0–Max(uint64)

Size of rewritten files.

Matches write.target-file-size-bytes.

min-file-size-bytes

0–Max(uint64)

Original files smaller than this threshold are targeted.

75% of target-file-size-bytes

max-file-size-bytes

0–Max(uint64)

Original files larger than this threshold are targeted.

180% of target-file-size-bytes

min-input-files

0–Max(uint32)

Rewriting occurs if a partition has more target files than this value.

5

delete-file-threshold

0–Max(uint32)

Rewriting occurs if a partition has more delete files than this value.

INT_MAX

rewrite-all

true, false

Ignores all other conditions and rewrites all matching files.

false

max-file-group-size-bytes

0–Max(uint64)

Splits target files into multiple tasks if their total size exceeds this.

100 GB

Examples

Create the reason_t2 table.
CREATE TABLE reason_t2
(
    TABLE_SK          INTEGER               ,
    TABLE_ID          VARCHAR(20)           ,
    TABLE_NA          VARCHAR(20)
) PARTITION BY (TABLE_SC VARCHAR(20)) STORE AS iceberg;

INSERT INTO reason_t2 VALUES (1, 'S01', 'StudentA', 'Xiangbei'),(2, 'T01', 'TeacherA', 'Lingnan'),(3, 'T02', 'TeacherB', 'Hainan');
INSERT INTO reason_t2 VALUES (4, 'S02', 'StudentX', 'Xiangbei'),(5, 'T03', 'TeacherX', 'Lingnan'),(6, 'T04', 'TeacherY', 'Hainan');
Rewrite data in the TABLE_SC = 'Hainan' partition.
OPTIMIZE reason_t2 REWRITE DATA WITH OPTIONS ('rewrite-all' = 'true') WHERE TABLE_SC = 'Hainan';

Run the ORDER BY statement to view the data rewriting plan of the TABLE_SC = 'Lingnan' partition.

EXPLAIN VERBOSE OPTIMIZE reason_t2 REWRITE DATA WITH OPTIONS ('rewrite-all' = 'true') WHERE TABLE_SC = 'Lingnan' ORDER BY 1;

                                                                         QUERY EXEC INFO                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
  id |                                operation                                | E-rows | E-distinct | E-width | E-costs 
 ----+-------------------------------------------------------------------------+--------+------------+---------+---------
   1 | ->  Row Adapter                                                         |   1000 |            |     178 | 73.68   
   2 |    ->  Vector Insert on reason_t2                                       |   1000 |            |     178 | 73.68   
   3 |       ->  Vector Sort                                                   |   1000 |            |     178 | 53.68   
   4 |          ->  Partitioned Vector Foreign Scan on reason_t2               |   1000 |            |     178 | 1.35    
                                                          Predicate Information (identified by plan id)                                                         
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
   4 --Partitioned Vector Foreign Scan on reason_t2
         Server Type: lf
         Pruning results: (Manifests total: 3, Manifests left: 2, Files total: 4, Files dynamic prune 0, Files dynamic RF prune: 0, Files left after pruning: 2)
         DN read from: direct
         replication: 2 files 0.00 MB

Configure min-file-size-bytes to rewrite all target files.

EXPLAIN VERBOSE OPTIMIZE reason_t2 REWRITE DATA WITH OPTIONS ('min-file-size-bytes' = '0')  ORDER BY 1;