Updated on 2025-02-22 GMT+08:00

CLUSTERING

Function

Performs the clustering operation on Hudi tables. For details, see Hudi Clustering.

Syntax

  • Performing clustering:

    call run_clustering(table=>'[table]', path=>'[path]', predicate=>'[predicate]', order=>'[order]');

  • Viewing the clustering plan:

    call show_clustering(table=>'[table]', path=>'[path]', limit=>[limit]);

Parameter Description

Table 1 Parameter descriptions

Parameter   Description                                                                           Mandatory
---------   -----------------------------------------------------------------------------------   ---------------------------------
table       Name of the table to be queried. The value can be in the database.tablename format.   Either table or path must be set.
path        Path of the table to be queried                                                       Either table or path must be set.
predicate   Predicate to be defined, which is used to filter partitions to be clustered           No
order       Sorting field for clustering                                                          No
limit       Number of query results to display                                                    No

Example

call show_clustering(table => 'hudi_table1');

call run_clustering(table => 'hudi_table1', predicate => '(ts >= 1006L and ts < 1008L) or ts >= 1009L', order => 'ts');

call run_clustering(path => 'obs://bucket/path/hudi_test2', predicate => "dt = '2021-08-28'", order => 'id');
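
The examples above do not use the limit parameter or address show_clustering by path. The following is a minimal sketch built only from the syntax in this section. It reuses the table name and path from the examples above, and the limit value of 10 is an arbitrary illustration. The path form is not expected to work with the metadata service provided by DLI, which supports only the table parameter.

```sql
-- Show at most 10 entries of the clustering plan for a table identified by name.
call show_clustering(table => 'hudi_table1', limit => 10);

-- Same query, identifying the table by its storage path instead of its name.
call show_clustering(path => 'obs://bucket/path/hudi_test2', limit => 10);
```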

Caveats

  • Either table or path must be specified. Otherwise, the Hudi table to be clustered cannot be determined.
  • When using the metadata service provided by DLI, this command only supports configuring the table parameter and does not support configuring the path parameter.
  • To cluster a specific partition, pass a partition filter as the predicate, for example predicate => "dt = '2021-08-28'".
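
The caveats above can be combined into a single statement. This is a sketch under stated assumptions: db1 is a hypothetical database name, and dt and id are assumed to be the partition field and an ordinary column of the table, as in the earlier examples.

```sql
-- Cluster only the dt = '2021-08-28' partition and sort the rewritten data by id.
-- The table is identified by name (database.tablename) rather than by path,
-- which is the only form supported with the metadata service provided by DLI.
call run_clustering(table => 'db1.hudi_table1', predicate => "dt = '2021-08-28'", order => 'id');
```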

System Response

After the job finishes, check that its status is successful, view the job result, and review the job logs for any exceptions.