Updated on 2024-11-29 GMT+08:00

CLUSTERING

Function

Clusters Hudi tables. For details, see Clustering.

Syntax

  • Performing clustering:

    call run_clustering(table=>'[table]', path=>'[path]', predicate=>'[predicate]', order=>'[order]');

  • Viewing the clustering plan:

    call show_clustering(table=>'[table]', path=>'[path]', limit=>'[limit]');

Parameter Description

Table 1 Parameters

Parameter    Description                                                                          Mandatory
table        Name of the Hudi table. The value can be in the database.tablename format.           No
path         Storage path of the Hudi table.                                                      No
predicate    Filter condition that selects the data to cluster, for example a partition filter.   No
order        Sorting field for clustering.                                                        No
limit        Maximum number of query results to display.                                          No

Example

call show_clustering(table => 'hudi_table1');

call run_clustering(table => 'hudi_table1', predicate => '(ts >= 1006L and ts < 1008L) or ts >= 1009L', order => 'ts');

call run_clustering(path => '/user/hive/warehouse/hudi_test2', predicate => "dt = '2021-08-28'", order => 'id');
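
The examples above do not use the limit parameter or look up a clustering plan by path. As a minimal sketch, assuming limit accepts an integer literal (as it does in open-source Apache Hudi; the syntax on this page shows it only as a quoted placeholder), the clustering plans of the path-based table above could be listed as follows:

```sql
-- Sketch only: show at most 10 clustering plan entries for the table at this path.
-- Assumption: limit is passed as an integer literal rather than a quoted string.
call show_clustering(path => '/user/hive/warehouse/hudi_test2', limit => 10);
```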

Precautions

  • Specify either table or path. Otherwise, the Hudi table to be clustered cannot be determined.
  • To cluster a specific partition, pass a partition filter as the predicate, for example predicate => "dt = '2021-08-28'".
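
As a hedged illustration of the partition note above, the same predicate mechanism could select a range of partitions instead of a single one. This sketch assumes that dt is a string partition column in yyyy-MM-dd form, so that string comparison follows date order, and that range comparisons on it are accepted in the same way as the ts comparisons in the Example section.

```sql
-- Sketch only: cluster all partitions for August 2021 and sort the data by id.
-- Assumption: dt is a yyyy-MM-dd string partition column of the table at this path.
call run_clustering(path => '/user/hive/warehouse/hudi_test2', predicate => "dt >= '2021-08-01' and dt < '2021-09-01'", order => 'id');
```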

System Response

You can view the results of the call on the client.