Updated on 2024-11-29 GMT+08:00

Deleting Historical Data

Scenario

Delete old data from Hudi tables to reduce storage usage and lower storage costs.

Running the delete/drop partition Statement

The delete/drop partition command can be used to delete historical data. For details, see Hudi SQL Syntax Reference.

Advantages: The operation is simple, and both COW and MOR tables are supported.

Disadvantages: Concurrency support is poor. If a Hudi table is being written to in real time, running the delete/drop partition command at the same time may cause the real-time data import job to fail.
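
As a minimal sketch of the two statements, assume a hypothetical table mytable partitioned by a string column dt. Neither the column nor the values come from this page, and the exact forms supported by your version should be checked against the Hudi SQL Syntax Reference.

```sql
-- Hypothetical table and partition column; adjust to your schema.
-- Delete the rows that match a condition.
delete from mytable where dt < '2024-01-01';

-- Drop one whole partition.
alter table mytable drop partition (dt = '2023-12-31');
```

Because of the concurrency limitation described above, these statements are best run while no real-time import job is writing to the table.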

Running the call clean_data Command

  • Function

    The call clean_data command deletes historical data from MOR tables.

    Advantages: The deletion can run concurrently with the data import task without affecting real-time data import.

    Disadvantages: Only MOR tables are supported. The deletion is lazy and depends on compaction to take effect.

  • Syntax

    call clean_data(table => 'table_name', sql => 'delete statement')

  • Parameters
    Table 1 Parameters

    Parameter          Description
    table_name         Name of the table whose data is to be deleted. The value can be in the database.tablename format.
    delete statement   SQL statement of the select type, which is used to find the data to be deleted.

  • Example

    Delete all data whose primaryKey is smaller than 100 from the mytable table:

    call clean_data(table => 'mytable', sql => 'select * from mytable where primaryKey < 100')

    Clear the residual files of the clean_data command. If the clean_data command fails, it leaves temporary files behind. Run the following command to remove them:

    call clean_data(table => 'mytable', sql => 'delete cleanData')
  • System response

    You can view query results on the client.
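
The clean_data notes above can be combined into one hedged sketch. The table is given in the database.tablename format, the select statement picks the rows to delete, and a compaction is then run because the deletion is lazy. The names mydb, mytable, and primaryKey are hypothetical. The schedule/run compaction statements are an assumption and should be checked against the Hudi SQL Syntax Reference for your version.

```sql
-- Hypothetical database, table, and column names.
-- Select the rows to delete; this can run alongside the data import task.
call clean_data(table => 'mydb.mytable', sql => 'select * from mydb.mytable where primaryKey < 100');

-- Assumption: compaction is triggered manually here. Depending on the table
-- configuration, it may instead run automatically or in a separate job.
schedule compaction on mydb.mytable;
run compaction on mydb.mytable;
```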