Reading the Hudi COW Table View

Updated on 2024-12-11 GMT+08:00
  • Reading the real-time view (using Hive and SparkSQL as examples): Directly read the Hudi table registered in Hive. Use ${table_name} to specify the table name. (A spark-shell sketch of the same query appears after this list.)
    select count(*) from ${table_name};
  • Reading the real-time view (using the Spark DataSource API as an example): This is similar to reading an ordinary DataSource table.

    The query type option QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_SNAPSHOT_OPT_VAL. Specify the base path of the Hudi table in load(). (A self-contained spark-shell sketch appears after this list.)

    import org.apache.hudi.DataSourceReadOptions._ // Provides QUERY_TYPE_OPT_KEY and QUERY_TYPE_SNAPSHOT_OPT_VAL.
    spark.read.format("hudi")
    .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_SNAPSHOT_OPT_VAL) // Set the query type to the real-time view.
    .load("/tmp/default/cow_bugx/") // Specify the path of the Hudi table to read.
    .createTempView("mycall")
    spark.sql("select * from mycall").show(100)
  • Reading the incremental view (using Hive as an example; use ${table_name} to specify the table name):
    set hoodie.${table_name}.consume.mode=INCREMENTAL;  // Enable the incremental read mode.
    set hoodie.${table_name}.consume.max.commits=3;  // Specify the maximum number of commits to consume.
    set hoodie.${table_name}.consume.start.timestamp=20201227153030;  // Specify the start commit of the incremental pull.
    select count(*) from default.${table_name} where `_hoodie_commit_time`>'20201227153030'; // This filter condition is mandatory; its value is the start commit of the incremental pull.
  • Reading the incremental view (using SparkSQL as an example; use ${table_name} to specify the table name):
    set hoodie.${table_name}.consume.mode=INCREMENTAL;  // Enable the incremental read mode.
    set hoodie.${table_name}.consume.start.timestamp=20201227153030;  // Specify the start commit of the incremental pull.
    set hoodie.${table_name}.consume.end.timestamp=20210308212318;  // Specify the end commit of the incremental pull. If this parameter is not specified, the latest commit is used.
    select count(*) from default.${table_name} where `_hoodie_commit_time`>'20201227153030'; // This filter condition is mandatory; its value is the start commit of the incremental pull.
  • Reading the incremental view (using the Spark DataSource API as an example):

    QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_INCREMENTAL_OPT_VAL. (A self-contained spark-shell sketch appears after this list.)

    import org.apache.hudi.DataSourceReadOptions._ // Provides the query type and instant time option constants.
    spark.read.format("hudi")
    .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL) // Set the query type to the incremental mode.
    .option(BEGIN_INSTANTTIME_OPT_KEY, "20210308212004")  // Specify the start commit of the incremental pull.
    .option(END_INSTANTTIME_OPT_KEY, "20210308212318")  // Specify the end commit of the incremental pull.
    .load("/tmp/default/cow_bugx/")  // Specify the path of the Hudi table to read.
    .createTempView("mycall")  // Register the result as a Spark temporary view.
    spark.sql("select * from mycall where `_hoodie_commit_time`>'20210308211131'") // Run the query. The statement is the same as the Hive incremental query statement.
    .show(100, false)
  • Reading the read-optimized view: For COW tables, the read-optimized view is equivalent to the real-time view, because COW tables store all data in base files and have no log files to merge. (A comparison sketch appears after this list.)
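
The following is a minimal sketch of the real-time count query issued through Spark SQL. It assumes spark-shell was launched with the Hudi bundle on the classpath (so the variable spark is predefined) and that the COW table is registered in Hive under the hypothetical name hudi_cow_table; substitute your own table name.

    // Hedged sketch: run the real-time count through Spark SQL.
    // "hudi_cow_table" is a hypothetical table name used only for illustration.
    spark.sql("select count(*) from hudi_cow_table").show()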
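
The real-time (snapshot) read through the Spark DataSource API can be assembled into the self-contained spark-shell sketch below. It assumes the Hudi Spark bundle is on the classpath and that /tmp/default/cow_bugx/ (the path used in the examples above) is the base path of an existing COW table.

    // Sketch of the real-time (snapshot) read in spark-shell, where `spark` is predefined.
    import org.apache.hudi.DataSourceReadOptions._

    val snapshotDf = spark.read.format("hudi")
      .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_SNAPSHOT_OPT_VAL)  // real-time view
      .load("/tmp/default/cow_bugx/")                           // base path of the COW table

    snapshotDf.createOrReplaceTempView("mycall")
    spark.sql("select count(*) from mycall").show()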
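
Similarly, the incremental read through the Spark DataSource API can be written as the sketch below, assuming the same table path. The commit timestamps are the example values from this page; replace them with commits that actually exist on your table's timeline. BEGIN_INSTANTTIME_OPT_KEY is exclusive (only commits after it are returned), and omitting END_INSTANTTIME_OPT_KEY pulls data up to the latest commit.

    // Sketch of an incremental read in spark-shell, assuming the same table path as above.
    import org.apache.hudi.DataSourceReadOptions._

    val incrementalDf = spark.read.format("hudi")
      .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL)  // incremental mode
      .option(BEGIN_INSTANTTIME_OPT_KEY, "20210308212004")         // start commit (exclusive)
      .option(END_INSTANTTIME_OPT_KEY, "20210308212318")           // end commit; omit to read up to the latest commit
      .load("/tmp/default/cow_bugx/")

    incrementalDf.createOrReplaceTempView("mycall_incremental")
    spark.sql("select `_hoodie_commit_time`, count(*) from mycall_incremental group by `_hoodie_commit_time`").show(100, false)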
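
The equivalence of the read-optimized and real-time views of a COW table can be checked with the sketch below, which reads the same table path with both query types and compares the row counts. QUERY_TYPE_READ_OPTIMIZED_OPT_VAL is the read-optimized counterpart of the snapshot query type constant.

    // Sketch: for a COW table, the read-optimized and real-time views read the same base files,
    // so the two counts are expected to match.
    import org.apache.hudi.DataSourceReadOptions._

    val readOptimizedCount = spark.read.format("hudi")
      .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_READ_OPTIMIZED_OPT_VAL)
      .load("/tmp/default/cow_bugx/")
      .count()

    val realTimeCount = spark.read.format("hudi")
      .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_SNAPSHOT_OPT_VAL)
      .load("/tmp/default/cow_bugx/")
      .count()

    println(s"read-optimized: $readOptimizedCount, real-time: $realTimeCount")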