Why Is a Hudi Table Not Displayed on the DLI Console?
Symptom
The Hudi table created using Flink SQL statements is not displayed on the DLI console, making it impossible to manage or perform query operations on the table through the console.
Possible Causes
By default, the DLI console uses a Hive catalog for data management, but the metadata of the Hudi table has not been synchronized to that catalog. Hudi manages its table metadata independently of Hive, so additional configuration is required before the Hive catalog can recognize and synchronize the metadata of Hudi tables.
Solution
Hudi table persistence is the process of synchronizing the metadata and data information of Hudi tables to the Hive metadata store so that Hive can recognize and manage these tables.
- SQL job
If you operate on Hudi data using SQL only, no additional persistence parameters are required, because SQL jobs natively support reading and writing Hudi data.
- Flink OpenSource SQL job
To persist the metadata of a Hudi table to the Hive catalog of DLI, add specific configuration parameters when creating the Hudi table. These parameters instruct Hudi to synchronize the table metadata to the Hive catalog.
Table 1 describes the parameters, and a DDL sketch is provided after the table.
For details about submitting Flink SQL jobs with Hudi on DLI and configuring the hive_sync parameters to synchronize metadata to DLI in real time, see Submitting a Flink SQL Job in DLI Using Hudi.
Table 1 Persistence parameters configured when creating a Hudi result table

- hive_sync.enable: Whether to enable synchronization of table information to Hive. Set this parameter to true when the metadata of the Hudi table needs to be synchronized to the Hive metadata store so that the Hudi table can be accessed through Hive query tools or managed on the data management page of the DLI console.
  - true: Hudi synchronizes the table metadata (for example, table structure and partition information) to Hive.
  - false: Table information is not synchronized to Hive.
  Synchronizing table information to Hive uses catalog-related permissions, so agency permissions for accessing the catalog must also be configured.
- hive_sync.mode: Method for synchronizing metadata from the Hudi table to Hive. Select a synchronization method as needed.
  - jdbc: Synchronizes metadata by connecting to HiveServer through JDBC.
  - hms: Interacts directly with the Hive Metastore through the Hive metastore client to synchronize metadata.
  - hiveql: Synchronizes metadata by executing HiveQL statements.
- hive_sync.table: Name of the Hive table to which the Hudi table metadata is synchronized.
- hive_sync.db: Name of the Hive database that contains the table to which the Hudi table metadata is synchronized.
- hive_sync.support_timestamp: Whether to support the timestamp type. This parameter ensures that timestamp fields are handled correctly when the metadata of the Hudi table is synchronized to Hive. You are advised to set it to true.
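The following Flink OpenSource SQL sketch shows how these parameters might appear in the DDL of a Hudi result table. It is a minimal illustration only: the column list, the OBS path, the table type, and the names hudi_sink and default are placeholder assumptions to be replaced with your own values.

```sql
-- Minimal sketch: a Hudi result table whose metadata is synchronized to the Hive catalog of DLI.
-- The columns, the OBS path, and the database/table names below are example values.
CREATE TABLE hudi_sink (
  id   INT,
  name STRING,
  ts   TIMESTAMP(3),
  dt   STRING,
  PRIMARY KEY (id) NOT ENFORCED
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi',
  'path' = 'obs://your-bucket/your-path/hudi_sink',  -- example storage path
  'table.type' = 'MERGE_ON_READ',
  -- Persistence parameters from Table 1
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.db' = 'default',
  'hive_sync.table' = 'hudi_sink',
  'hive_sync.support_timestamp' = 'true'
);
```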
- Spark Jar job
When operating on Hudi data in a Spark Jar job, you also need to set parameters to persist the table and synchronize it to Hive.
Table 2 describes the parameters, and a code sketch follows the table.
For details about submitting Spark Jar jobs with Hudi in DLI and enabling the synchronization configuration, see Submitting a Spark Jar Job in DLI Using Hudi.
Table 2 Parameters for synchronizing Spark Jar jobs to Hive tables

- hoodie.datasource.hive_sync.enable: Whether to synchronize Hudi tables to Hive. When the metadata service provided by DLI is used, this parameter controls synchronization to the DLI metadata. You are advised to set it to true so that the metadata service can manage Hudi tables. Default value: true
- hoodie.datasource.hive_sync.database: Name of the database to be synchronized to Hive. Default value: default
- hoodie.datasource.hive_sync.table: Name of the table to be synchronized to Hive. Set this parameter to the value of hoodie.datasource.write.table.name. Default value: unknown
- hoodie.datasource.hive_sync.partition_fields: Hive partition columns. Default value: ""
- hoodie.datasource.hive_sync.partition_extractor_class: Class used to extract Hudi partition column values and convert them into Hive partition columns. Default value: org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
- hoodie.datasource.hive_sync.support_timestamp: If the Hudi table contains a field of the timestamp type, set this parameter to true to synchronize the timestamp type to the Hive metadata. If it is set to false, the timestamp type is converted to bigint during synchronization, and an error may occur when a Hudi table containing a timestamp field is queried using SQL statements. Default value: true
- hoodie.datasource.hive_sync.fast_sync: Method for synchronizing Hudi partitions to Hive.
  - true: The partitions obtained from the Hudi table are synchronized to the Hive table using ADD PARTITION IF NOT EXISTS.
  - false: Each partition to be synchronized is first checked against the Hive table, and only partitions that do not yet exist are added.
  Default value: true
- hoodie.datasource.hive_sync.mode: Method for synchronizing Hudi tables to Hive. When the metadata service provided by DLI is used, this parameter must be set to hms.
  - hms: Synchronizes metadata through the Hive metastore client.
  - jdbc: Synchronizes metadata through Hive JDBC. (Not supported by the DLI metadata service.)
  - hiveql: Synchronizes metadata by running HiveQL statements. (Not supported by the DLI metadata service.)
  Default value: hms
- hoodie.datasource.hive_sync.username: Username used when synchronizing to Hive through JDBC. Default value: hive
- hoodie.datasource.hive_sync.password: Password used when synchronizing to Hive through JDBC. Default value: hive
- hoodie.datasource.hive_sync.jdbcurl: JDBC URL used to connect to Hive. Default value: ""
- hoodie.datasource.hive_sync.use_jdbc: Whether to use Hive JDBC to synchronize Hudi table information to Hive. You are advised to set this parameter to false; when it is set to false, the JDBC connection settings are not used. Default value: true
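The following Scala sketch illustrates how a Spark Jar job might set these options when writing a Hudi table. It is a minimal example under assumptions: the SparkSession setup, the sample DataFrame, the OBS path, and the names hudi_demo and default are placeholders, and the choice of MultiPartKeysValueExtractor (suitable for a plain string partition column such as dt) is an illustrative alternative to the default extractor listed in Table 2.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object HudiHiveSyncSketch {
  def main(args: Array[String]): Unit = {
    // Example SparkSession; in a DLI Spark Jar job, create the session with
    // whatever Hudi/DLI-specific configuration your environment requires.
    val spark = SparkSession.builder()
      .appName("hudi-hive-sync-sketch")
      .getOrCreate()

    import spark.implicits._

    // Illustrative data; replace with your own source DataFrame.
    val df = Seq((1, "a", "2024-01-01"), (2, "b", "2024-01-02")).toDF("id", "name", "dt")

    df.write.format("hudi")
      .option("hoodie.table.name", "hudi_demo")                        // example table name
      .option("hoodie.datasource.write.recordkey.field", "id")
      .option("hoodie.datasource.write.partitionpath.field", "dt")
      .option("hoodie.datasource.write.precombine.field", "dt")
      // Parameters from Table 2: synchronize the table to the Hive/DLI metadata.
      .option("hoodie.datasource.hive_sync.enable", "true")
      .option("hoodie.datasource.hive_sync.mode", "hms")
      .option("hoodie.datasource.hive_sync.database", "default")
      .option("hoodie.datasource.hive_sync.table", "hudi_demo")
      .option("hoodie.datasource.hive_sync.partition_fields", "dt")
      .option("hoodie.datasource.hive_sync.partition_extractor_class",
        "org.apache.hudi.hive.MultiPartKeysValueExtractor")
      .option("hoodie.datasource.hive_sync.support_timestamp", "true")
      .option("hoodie.datasource.hive_sync.use_jdbc", "false")
      .mode(SaveMode.Overwrite)
      .save("obs://your-bucket/your-path/hudi_demo")                   // example storage path
  }
}
```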