Failed to Delete a Table Due to Excessive Hive Partitions
Symptom
When a Hive table is partitioned by two columns, the number of partitions can eventually exceed 20,000. In this case, running the truncate table ${TableName} or drop table ${TableName} statement to delete the table data fails.
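The partition count grows quickly because, for a table partitioned by several columns, it is bounded by the product of the number of distinct values in each partition column. The following minimal Python sketch illustrates this arithmetic. The column names (dt, region) and their cardinalities are hypothetical examples, not values from a real cluster.

```python
from math import prod


def max_partitions(distinct_values_per_column):
    """Upper bound on the partition count of a multi-column partitioned table.

    Only value combinations that actually occur in the data create partitions,
    so the real count is at most the product of per-column distinct values.
    """
    return prod(distinct_values_per_column.values())


# Hypothetical table partitioned by (dt, region):
# one year of daily partitions combined with 60 regions.
example = {"dt": 365, "region": 60}
print(max_partitions(example))  # 21900
```

Two columns of moderate cardinality are already enough to pass 20,000 partitions.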
Cause Analysis
Files are deleted serially by a single thread. When a partitioned Hive table has too many partitions, a large amount of partition metadata is stored in the metadata database, and deleting this metadata takes a long time when a statement that deletes table data is executed. As a result, the deletion cannot be completed within the configured timeout period, and the operation fails.
To check the timeout period, log in to FusionInsight Manager and choose Cluster > Services > Hive. On the Hive page, choose Configuration > All Configurations, choose ServerInit under MetaStore(Role) in the navigation tree, and check the value of the hive.metastore.client.socket.timeout parameter in the right pane. This parameter specifies the timeout period. Its default value is shown in the Description column.
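Because the deletion runs serially, its duration grows roughly linearly with the number of partitions, and the statement fails once that duration exceeds the timeout. The Python sketch below only illustrates this reasoning. The per-partition cost and the timeout value are hypothetical placeholders, not measurements or defaults; substitute the hive.metastore.client.socket.timeout value configured on your cluster and a per-partition cost you have observed yourself.

```python
def serial_delete_seconds(partitions, seconds_per_partition):
    """Estimated duration when partitions are deleted one after another."""
    return partitions * seconds_per_partition


def fits_in_timeout(partitions, seconds_per_partition, timeout_seconds):
    """True if a serial deletion is expected to finish before the timeout."""
    return serial_delete_seconds(partitions, seconds_per_partition) <= timeout_seconds


# Hypothetical values for illustration only (not measured, not defaults).
PER_PARTITION_S = 0.05
TIMEOUT_S = 600

print(fits_in_timeout(20000, PER_PARTITION_S, TIMEOUT_S))  # False
print(fits_in_timeout(1000, PER_PARTITION_S, TIMEOUT_S))   # True
```

Under these assumed numbers, 20,000 partitions need about 1,000 seconds and exceed the timeout, while 1,000 partitions need about 50 seconds. This is why reducing the partition count before running drop table helps.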
Procedure
- (Optional; perform this step for an internal table.) Run alter table ${TableName} set TBLPROPERTIES('EXTERNAL'='true') to convert the table into an external table. When an external table is deleted, only its metadata is deleted and the data stored on HDFS is retained, which shortens the deletion time.
- (Optional; perform this step if you want to reuse the table name.) Run show create table ${TableName} to export the table structure, and then run ALTER TABLE ${TableName} RENAME TO ${new_table_name}; to rename the table. This frees the original name so that you can create a table with the same name and structure as the original one.
- Run hdfs dfs -rm -r -f ${hdfs_path} to delete the table data from HDFS, where ${hdfs_path} is the storage directory of the table.
- In Hive, run alter table ${TableName} drop partition (${PartitionName}<'XXXX', ${PartitionName}>'XXXX'); to delete partitions in batches and reduce the number of partitions. Adjust the conditions as needed to select the partitions deleted in each batch.
- When fewer than 1,000 partitions remain, run drop table ${TableName} to delete the table.
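The procedure above can be scripted. The following Python sketch only builds the commands as strings in the order listed; it does not connect to Hive or HDFS. The function name, table names, HDFS path, partition column, and boundary values are hypothetical. It assumes that, after the optional rename, the later steps operate on the renamed table, and that the partition column is a string whose values sort correctly as text (for example, yyyy-MM-dd dates). Each batch uses >= and < so that adjacent ranges neither overlap nor skip the boundary partitions.

```python
def cleanup_commands(table, new_table, hdfs_path, column, boundaries):
    """Return the procedure's commands, in order, as strings.

    boundaries: sorted partition-key values; each DROP PARTITION statement
    removes partitions with boundaries[i] <= column < boundaries[i + 1].
    """
    commands = [
        # Step 1 (Hive): internal -> external, so dropping keeps HDFS data.
        f"ALTER TABLE {table} SET TBLPROPERTIES('EXTERNAL'='true');",
        # Step 2 (Hive): export the DDL, then free the original table name.
        f"SHOW CREATE TABLE {table};",
        f"ALTER TABLE {table} RENAME TO {new_table};",
        # Step 3 (shell, not Hive): delete the table data from HDFS.
        f"hdfs dfs -rm -r -f {hdfs_path}",
    ]
    # Step 4 (Hive): drop partitions range by range to keep each batch small.
    commands += [
        f"ALTER TABLE {new_table} DROP IF EXISTS PARTITION "
        f"({column}>='{low}', {column}<'{high}');"
        for low, high in zip(boundaries, boundaries[1:])
    ]
    # Step 5 (Hive): drop the table once few partitions remain.
    commands.append(f"DROP TABLE {new_table};")
    return commands


# Hypothetical usage.
for cmd in cleanup_commands("t_sales", "t_sales_bak",
                            "/user/hive/warehouse/t_sales", "dt",
                            ["2020-01-01", "2020-07-01", "2021-01-01"]):
    print(cmd)
```

Partitions below the first boundary or at or above the last boundary are not covered by any batch; choose the boundaries so that fewer than 1,000 partitions are left for the final drop table statement.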
Summary and Suggestions
Partitioning can improve Hive query efficiency. However, plan the partitioning policy carefully to avoid generating an excessive number of partitions and small files.