Partition Concurrency Control
Each task determines whether a write conflict occurs by checking the modified-partition information recorded in commit operations that are in the inflight state. This check is what allows multiple tasks to write to the same table concurrently.
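The conflict check described above can be pictured as a set intersection over partition paths. The following is only a minimal sketch of that idea, not Hudi's actual implementation: InflightCommit, PartitionConflictSketch, and their methods are hypothetical names introduced for illustration, and the partition paths assume Hive-style partitioning on col0.

```scala
// Hypothetical model of an inflight commit: its instant time plus the set of
// partitions it modifies. These are illustrative types, not Hudi classes.
final case class InflightCommit(instantTime: String, partitions: Set[String])

object PartitionConflictSketch {
  // Partitions modified by both commits; an empty result means no conflict.
  def conflictingPartitions(mine: InflightCommit, other: InflightCommit): Set[String] =
    mine.partitions.intersect(other.partitions)

  // Every other inflight commit that shares at least one partition with `mine`.
  def findConflicts(mine: InflightCommit, inflight: Seq[InflightCommit]): Seq[InflightCommit] =
    inflight.filter { other =>
      other.instantTime != mine.instantTime &&
      conflictingPartitions(mine, other).nonEmpty
    }
}
```

Under this model, a task writing only to col0=1 would not conflict with a task writing only to col0=2, but it would conflict with a task that writes to both col0=1 and col0=3.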
Lock control during concurrency is implemented based on ZooKeeper locking. You do not need to configure additional parameters.
Precautions
Concurrent write control for partitions is built on concurrent write control for a single table, so its constraints are essentially the same as those for single-table concurrent writes.
Currently, data can be concurrently written to partitions only in Spark.
To prevent a large number of concurrent requests from occupying too many ZooKeeper resources, Hudi enforces a quota on its ZooKeeper usage. To adjust this quota, modify the zk.quota.number parameter of Spark on the server. The default value is 500000, and the minimum value is 5. This parameter only limits the access pressure on ZooKeeper; it does not control the number of concurrent tasks.
Using Partition Concurrency
Set hoodie.support.partition.lock to true to enable concurrent partition write.
Example:
Enable concurrent partition write in Spark datasource mode:
upsert_data.write.format("hudi").
  option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
  option("hoodie.datasource.write.precombine.field", "col2").
  option("hoodie.datasource.write.recordkey.field", "primary_key").
  option("hoodie.datasource.write.partitionpath.field", "col0").
  option("hoodie.upsert.shuffle.parallelism", 4).
  option("hoodie.datasource.write.hive_style_partitioning", "true").
  option("hoodie.support.partition.lock", "true").
  option("hoodie.table.name", "tb_test_cow").
  mode("Append").save(s"/tmp/huditest/tb_test_cow")
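When several jobs write to the same table, it may be convenient to collect the datasource options in one place so that every writer enables the partition lock consistently. The sketch below assumes such a helper is wanted; PartitionLockOptions and forTable are hypothetical names, while the option keys and values are taken from the example above.

```scala
object PartitionLockOptions {
  // Builds the writer options used in the datasource example, with
  // hoodie.support.partition.lock always enabled. The object and method
  // names are illustrative and not part of Hudi.
  def forTable(tableName: String,
               recordKeyField: String,
               precombineField: String,
               partitionField: String): Map[String, String] =
    Map(
      "hoodie.table.name" -> tableName,
      "hoodie.datasource.write.table.type" -> "COPY_ON_WRITE",
      "hoodie.datasource.write.recordkey.field" -> recordKeyField,
      "hoodie.datasource.write.precombine.field" -> precombineField,
      "hoodie.datasource.write.partitionpath.field" -> partitionField,
      "hoodie.datasource.write.hive_style_partitioning" -> "true",
      "hoodie.upsert.shuffle.parallelism" -> "4",
      "hoodie.support.partition.lock" -> "true"
    )
}

// Usage with a DataFrame (requires Spark and Hudi on the classpath):
//   val opts = PartitionLockOptions.forTable("tb_test_cow", "primary_key", "col2", "col0")
//   upsert_data.write.format("hudi").options(opts).mode("Append").save("/tmp/huditest/tb_test_cow")
```

Passing the map through options(...) is equivalent to chaining the individual option(...) calls.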
Enable concurrent partition write in Spark SQL mode:
set hoodie.support.partition.lock=true;
insert into hudi_table1 select 1,1,1;