Optimizing INSERT Statements
Scenario
You can add custom configurations based on the number of partition columns in the query result to achieve optimal writing performance when using HetuEngine to write data to a Hive data source partition table.
Procedure
- Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine. The HetuEngine service page is displayed.
- In the Basic Information area on the Dashboard page, click the link next to HSConsole WebUI.
- On HSConsole, click Data Source. Locate the row that contains the target Hive data source, click Edit in the Operation column, and add custom configurations. You can adjust custom parameters by referring to Table 1.
Table 1 Performance optimization parameters of the INSERT statement Parameter
Description
hive.max-partitions-per-writers
The value must be greater than or equal to the product of the values generated by the Count(distinct) method for all partition columns of the Hive data source partition table to which data is to be written.
task.writer-count
1
The following is an example of the Count(distinct) method:
The t2 table contains the col1, col2, and col3 columns. The query result is as follows:
col1 col2 col3
A 100 5
C 103 4
B 101 3
E 110 4
D 100 5
- If col3 is a partition column and its Count(distinct) value is 3, you are advised to set hive.max-partitions-per-writers to a value no less than 3.
- If the result table has multiple partition columns, for example, col2 and col3, and the Count(distinct) values of col2 and col3 are 4 and 3, respectively, you are advised to set hive.max-partitions-per-writers to a value no less than 12.
- Click OK.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot