Updated on 2023-04-28 GMT+08:00

Optimizing INSERT Statements

Scenario

You can add custom configurations based on the number of partition columns in the query result to achieve optimal writing performance when using HetuEngine to write data to a Hive data source partition table.

Procedure

  1. Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine. The HetuEngine service page is displayed.
  2. In the Basic Information area on the Dashboard page, click the link next to HSConsole WebUI.
  3. On HSConsole, click Data Source. Locate the row that contains the target Hive data source, click Edit in the Operation column, and add custom configurations. You can adjust custom parameters by referring to Table 1.

    Table 1 Performance optimization parameters of the INSERT statement

    Parameter

    Description

    hive.max-partitions-per-writers

    The value must be greater than or equal to the product of the values generated by the Count(distinct) method for all partition columns of the Hive data source partition table to which data is to be written.

    task.writer-count

    1

    The following is an example of the Count(distinct) method:

    The t2 table contains the col1, col2, and col3 columns. The query result is as follows:

    col1 col2 col3

    A 100 5

    C 103 4

    B 101 3

    E 110 4

    D 100 5

    • If col3 is a partition column and its Count(distinct) value is 3, you are advised to set hive.max-partitions-per-writers to a value no less than 3.
    • If the result table has multiple partition columns, for example, col2 and col3, and the Count(distinct) values of col2 and col3 are 4 and 3, respectively, you are advised to set hive.max-partitions-per-writers to a value no less than 12.

  4. Click OK.