Updated on 2024-11-29 GMT+08:00

Adjusting Metadata Cache

Scenario

When HetuEngine accesses a Hive data source, it must query the Hive metastore to obtain metadata. HetuEngine provides a metadata cache function: the first time a database or table of the Hive data source is accessed, its metadata (database name, table name, table fields, partition information, and permission information) is cached, so the Hive metastore does not need to be queried again during subsequent access. If the table data of the Hive data source does not change frequently, enabling the cache can improve query performance to some extent.
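The caching behavior described above can be pictured with a small time-to-live (TTL) cache. The following is only an illustrative sketch, not HetuEngine's implementation: the class MetadataCache, the function load_from_metastore, and the table name sales.orders are all hypothetical names introduced for this example.

```python
import time
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Toy TTL cache standing in for the metadata cache (hypothetical)."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # function that queries the metastore
        self._ttl = ttl_seconds    # how long a cached entry stays valid
        self._clock = clock        # injectable clock, useful for testing
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, table: str) -> dict:
        now = self._clock()
        cached = self._entries.get(table)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # cache hit: no metastore call
        metadata = self._loader(table)  # first access or expired entry
        self._entries[table] = (now, metadata)
        return metadata


metastore_calls = []


def load_from_metastore(table: str) -> dict:
    # Stand-in for a real metastore lookup; records each call.
    metastore_calls.append(table)
    return {"table": table, "columns": ["id", "name"]}


cache = MetadataCache(load_from_metastore, ttl_seconds=60)
cache.get("sales.orders")  # first access: loads from the metastore
cache.get("sales.orders")  # second access: served from the cache
print(len(metastore_calls))  # prints 1
```

In this sketch, a TTL of 0 means every lookup goes back to the metastore, which is one way to read a cache duration of 0s.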

Procedure

  1. Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine.
  2. On the Dashboard tab page that is displayed, find the Basic Information area, and click the link next to HSConsole WebUI.
  3. On HSConsole, click Data Source. Locate the row that contains the target Hive data source, click Edit in the Operation column, and add custom configurations according to Table 1.

    Table 1 Metadata cache parameters

    Parameter                                         | Description                                                                                  | Default Value
    --------------------------------------------------+----------------------------------------------------------------------------------------------+--------------
    hive.metastore-cache-ttl                          | Cache duration of the metadata of the co-deployed Hive data source.                          | 0s
    hive.metastore-cache-maximum-size                 | Maximum cache size of the metadata of the co-deployed Hive data source.                      | 10000
    hive.metastore-refresh-interval                   | Interval for refreshing the metadata of the co-deployed Hive data source.                    | 1s
    hive.per-transaction-metastore-cache-maximum-size | Maximum cache size of the metadata for each transaction of the co-deployed Hive data source. | 1000

  4. Click OK.
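As a rough illustration of the custom configurations added in step 3, the sketch below lists the four parameters from Table 1 with example values and runs a few sanity checks before they are entered on HSConsole. The values 10m and 1m are hypothetical and chosen only for illustration. The helper names to_seconds and check, the supported duration units, and the two rules applied (a TTL of 0s leaves caching ineffective, and the refresh interval should be shorter than the TTL) are assumptions of this example, not documented HetuEngine behavior.

```python
import re

# Hypothetical values for illustration only; tune them for your workload.
custom_config = {
    "hive.metastore-cache-ttl": "10m",
    "hive.metastore-cache-maximum-size": "10000",
    "hive.metastore-refresh-interval": "1m",
    "hive.per-transaction-metastore-cache-maximum-size": "1000",
}

# Assumed duration units; the product may accept others.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration: str) -> float:
    """Convert a duration such as '10m' or '0s' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(ms|s|m|h|d)", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def check(config: dict) -> list:
    """Return a list of readable problems; empty when none are found."""
    problems = []
    ttl = to_seconds(config["hive.metastore-cache-ttl"])
    refresh = to_seconds(config["hive.metastore-refresh-interval"])
    if ttl == 0:
        # Assumption: a zero TTL means cached entries expire immediately.
        problems.append("TTL is 0s, so entries expire immediately")
    elif refresh >= ttl:
        # Assumption: a refresh is only useful before the entry expires.
        problems.append("refresh interval is not shorter than the TTL")
    for key in ("hive.metastore-cache-maximum-size",
                "hive.per-transaction-metastore-cache-maximum-size"):
        if int(config[key]) <= 0:
            problems.append(f"{key} must be positive")
    return problems


# Print the parameter/value pairs to enter as custom configurations.
for key, value in custom_config.items():
    print(f"{key}={value}")
print(check(custom_config))
```

A longer TTL reduces metastore traffic but delays visibility of metadata changes, so the value is a trade-off that depends on how often the Hive tables change.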