Updated on 2024-10-25 GMT+08:00

Rewriting of HetuEngine Materialized Views

Enabling Rewriting of Materialized Views

HetuEngine provides the materialized view rewriting capability at the system or session level.

  • Enabling the materialized view rewriting capability at the session level:

    Run the set session materialized_view_rewrite_enabled=true command on the HetuEngine client

  • Enabling the materialized view rewriting capability at the system level:
    1. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI.
    2. Choose Cluster > Services > HetuEngine to go its service page.
    3. In the Basic Information area on the Dashboard page, click the link next to HSConsole WebUI. The HSConsole page is displayed.
    4. Click Compute Instance to view the instance status of the tenant to which operations are to be performed. When the number of green and blue icons is 0, you can perform 5 to enable materialized view rewriting.
    5. In the Compute Instance page, locate the row that contains the tenant to which the target instance belongs and click Configure in the Operation column. On the tab page displayed, add the following custom parameters:
      Table 1 Custom parameters

      Parameter

      Value

      Parameter File

      materialized.view.rewrite.enabled

      true

      coordinator.config.properties

      materialized.view.rewrite.timeout

      5

      coordinator.config.properties

      This step applies to MRS 3.2.0 or later.

      • materialized.view.rewrite.timeout: timeout interval for overwriting a materialized view, in seconds. The recommended value is 5 seconds. Materialized view rewrite takes some time. This parameter can be added to limit the performance loss caused by rewrite. After materialized view rewrite times out, the original SQL statement is executed.
      • To enable the materialized view function at the session level and enable the timeout control for materialized view rewrite, run the set session materialized_view_rewrite_timeout = 5 command first.
    6. Set Start Now to Yes and click OK.

Scope of Materialized View Rewriting

  • Supported materialized view types

    BOOLEAN, DECIMAL, DOUBLE, REAL/FLOAT, INT, BIGINT, SMALLINT, TINYINT, CHAR/VARCHAR, DATE, TIME, TIMESTAMP, INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND, BINARY/VARBINARY, and UUID.

  • Supported functions for materialized view rewriting
    • Conversion function: Only the CAST function is supported.
    • String function: All string functions are supported, including char_length, character_length, chr, codepoint, decode, encode, find_in_set, format_number, locate, hamming_distance, instr, levenshtein, levenshtein_distance, ltrim, lpad, octet_length, position, quote, and repeat2.
    • Mathematical operator: All mathematical operators are supported.
    • Aggregate function: COUNT, SUM, MIN, MAX, AVG, LEAD, LAG, FIRST_VALUE, LAST_VALUE, COVAR_POP, COVAR_SAMP, REGR_SXX, REGR_SYY, STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, ROW_NUMBER, RANK, PERCENT_RANK, DENSE_RANK, and CUME_DIST are supported.

    In the following scenarios, materialized views cannot be used to rewrite SQL queries that contain functions:

    • SQL queries contain parameterless functions.
    • SQL queries contain functions supported by HetuEngine that obtain different types of return values based on parameter types.
    • SQL queries contain nested functions or functions that may throw exceptions causing rewrite failures.
  • The statement for creating materialized views does not support the table name that contains two elements.

    For example, the table names hive.mvschema.t1 and t1 are supported, while the table name mvschema.t1 cannot be used.

Example of Materialized View Rewriting Scenarios

The core principle of materialized view rewriting is that the data of the logically created materialized view must contain the data to be queried in the future query statements or all the data to be included in the subquery in the future query. It is recommended that you enable the automatic creation of materialized views to create materialized views. The following is an example of some scenarios:

In the SQL statement example for creating a materialized view, CREATE MATERIALIZED VIEW xxx WITH(xxx) AS is omitted. For details about the complete statement template, see Table 1.

Table 2 Example of materialized view rewriting scenarios

Scenario

Description

SQL Statement Example for Creating a Materialized View

SQL Statement Example for a User Query

SQL Statement Rewritable

Remarks

Full table query

Basic full table query scenario

select * from tb_a;

select * from tb_a;

No

Creating a materialized view for full table scanning is meaningless and is not supported.

Column query

Basic column query scenario

select col1,col2,col3 from tb_a;

select col1,col2,col3 from tb_a;

Yes

-

User query renaming

select col1 from tb_a;

select col1 as a from tb_a;

Yes

-

select col1,col2,col3 from tb_a;

select col1 as a,col2 as b,col3 as c from tb_a;

Yes

-

Mathematical expression

select col1*col2 from tb_a;

select col2*col1 from tb_a;

Yes

The two columns must have the same type.

Source column used by a materialized view; and cast is used for user query.

select col1,col2 from tb_a;

select cast(col1 as varchar),col2 from tb_a;

No

Original data columns used by a materialized view, which are not rewritten if no filter criteria are configured in the functions used for user query.

Original data columns used by a materialized view, which can be rewritten if the original data columns and filter criteria are used for user query.

case when scenario

select col1, (case col2 when 1 then 'b' when 2 'a' end) as col from tb_a;

select col1, (case col2 when 1 then 'b' when 2 'a' end) as col from tb_a;

No

The case when scenario is not supported in query columns.

String function

select col13 from tb_a;

select length(col13) from tb_a;

No

All string functions use the original table data to create materialized views. The materialized views are not rewritten when queries without filter criteria configured.

select length(col13) from tb_a;

select length(col13) from tb_a;

Yes

-

Aggregate function column query

count

Materialized views and user queries use count.

select count(col1) from tb_a;

select count(col1) from tb_a;

Yes

-

Source data used by a materialized view, and count is used for user queries.

select col1 from tb_a;

select count(col1) from tb_a;

Yes

-

sum

sum is used for materialized views and user queries.

select sum(col1) from tb_a;

select sum(col1) from tb_a;

Yes

-

Source data used by a materialized view, and sum is used for user queries.

select col1 from tb_a;

select sum(col1) from tb_a;

Yes

-

Querying information by specifying filter criteria

(The core is that the data in materialized views is logically the same as or more than that in query SQL statements.)

where filtering

Maximum range of materialized views (<)

select col1 from tb_a;

select col1 from tb_a where col1<11;

Yes

-

The materialized view range is greater than the user query range (<).

select col1 from tb_a where col1<50;

select col1 from tb_a where col1<45;

Yes

-

select col1 from tb_a where col1<50;

select col1 from tb_a where col1<=45;

Yes

-

select col1 from tb_a where col1<50;

select col1 from tb_a where col1 between 21 and 29;

Yes

-

The materialized view range is equal to the user query range (>).

select col1 from tb_a where col1<50;

select col1 from tb_a where col1<50;

Yes

-

The materialized view range is greater than the user query range (and).

select col1 from tb_a where col1<60 and col1>30;

select col1 from tb_a where col1<55 and col1>30;

Yes

-

select col1 from tb_a where col1<60 and col1>30;

select col1 from tb_a where col1 between 35 and 55;

Yes

-

select col1 from tb_a where col1<60 and col1>30;

select col1 from tb_a where (col1<55 and col1>30) and col1 = 56;

Yes

-

where nested subquery

Subquery source table as a materialized view

select col1 from tb_a;

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

Yes

-

Subquery as a materialized view

select min(col1) from tb_a;

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

Yes

-

Parent query source table as a materialized view

select col1 from tb_a where col1=(select min(col1) from tb_a);

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

Yes

-

Parent query as a materialized view

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

Yes

-

limit

limit in a query

select col1 from tb_a;

select col1 from tb_a limit 5;

Yes

-

select col1 from tb_a limit 5;

select col1 from tb_a limit 5;

Yes

-

select col1 from tb_a limit 5;

select col1 from tb_a;

No

-

limit combined with order by

select col1 from tb_a;

select col1 from tb_a order by col1 limit 5;

Yes

If order by is used to create a materialized view, the result may be disordered. If query rewrite for materialized views is enabled, do not use limit or order by in the materialized view creation statement.

select col1 from tb_a order by col1;

select col1 from tb_a order by col1 limit 5;

Yes

select col1 from tb_a order by col1 limit 5;

select col1 from tb_a order by col1 limit 5;

No

having filtering

Maximum range of materialized views (<)

select col1 from tb_a;

select col1 from tb_a group by col1 having col1 <11;

Yes

group by + having: The scenario of having is different from that of where. The having condition cannot be compensated. The materialized view SQL statements must not have the having condition or must be the same as that of user query SQL statements.

The materialized view range is greater than the user query range (<).

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1<45;

No

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1<=45;

No

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1=45;

No

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1 between 21 and 29;

No

The materialized view range is greater than the user query range (<).

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1<50;

Yes

JOIN association query

Two subqueries as a materialized view

select col1,col3 from tb_a where col1<11;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

Yes

-

select cast(col2 as varchar) col2,col3 from tb_b;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

Yes

-

Parent query as a materialized view

with t1 as (select col1,col3 from tb_a),t2 as (select col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

Yes

-

Aggregate + JOIN query

Source table data as a materialized view

select col1,col3 from tb_a;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

Yes

-

select col2,col3 from tb_b;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

Yes

-

Subquery as a materialized view

select col1,col3 from tb_a where col1<11;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

Yes

-

select cast(col2 as varchar) col2,col3 from tb_b;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

Yes

-

Parent query (whose subqueries use the source table, non-aggregate query) as a materialized view

with t1 as (select col1,col3 from tb_a),t2 as (select col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

Yes

-

Parent query (non-aggregate query) as a materialized view

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

Yes

-

Parent query as a materialized view

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

Yes

-