Updated on 2024-05-29 GMT+08:00

CREATE MATERIALIZED VIEW

Syntax

CREATE [ OR REPLACE ] MATERIALIZED VIEW [ IF NOT EXISTS ] view_name [ COMMENT string ] [ WITH properties ] AS query

Description

This statement is used to create a materialized view based on the SELECT query result. A materialized view is a database object that contains the results of a query. For example, it can be a local copy of remote data, a row or column, or a subset of rows and columns in the query results of a single table or multiple joined tables, or a summary table that uses aggregate functions.

The materialized view is usually created using precomputed aggregation and join results of a query. Materialized views support the query rewriting feature, which transforms a query statement based on a base table to an equivalent based on one or more materialized views.

The syntax supports the following attributes:

  • storage_table: specifies the name of a storage table.
  • need_auto_refresh: During compute instance management, after creating a maintenance instance, you can set need_auto_refresh to true to create a materialized view that can be automatically refreshed as well as create and submit an automatic refresh task for the created materialized view. On this basis, you can set refresh_duration, start_refresh_ahead_of_expiry, refresh_priority, and other properties to adjust the automatic refresh task.
  • mv _validity: specifies the life cycle of a materialized view. 0 indicates that the materialized view is permanently valid. The minimum value is 1 minute. If need_auto_refresh is set to false, mv _validity is set to 0 by default. If need_auto_refresh is set to true, mv _validity is set to 24 hours by default.
  • refresh_duration: specifies the maximum waiting duration of the automatic materialized view refresh task. The default value is 5 minutes. The value ranges from 1 minute to 24 hours. If the waiting time of the automatic refresh task exceeds the maximum waiting time, the task status is displayed as timeout on the automatic task page.
  • start_refresh_ahead_of_expiry: specifies the submission time of automatic materialized refreshing based on mv _validity. This attribute indicates that automatic refreshing tasks are submitted when the specified percentage of the materialized view life cycle is reached. The default value is 0.2, and the minimum value is 0.05.
  • refresh_priority: priority for automatic refreshing tasks of the materialized views. The default and maximum value is 3. 1 indicates the highest priority. Tasks with higher priorities are more likely to be executed first.

Example

  • Create the same schema in mv catalog and the catalog for data storage (the following example uses Hive as the catalog for data storage), and enable query rewrite for the materialized view.
    hetuengine:tpcds_2gb> set session materialized_view_rewrite_enabled=true;
    hetuengine:tpcds_2gb> create schema mv.tpcds;
    CREATE SCHEMA
    hetuengine:tpcds_2gb> create schema hive.tpcds;
    CREATE SCHEMA
  • Create a table.
    hetuengine:tpcds_2gb> create table t1 (id int, c1 varchar); 
    hetuengine:tpcds_2gb> Insert into t1 values (1,'abc'), (2,'abc2'), (3,'abc3'), (4,'abc4'), (5,'abc5'),(6, 'abc6');
    hetuengine:tpcds_2gb> create table tb_a(a int ,b varchar, c varchar);
    hetuengine:tpcds_2gb> create table tb_b(a int ,d varchar, e varchar);
  • Create a materialized view named mv.tpcds.test in tpcds schema of mv catalog. If a materialized view with the same name already exists, an error occurs.
    hetuengine:tpcds_2gb> create materialized view mv.tpcds.test as select c1 from t1 where id <7;
    CREATE MATERIALIZED VIEW
  • Create a materialized view named mv.tpcds.test based on a specified column name in mv catalog and tpcds schema.
    hetuengine:tpcds_2gb> create materialized view mv.tpcds.test (a ,b) as select c1, id from t1 where id<7;
    CREATE MATERIALIZED VIEW
  • Create a materialized view using if not exists in mv catalog and tpcds schema. If a materialized view with the same name already exists, no error occurs.
    hetuengine:tpcds_2gb> create materialized view if not exists mv.tpcds.test as select c1, id from t1 where id<7;
    CREATE MATERIALIZED VIEW
  • Create a materialized view with specified properties in mv catalog and tpcds schema.
    hetuengine:tpcds_2gb> create materialized view mv.tpcds.test with (storage_table='mppdb.tpcds.test2',need_auto_refresh = true, mv_validity = '10m', start_refresh_ahead_of_expiry = 0.2, refresh_priority = 1, refresh_duration = '5m') as select c1, id from t1 where id<7;
    CREATE MATERIALIZED VIEW
  • Create a materialized view with comments.
    hetuengine:tpcds_2gb> create materialized view mv.tpcds.test comment 'test_comment' as select c1, id from t1 where id<7;
    CREATE MATERIALIZED VIEW

Precautions

  • To create a materialized view, ensure that mv catalog exists.
  • After creating a materialized view, you need to run the refresh materialized view xxx command to populate the materialized view with data.
  • The materialized view rewriting function needs to be enabled at the system or session level.
  • Schema used to create a view in mv catalog must be created in advance in the catalog and mv catalog that are used for data storage.
  • Do not delete the data table of a materialized view in the catalog.
  • Do not create a materialized view using the query statement that contains Order By.
  • When creating an MV, ensure that the query does not contain subqueries or subquery joins. Otherwise, use WITH subqueries instead.

    Example:

    hetuengine:tpcds_2gb> create materialized view mv.tpcds.test1 as select t1.a, b, d from ((select a, b, c from tb_a) as t1 join (select a, d, e from tb_b) as t2 on t1.a=t2.a);

    Use WITH to replace subqueries and subquery joins:

    hetuengine:tpcds_2gb> 
    create materialized view mv.tpcds.test1 as
    with t1 as (select a, b, c from tb_a),
    t2 as (select a, d, e from tb_b)select t1.a, b, d from t1 join t2 on t1.a = t2.a;
  • Querying the rewrite of some materialized views is not supported. If the data to be queried is part of data (subset data) of the view, the query statement cannot be rewritten to an equivalent one for querying the materialized view.

    For example, if you use the select id from test where id <100 statement to create a materialized view named t1 and then perform a query the using the select id from test where id <50 statement, the query statement cannot be rewritten because it attempts to use some data of the materialized view.

  • When a materialized view is created, the table name must be a fully qualified name (catalogName.schemaName.tableName) or a table name.

    Example:

    hetuengine:tpcds_2gb> create materialized view mv.tpcds.test as select c1 from t1 where id <7;

    Table name t1 can be replaced with fully qualified name hive.tpcds_2gb.t1 but cannot be replaced with tpcds_2gb.t1.

  • Query rewrite for materialized views does not support full table scan. SQL queries do not use the Where clause and cannot be rewritten by queries.

    For example, if the column definition of the hivetb1 table contains the id, name, and age columns, the following SQL query cannot be rewritten:

    Create MV SQL : select id,name,age from hivetb1;