更新时间:2024-10-21 GMT+08:00
分享

配置HetuEngine物化视图改写能力

开启物化视图改写能力

HetuEngine支持在System级别或者Session级别开启物化视图改写能力,开启方法如下所示:

  • Session级别:

    参考快速使用HetuEngine访问Hive数据源HetuEngine客户端执行set session materialized_view_rewrite_enabled=true

  • System级别:
    1. 使用用于访问HetuEngine WebUI界面的用户登录FusionInsight Manager。
    2. 选择“集群 > 服务 > HetuEngine”,进入HetuEngine服务页面。
    3. 在概览页签下的“基本信息”区域单击“HSConsole WebUI”后的链接,进入HSConsole界面。
    4. 单击“计算实例”,查看待操作的租户的实例状态,当绿色图标和蓝色图标数量均为“0”时,可执行5配置开启物化视图改写能力。
    5. 在“计算实例”页签,在待操作的实例所属租户所在行的“操作”列单击“配置”,进入“配置实例”页签,添加如下自定义参数。
      表1 自定义参数

      名称

      参数文件

      materialized.view.rewrite.enabled

      true

      coordinator.config.properties

      materialized.view.rewrite.timeout

      5

      coordinator.config.properties

      适用于MRS 3.2.0及之后版本。

      • materialized.view.rewrite.timeout:物化视图的重写超时控制(单位:秒),推荐5s。物化视图重写时会消耗一定的时间,添加该参数可限制重写所带来的性能损耗,物化视图重写超时后会执行原始SQL。
      • 若使用Session级别开启物化视图功能,并需要开启物化视图重写超时控制,可先执行set session materialized_view_rewrite_timeout = 5
    6. 参数添加完成后, 将“立即启动”置为“是”,单击“确定”。

物化视图改写能力支持范围

  • 物化视图支持的类型

    BOOLEAN、DECIMAL、DOUBLE、REAL/FLOAT、INT、BIGINT、SMALLINT、TINYINT、CHAR/VARCHAR、DATE、TIME、TIMESTAMP、INTERVAL YEAR TO MONTH、INTERVAL DAY TO SECOND、BINARY/VARBINARY、UUID。

  • 物化视图改写支持的函数
    • 转换函数:只支持CAST函数。
    • 字符串函数:支持所有字符串函数,包括char_length、character_length、chr、codepoint、decode、encode、find_in_set、format_number、locate、hamming_distance、instr、levenshtein、levenshtein_distance、ltrim、lpad、octet_length、position、quote、repeat2。
    • 数学运算符:支持所有数学运算符。
    • 聚合函数:支持的聚合函数包括 COUNT、SUM、MIN、MAX、AVG、LEAD、LAG、FIRST_VALUE、LAST_VALUE、COVAR_POP、COVAR_SAMP、REGR_SXX、REGR_SYY、STDDEV_POP、STDDEV_SAMP、VAR_POP、VAR_SAMP、ROW_NUMBER、RANK、PERCENT_RANK、DENSE_RANK、CUME_DIST。

    以下场景,物化视图不支持对包含了函数的SQL查询进行改写:

    • SQL中包含无参函数
    • SQL中包含了HetuEngine支持的会根据参数的类型获得不同类型的返回值的函数
    • SQL中函数存在嵌套使用,或者是使用的函数会发生异常导致重写失败的函数
  • 物化视图创建语句改写不支持二段式表名,支持一段式和三段式表名。

    如支持改写表名为“hive.mvschema.t1”和“t1”,不支持改写表名为“mvschema.t1”。

物化视图改写场景示例

物化视图的改写的核心原理是逻辑上创建的物化视图的数据要包含未来的查询语句要查询的数据,也可以是未来查询中的子查询要包含的全部数据。建议用户打开自动创建物化视图功能针对性的创建物化视图,以下为部分场景示例:

创建物化视图SQL样例中省略“CREATE MATERIALIZED VIEW xxx WITH(xxx) AS”,完整语句模板可参考表1

表2 物化视图改写场景示例

场景

描述

创建物化视图SQL样例

用户查询SQL样例

查询SQL是否能被改写

备注

全表查询

最基本的全表查询场景

select * from tb_a;

select * from tb_a;

创建全表扫描的物化视图没有实际意义,不支持

列查询

最基本的列查询场景

select col1,col2,col3 from tb_a;

select col1,col2,col3 from tb_a;

-

用户查询重命名

select col1 from tb_a;

select col1 as a from tb_a;

-

select col1,col2,col3 from tb_a;

select col1 as a,col2 as b,col3 as c from tb_a;

-

数学表达式

select col1*col2 from tb_a;

select col2*col1 from tb_a;

需要两个列的类型一样

物化视图使用源列,用户查询使用cast

select col1,col2 from tb_a;

select cast(col1 as varchar),col2 from tb_a;

物化视图使用原数据列,用户查询使用函数没有过滤条件不改写

物化视图使用原数据列,用户查询也使用原数据列加过滤条件可改写

case when场景

select col1, (case col2 when 1 then 'b' when 2 'a' end) as col from tb_a;

select col1, (case col2 when 1 then 'b' when 2 'a' end) as col from tb_a;

不支持查询列中出现case when场景

字符串函数

select col13 from tb_a;

select length(col13) from tb_a;

所有的字符串函数用原表数据建立物化视图不加过滤条件的查询做物化视图不会改写

select length(col13) from tb_a;

select length(col13) from tb_a;

-

聚合函数列查询

count

物化视图和用户查询一样使用count

select count(col1) from tb_a;

select count(col1) from tb_a;

-

物化视图使用源数据,用户查询使用count

select col1 from tb_a;

select count(col1) from tb_a;

-

sum

物化视图和用户查询一样使用sum

select sum(col1) from tb_a;

select sum(col1) from tb_a;

-

物化视图使用源数据,用户查询使用sum

select col1 from tb_a;

select sum(col1) from tb_a;

-

过滤查询

(核心在于物化视图的数据逻辑上要比查询SQL相同或者更多)

where过滤

物化视图最大范围(<)

select col1 from tb_a;

select col1 from tb_a where col1<11;

-

物化视图范围大于用户查询范围(<)

select col1 from tb_a where col1<50;

select col1 from tb_a where col1<45;

-

select col1 from tb_a where col1<50;

select col1 from tb_a where col1<=45;

-

select col1 from tb_a where col1<50;

select col1 from tb_a where col1 between 21 and 29;

-

物化视图范围等于用户查询范围(>)

select col1 from tb_a where col1<50;

select col1 from tb_a where col1<50;

-

物化视图范围大于用户查询范围(and)

select col1 from tb_a where col1<60 and col1>30;

select col1 from tb_a where col1<55 and col1>30;

-

select col1 from tb_a where col1<60 and col1>30;

select col1 from tb_a where col1 between 35 and 55;

-

select col1 from tb_a where col1<60 and col1>30;

select col1 from tb_a where (col1<55 and col1>30) and col1 = 56;

-

where嵌套子查询

子查询源表做物化视图

select col1 from tb_a;

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

-

子查询做物化视图

select min(col1) from tb_a;

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

-

父查询源表做物化视图

select col1 from tb_a where col1=(select min(col1) from tb_a);

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

-

父查询做物化视图

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

select count(col1) from tb_a where col1=(select min(col1) from tb_a);

-

limit

limit在查询里

select col1 from tb_a;

select col1 from tb_a limit 5;

-

select col1 from tb_a limit 5;

select col1 from tb_a limit 5;

-

select col1 from tb_a limit 5;

select col1 from tb_a;

-

limit结合order by

select col1 from tb_a;

select col1 from tb_a order by col1 limit 5;

创建物化视图时使用order by可能导致改写后结果无序,如果开启物化视图改写功能,查询SQL中有order by和limit,建议创建物化视图SQL去掉limit和order by

select col1 from tb_a order by col1;

select col1 from tb_a order by col1 limit 5;

select col1 from tb_a order by col1 limit 5;

select col1 from tb_a order by col1 limit 5;

having过滤

物化视图最大范围(<)

select col1 from tb_a;

select col1 from tb_a group by col1 having col1 <11;

group by + having:having的场景和where不一样,having的条件无法做到补偿,要求物化视图SQL的没有having条件或者与用户查询SQL的having条件一致

物化视图范围大于用户查询范围(<)

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1<45;

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1<=45;

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1=45;

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1 between 21 and 29;

物化视图范围等于用户查询范围(<)

select col1 from tb_a group by col1 having col1<50;

select col1 from tb_a group by col1 having col1<50;

JOIN关联查询

两个子查询做物化视图

select col1,col3 from tb_a where col1<11;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

-

select cast(col2 as varchar) col2,col3 from tb_b;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

-

父查询做物化视图

with t1 as (select col1,col3 from tb_a),t2 as (select col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

-

聚合+JOIN查询

源表数据做物化视图

select col1,col3 from tb_a;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

-

select col2,col3 from tb_b;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

-

子查询做物化视图

select col1,col3 from tb_a where col1<11;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

-

select cast(col2 as varchar) col2,col3 from tb_b;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

-

子查询使用源表的父查询(非聚合查询)做物化视图

with t1 as (select col1,col3 from tb_a),t2 as (select col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

-

父查询(非聚合查询)做物化视图

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

-

父查询做物化视图

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3;

-

相关文档