使用DWS Turbo引擎提升数据查询性能
场景介绍
Turbo向量化引擎是一种高性能的数据处理引擎,它采用向量化技术来显著提升数据处理的效率和速度,它是基于传统列存执行引擎的主要性能瓶颈点做出的独创性突破,通过数据格式优化、自研Hash算法、根据运行态数据特征定制数据处理算法、算子算法重构等手段实现性能倍增目标。
Turbo引擎相比原列存执行引擎,对字符串、numeric类型做了内存和磁盘存储格式优化,且对常用sort/agg/join/scan等算子做了极致性能优化,使得执行器整体性能提升1倍左右,可显著降低业务计算费用。
在数据全内存缓存场景下,常用BenchMark测试结果如下:
- TPC-H性能提升50%到1.5X。
- TPC-DS性能提升50%到80%。
- SSB性能提升70%。
- Sort全排性能提升90%。
约束限制
- 当用户业务的作业查询不涉及merge join或sort agg算子时,执行器可走Turbo执行引擎,请在业务规划时确认。
- Turbo引擎使用要求基表为Turbo表,即建表时设置enable_turbo_store=on,且GUC参数turbo_engine_version设置为3。如需关闭Turbo引擎,可设置turbo_engine_version为0。
- 仅9.1.0.210及以上集群版本支持。
- 3.0普通列存表不支持设置Turbo存储格式,需同时设置成hstore_opt表方可支持Turbo存储格式,即创建3.0列存表,需要同时指定enable_hstore_opt=on方可打开Turbo。
使用Turbo表
- 使用客户端连接DWS集群。
- 执行以下命令,确保GUC参数turbo_engine_version为3。
1 2
SHOW turbo_engine_version; SET turbo_engine_version = 3;
- 执行以下SQL,创建一个名为src的临时表,仅包含一行数据,为后续的INSERT操作提供数据源。
1
CREATE TABLE src AS SELECT 1;
- 执行以下SQL,分别创建两张表,其中一张禁用Turbo,命名为non_turbo_table;另一张启用Turbo,命名为turbo_table。
1 2
CREATE TABLE non_turbo_table(a int, b numeric(15,2)) WITH(orientation=column,enable_turbo_store=off); CREATE TABLE turbo_table(a int, b numeric(15,2)) WITH(orientation=column,enable_turbo_store=on);
- 执行以下SQL,分别为两张表插入2000万行数据。执行时间约半分钟,请等待。
1 2
INSERT INTO non_turbo_table SELECT generate_series(1,20000000) % 1000,generate_series(1,20000000) FROM src; INSERT INTO turbo_table SELECT generate_series(1,20000000) % 1000,generate_series(1,20000000) FROM src;
- 执行以下SQL,分别查询两张表的查询性能。
1 2
EXPLAIN PERFORMANCE SELECT sum(b) FROM non_turbo_table GROUP BY a; EXPLAIN PERFORMANCE SELECT sum(b) FROM turbo_table GROUP BY a;
禁用Turbo引擎的表回显如下(篇幅有限,仅显示Query Plan总览和Query Summary),查询总用时为196.335 ms。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
QUERY PLAN id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | E-memory | A-width | E-width | E-costs ----+----------------------------------------------------------+--------------------+----------+----------+------------+----------------+----------+---------+---------+---------- 1 | -> Row Adapter | 193.352 | 1000 | 1000 | | 23KB | | | 42 | 26197.71 2 | -> Vector Streaming (type: GATHER) | 193.283 | 1000 | 1000 | | 167KB | | | 42 | 26197.71 3 | -> Vector Sonic Hash Aggregate | [181.540, 182.619] | 1000 | 1000 | | [1MB, 1MB] | 16MB | [21,21] | 42 | 26161.18 4 | -> Vector Streaming(type: REDISTRIBUTE) | [180.863, 182.166] | 3000 | 6000 | | [183KB, 215KB] | 2MB | | 42 | 26154.51 5 | -> Vector Sonic Hash Aggregate | [88.320, 117.714] | 3000 | 6000 | | [1MB, 1MB] | 16MB | [21,21] | 42 | 26114.00 6 | -> CStore Scan on public.non_turbo_table | [32.022, 41.146] | 20000000 | 20000000 | | [869KB, 869KB] | 1MB | | 10 | 9437.33 ... ====== Query Summary ===== ------------------------------------------------------------------------------- Datanode executor start time [dn_6003_6004, dn_6009_6010]: [1.949 ms,2.231 ms] Datanode executor run time [dn_6005_6006, dn_6007_6008]: [181.600 ms,182.674 ms] Datanode executor end time [dn_6003_6004, dn_6009_6010]: [0.041 ms,0.056 ms] Remote query poll time: 186.041 ms, Deserialze time: 0.036 ms System available mem: 2129920KB Query Max mem: 2129920KB Query estimated mem: 4063KB Enqueue time: 0.015 ms Coordinator executor start time: 0.551 ms Coordinator executor run time: 193.649 ms Coordinator executor end time: 0.071 ms Total network : 59kB Parser runtime: 0.000 ms Planner runtime: 1.945 ms Query Id: 145241088537559390 Unique SQL Id: 1249234219 Unique SQL Hash: sql_41862c1cdf0f08d7e33d51f39bfda62d Total runtime: 196.335 ms
启用Turbo引擎的表回显如下(篇幅有限,仅显示Query Plan总览和Query Summary),查询总用时为51.101 ms。
由此可知,启用Turbo引擎后,查询总体耗时大大缩减了。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
QUERY PLAN id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | E-memory | A-width | E-width | E-costs ----+------------------------------------------------------+------------------+----------+----------+------------+----------------+----------+---------+---------+---------- 1 | -> Row Adapter | 49.316 | 1000 | 1000 | | 23KB | | | 42 | 27429.71 2 | -> Vector Streaming (type: GATHER) | 49.253 | 1000 | 1000 | | 167KB | | | 42 | 27429.71 3 | -> Vector Sonic Hash Aggregate | [37.402, 38.366] | 1000 | 1000 | | [942KB, 942KB] | 16MB | [30,30] | 42 | 27393.18 4 | -> Vector Streaming(type: REDISTRIBUTE) | [37.111, 38.200] | 3000 | 6000 | | [183KB, 183KB] | 2MB | | 42 | 27386.51 5 | -> Vector Sonic Hash Aggregate | [29.227, 31.946] | 3000 | 6000 | | [942KB, 942KB] | 16MB | [30,30] | 42 | 27346.00 6 | -> CStore Scan on public.turbo_table | [10.392, 11.590] | 20000000 | 20000000 | | [917KB, 917KB] | 1MB | | 10 | 10669.33 ... ====== Query Summary ===== ------------------------------------------------------------------------------ Datanode executor start time [dn_6005_6006, dn_6001_6002]: [1.056 ms,1.803 ms] Datanode executor run time [dn_6001_6002, dn_6009_6010]: [37.429 ms,38.416 ms] Datanode executor end time [dn_6009_6010, dn_6005_6006]: [0.037 ms,0.110 ms] Remote query poll time: 41.001 ms, Deserialze time: 0.013 ms System available mem: 2129920KB Query Max mem: 2129920KB Query estimated mem: 4063KB Turbo Engine: true Enqueue time: 0.016 ms Coordinator executor start time: 0.451 ms Coordinator executor run time: 49.599 ms Coordinator executor end time: 0.105 ms Total network : 33kB Parser runtime: 0.000 ms Planner runtime: 0.859 ms Query Id: 145241088537562603 Unique SQL Id: 1906062289 Unique SQL Hash: sql_be1160458499bd4c66933be3ecd942cc Total runtime: 51.101 ms