Updated on 2025-05-29 GMT+08:00

Materialize

Description

The Materialize operator is used to cache the results returned by subnodes and save the subquery results. For subnodes that need to be scanned repeatedly for multiple times (especially when the scanning results are the same each time), the execution cost can be reduced.

Typical Scenarios

  • If a query statement involves subqueries and the same batch of data needs to be queried for multiple times, the optimizer selects the Materialize operator to cache the subquery results, greatly reducing the scan execution time. Subqueries often exist in clauses such as IN, ANY, ALL, and EXISTS.
  • Nested loop is used as the join operator.

Examples

Perform subqueries with ALL.

-- Prepare data.
gaussdb=#DROP TABLE IF EXISTS student;
gaussdb=#CREATE TABLE student(id integer, class_id integer, grade number); 
CREATE TABLE 
gaussdb=#CREATE TABLE class_table(class_id integer); 
CREATE TABLE 
gaussdb=#INSERT INTO student VALUES(generate_series(1,50), 1, floor(100 * random())); 
INSERT 0 50 
gaussdb=#INSERT INTO student VALUES(generate_series(51,100), 2, floor(100 * random())); 
INSERT 0 50 
gaussdb=#INSERT INTO student VALUES(generate_series(101,150), 3, floor(100 * random())); 
INSERT 0 50 
gaussdb=#INSERT INTO student VALUES(generate_series(151,200), 3, floor(100 * random())); 
INSERT 0 50 
gaussdb=#INSERT INTO class_table VALUES(1),(2),(3),(4); 
INSERT 0 4

-- Execution result.
gaussdb=#EXPLAIN SELECT * FROM student WHERE class_id >= all (SELECT class_id FROM class_table); 
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=0.38..100.67 rows=10 width=40)
   Node/s: All datanodes
   ->  Seq Scan on student  (cost=0.00..100.11 rows=10 width=40)
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Materialize  (cost=0.00..13.28 rows=80 width=4)
                 ->  Streaming(type: BROADCAST)  (cost=0.00..13.18 rows=40 width=4)
                       Spawn on: All datanodes
                       ->  Seq Scan on class_table  (cost=0.00..13.13 rows=20 width=4)
(9 rows)

-- Drop.
gaussdb=#DROP TABLE IF EXISTS student,class_table;

In the preceding example, the output of the Materialize operator is as follows.

Item

Description

Materialize

Operator name.

Streaming

Operator name.

Spawn on: datanode1

In the example, the data is distributed to all DNs.

Example: Nested loop is used as the join operator.

-- Prepare data.
gaussdb=# CREATE TABLE t1(a int,b int);
CREATE TABLE 
gaussdb=#  create index key_a on t1(a);
CREATE INDEX
gaussdb=# CREATE TABLE t2(a int,b int); 
CREATE TABLE 

-- Collect statistics.
gaussdb=# ANALYZE t1;
gaussdb=# ANALYZE t2;

-- Execution result.
gaussdb=# explain select t1.*,t2.* from t1 inner join t2 on t1.a < t2.b;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..38.43 rows=133 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=0.00..32.24 rows=133 width=16)
         Join Filter: (t1.a < t2.b)
         ->  Streaming(type: BROADCAST)  (cost=0.00..15.18 rows=40 width=8)
               Spawn on: All datanodes
               ->  Seq Scan on t1  (cost=0.00..13.13 rows=20 width=8)
         ->  Materialize  (cost=0.00..13.20 rows=20 width=8)
               ->  Seq Scan on t2  (cost=0.00..13.13 rows=20 width=8)
(9 rows)

-- Drop.
gaussdb=# DROP TABLE t1,t2;

In the preceding example, the output of the Materialize operator is as follows.

Item

Description

Materialize

Operator name.

Seq Scan

Operator name.