Nested Loop Join

算子说明

嵌套循环连接（Nested Loop Join）是最简单的连接方法，也是所有关系数据库系统中都会实现的连接操作。这种方法的基本思想是“把两个表中的数据两两比较，看是否满足连接条件”。

在GaussDB中，Nested Loop Join的工作原理是，对于外部表（Outer Table）中的每一行，扫描内部表（Inner Table），查找符合连接条件的行。这类似于两个嵌套的循环，外部循环遍历外部表，内部循环遍历内部表，因此得名。

Nested Loop Join的时间复杂度是O(n*m), 其中n和m分别代表两个表的行数，如果内部表可以用索引来扫描，那么时间复杂度可以降低到O(nlogm)。

典型场景

当内部表或外部表（或者两者都）非常小的时候。
当内部表能够根据连接条件快速定位到满足条件的行时，例如内部表的连接字段已经建了索引。
Nested Loop Join对连接的条件没有限制，任何连接条件Nested Loop Join都可以执行。

示例

示例：两个小表之间做Join。

--数据准备。 
gaussdb=# CREATE TABLE employee(id int, deptid int); 
CREATE TABLE 
gaussdb=# INSERT INTO employee VALUES(1, 1), (2,1),(3,2),(4, 1), (5,2); 
INSERT 0 5 
gaussdb=# CREATE TABLE manager(id int, deptid int); 
CREATE TABLE 
gaussdb=# INSERT INTO manager VALUES(1,1), (2,2),(3,1),(4,2); 
INSERT 0 4

--收集统计信息。
gaussdb=# ANALYZE employee;
gaussdb=# ANALYZE manager;

--执行结果。 
gaussdb=# EXPLAIN SELECT * FROM employee e JOIN manager m ON e.deptid < m.deptid; 
                                  QUERY PLAN
------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..38.43 rows=133 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=0.00..32.24 rows=133 width=16)
         Join Filter: (e.deptid < m.deptid)
         ->  Streaming(type: BROADCAST)  (cost=0.00..15.18 rows=40 width=8)
               Spawn on: All datanodes
               ->  Seq Scan on employee e  (cost=0.00..13.13 rows=20 width=8)
         ->  Materialize  (cost=0.00..13.20 rows=20 width=8)
               ->  Seq Scan on manager m  (cost=0.00..13.13 rows=20 width=8)
(9 rows)

--删除。
gaussdb=# DROP TABLE employee,manager;

上述示例中，Nested Loop Join输出信息如下所示。

信息名称	含义
Nested Loop	算子的名称。
Join Filter	算子join的连接谓词，示例中条件为e.deptid 小于m.deptid，在查询执行时，满足这些条件的行会被包含在最终的结果集中。

示例：内部表的连接字段已经建了索引

--数据准备。 
gaussdb=# CREATE TABLE t1(a int,b int);
CREATE TABLE 
gaussdb=#  create index key_a on t1(a);
CREATE INDEX
gaussdb=# CREATE TABLE t2(a int,b int); 
CREATE TABLE 

--收集统计信息。
gaussdb=# ANALYZE t1;
gaussdb=# ANALYZE t2;

--执行结果。 
gaussdb=# explain select t1.*,t2.* from t1 inner join t2 on t1.a < t2.b;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..38.43 rows=133 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=0.00..32.24 rows=133 width=16)
         Join Filter: (t1.a < t2.b)
         ->  Streaming(type: BROADCAST)  (cost=0.00..15.18 rows=40 width=8)
               Spawn on: All datanodes
               ->  Seq Scan on t1  (cost=0.00..13.13 rows=20 width=8)
         ->  Materialize  (cost=0.00..13.20 rows=20 width=8)
               ->  Seq Scan on t2  (cost=0.00..13.13 rows=20 width=8)
(9 rows)

--删除。
gaussdb=# DROP TABLE t1,t2;

示例：任何连接条件

--数据准备。 
gaussdb=# CREATE TABLE t1(a int,b int);
CREATE TABLE 
gaussdb=#  create index key_a on t1(a);
CREATE INDEX
gaussdb=# CREATE TABLE t2(a int,b int); 
CREATE TABLE 

--收集统计信息。
gaussdb=# ANALYZE t1;
gaussdb=# ANALYZE t2;

--执行结果。 
gaussdb=# explain select t1.*,t2.* from t1 inner join t2 on t1.a != t2.b and t1.a < t2.a;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..38.89 rows=127 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=0.00..32.89 rows=127 width=16)
         Join Filter: ((t1.a <> t2.b) AND (t1.a < t2.a))
         ->  Streaming(type: BROADCAST)  (cost=0.00..15.18 rows=40 width=8)
               Spawn on: All datanodes
               ->  Seq Scan on t1  (cost=0.00..13.13 rows=20 width=8)
         ->  Materialize  (cost=0.00..13.20 rows=20 width=8)
               ->  Seq Scan on t2  (cost=0.00..13.13 rows=20 width=8)
(9 rows)


--删除。
gaussdb=# DROP TABLE t1,t2;

父主题： 表连接方式

上一篇：表连接方式

下一篇：Hash Join

意见反馈

文档内容是否对您有帮助？

有帮助没帮助

提供反馈

提交成功！非常感谢您的反馈，我们会继续努力做到更好！

系统繁忙，请稍后重试

在使用文档中是否遇到以下问题

内容与产品页面不一致

内容不易理解

缺失示例代码

步骤不可操作

搜不到想要的内容

缺少最佳实践

意见反馈（选填）

0/500

请至少选择一项反馈信息并填写问题反馈

字符长度不能超过500

直接提交取消