EXPLAIN

Description

Shows the execution plan of an SQL statement.

The execution plan shows how the tables referenced by the SQL statement will be scanned, for example, by plain sequential scan or index scan. If multiple tables are referenced, the execution plan also shows what join algorithms will be used to bring together the required rows from each input table.

The most critical part of the display is the estimated statement execution cost, which is the plan generator's guess at how long it will take to run the statement.

The ANALYZE option causes the statement to be actually executed. The total elapsed time expended within each plan node (in milliseconds) and total number of rows it actually returned are added to the display. This is useful for determining whether the plan generator's estimated value is close to the actual one.

Precautions

The statement is actually executed when the ANALYZE option is used. If you want to use EXPLAIN on an INSERT, UPDATE, DELETE, or EXECUTE statement without letting the statement affect your data, use this approach:
```
START TRANSACTION;
EXPLAIN ANALYZE ...;
ROLLBACK;
```

The DETAIL, NODES, and NUM_NODES parameters are disabled in standalone mode. They are available only in distributed mode. If the parameters are used, the following error is reported:

m_db=# CREATE TABLE student(id int, name char(20));
CREATE TABLE
m_db=# EXPLAIN (NODE true) INSERT INTO student VALUES(5,'a'),(6,'b');
ERROR:  unrecognized EXPLAIN option "nodes"
m_db=# EXPLAIN (NUM_NODES true) INSERT INTO student VALUES(5,'a'),(6,'b');
ERROR:  unrecognized EXPLAIN option "num_nodes"

Syntax

Display the execution plan of an SQL statement, which supports multiple options and has no requirements for the order of options.

EXPLAIN [ (  option  [, ...] )  ] statement;

The syntax of the option clause is as follows:

ANALYZE [ boolean ] |
    VERBOSE [ boolean ] |
    COSTS [ boolean ] |
    CPU [ boolean ] |
    DETAIL [ boolean ] |
    NODES [ boolean ] | (available only in distributed mode)
    NUM_NODES [ boolean ] | (available only in distributed mode)
    BUFFERS [ boolean ] |
    TIMING [ boolean ] |
    PLAN [ boolean ] |
    BLOCKNAME [ boolean ] |
    OUTLINE [ boolean ] |
    ADAPTCOST [ boolean ] |
    FORMAT { TEXT | XML | JSON | YAML }
    OPTEVAL [ boolean ]

Display the execution plan of an SQL statement, where options are in order.
```
EXPLAIN  { [   ANALYZE    ] [ VERBOSE  ] } statement;
```

Parameters

statement
Specifies the SQL statement to explain.
ANALYZE boolean
Specifies whether to display the actual runtime and other statistics. When two parameters are used at the same time, the latter one in option takes effect.

Value range:
- TRUE (default): The information is displayed.
- FALSE: The information is not displayed.
VERBOSE boolean
Specifies whether to display additional information regarding the plan.

Value range:
- TRUE (default): The information is displayed.
- FALSE: The information is not displayed.
COSTS boolean
Specifies whether to display the estimated total cost of each plan node, estimated number of rows, estimated width of each row.

Value range:
- TRUE (default): The information is displayed.
- FALSE: The information is not displayed.
CPU boolean
Specifies whether to display CPU usage. This parameter must be used together with ANALYZE.

Value range:
- TRUE (default): The information is displayed.
- FALSE: The information is not displayed.
DETAIL boolean
Specifies whether to display information about database nodes. This parameter must be used together with ANALYZE.

Value range:
- TRUE (default): The information is displayed.
- FALSE: The information is not displayed.
NODES boolean (available only in distributed mode)
Specifies whether to display information about the nodes executed by query.

Value range:
- TRUE (default): The information is displayed.
- FALSE: The information is not displayed.
NUM_NODES boolean (available only in distributed mode)
Specifies whether to display the number of executing nodes.

Value range:
- TRUE (default): The information is displayed.
- FALSE: The information is not displayed.
BUFFERS boolean
Specifies whether to display buffer usage. This parameter must be used together with ANALYZE.

Value range:
- TRUE: The information is displayed.
- FALSE (default): The information is not displayed.
TIMING boolean
Specifies whether to display the actual startup time and time spent on the output node. This parameter must be used together with ANALYZE.

Value range:
- TRUE (default): The information is displayed.
- FALSE: The information is not displayed.
PLAN
Specifies whether to store the execution plan in PLAN_TABLE. If this parameter is set to on, the execution plan is stored in PLAN_TABLE and is not displayed on the screen. Therefore, this parameter cannot be used together with other parameters when it is set to on.

Value range:
- ON (default): stores the execution plan in PLAN_TABLE and does not display it on the screen. If the plan is stored successfully, "EXPLAIN SUCCESS" is returned.
- OFF: does not store the execution plan but displays it on the screen.
BLOCKNAME boolean
Specifies whether to display the query block where each operation of the plan is located. When this option is enabled, each name of the query block where each operation is performed is displayed in the Query Block column. This helps users obtain the query block name and use hints to modify the execution plan.
- TRUE (default value): The name of the query block where each operation is located is displayed in the Query Block column. This option must be used in the pretty mode.
- FALSE: The plan display is not affected.
OUTLINE boolean
Specifies whether to display outline hints of a plan.
- ON: displays them below the plan when the plan is displayed.
- OFF (default): does not display them.
ADAPTCOST boolean
Specifies whether to display the cardinality estimation method of a plan in the normal mode.
- ON (default): displays the cardinality estimation method on the plan node in normal mode, including the default method and feedback method. This parameter does not take effect for prepared statements.
- OFF: does not display it.
FORMAT
Specifies the output format.

Value range: TEXT, XML, JSON, and YAML

Default value: TEXT
PERFORMANCE
Prints all relevant information in execution. Some information is described as follows:
- ex c/r: indicates the average number of CPU cycles used by each row, which is equal to (ex cyc) / (ex row).
- ex row: indicates the number of executed rows.
- ex cyc: indicates the number of used CPU cycles.
- inc cyc: indicates the total number of CPU cycles used by subnodes.
- shared hit: indicates the shared buffer hits of the operator.
- loops: indicates the number of operator loop execution times.
- total_calls: indicates the total number of generated elements.
- remote query poll time stream gather: indicates the operator used to listen to the network poll time when data on each DN reaches the CN.
- deserialize time: indicates the time required for deserialization.
- estimated time: indicates the estimated time.
OPTEVAL boolean
Specifies whether to display the cost elimination details of the scan operator (currently, only seqscan, indexscan, indexonlyscan and bitmapheapscan are supported). When this function is enabled, a plan block named Cost Evaluation Info (identified by plan id) is displayed in the execution plan. This option can only coexist with COSTS, VERBOSE, and FORMAT. For details about the parameters in the plan block, see Example 2.
Value range:
- TRUE: The information is displayed.
- FALSE (default): The information is not displayed.

Example 1

-- Create a schema.
m_db=# CREATE SCHEMA tpcds; 

  -- Create the tpcds.customer_address table.
m_db=# CREATE TABLE tpcds.customer_address
(
ca_address_sk         INTEGER           NOT NULL,
ca_address_id         CHARACTER(16)     NOT NULL
);
 
-- Insert multiple records into the table.
m_db=# INSERT INTO tpcds.customer_address VALUES (5000, 'AAAAAAAABAAAAAAA'),(10000, 'AAAAAAAACAAAAAAA');

-- Create the tpcds.customer_address_p1 table.
m_db=# CREATE TABLE tpcds.customer_address_p1 LIKE tpcds.customer_address;

-- Change the value of explain_perf_mode to normal.
m_db=# SET explain_perf_mode=normal;

-- Display an execution plan for simple queries in the table.
m_db=# EXPLAIN SELECT * FROM tpcds.customer_address_p1;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Seq Scan on customer_address_p1  (cost=0.00..18.01 rows=801 width=72)
(1 row)

-- Generate an execution plan in JSON format (with explain_perf_mode being normal).
m_db=# EXPLAIN(FORMAT JSON) SELECT * FROM tpcds.customer_address_p1;
                  QUERY PLAN                   
-----------------------------------------------
 [                                            +
   {                                          +
     "Plan": {                                +
       "Node Type": "Seq Scan",               +
       "Relation Name": "customer_address_p1",+
       "Alias": "customer_address_p1",        +
       "Startup Cost": 0.00,                  +
       "Total Cost": 18.01,                   +
       "Plan Rows": 801,                      +
       "Plan Width": 72                       +
     }                                        +
   }                                          +
 ]
(1 row)

-- If there is an index and we use a query with an indexable WHERE condition, a different plan may be displayed.
m_db=# EXPLAIN SELECT * FROM tpcds.customer_address_p1 WHERE ca_address_sk=10000;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Seq Scan on customer_address_p1  (cost=0.00..20.01 rows=4 width=72)
   Filter: (ca_address_sk = 10000)
(2 rows)

-- Generate an execution plan in YAML format (with explain_perf_mode being normal).
m_db=# EXPLAIN(FORMAT YAML) SELECT * FROM tpcds.customer_address_p1 WHERE ca_address_sk=10000;
                QUERY PLAN                
------------------------------------------
 - Plan:                                 +
     Node Type: "Seq Scan"               +
     Relation Name: "customer_address_p1"+
     Alias: "customer_address_p1"        +
     Startup Cost: 0.00                  +
     Total Cost: 20.01                   +
     Plan Rows: 4                        +
     Plan Width: 72                      +
     Filter: "(ca_address_sk = 10000)"
(1 row)

-- Suppress the execution plan of cost estimation.
m_db=# EXPLAIN(COSTS FALSE) SELECT * FROM tpcds.customer_address_p1 WHERE ca_address_sk=10000;
            QUERY PLAN             
-----------------------------------
 Seq Scan on customer_address_p1
   Filter: (ca_address_sk = 10000)
(2 rows)

-- Generate an execution plan with aggregate functions for a query.
m_db=# EXPLAIN SELECT SUM(ca_address_sk) FROM tpcds.customer_address_p1 WHERE ca_address_sk<10000;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Aggregate  (cost=20.68..20.69 rows=1 width=12)
   ->  Seq Scan on customer_address_p1  (cost=0.00..20.01 rows=267 width=4)
         Filter: (ca_address_sk < 10000)
(3 rows)

-- Create a level-2 partitioned table.
m_db=# CREATE TABLE range_hash ( ID INT, purchased YEAR ) PARTITION BY RANGE ( ID ) SUBPARTITION BY HASH ( purchased ) (
	PARTITION p0
	VALUES
		LESS THAN ( 1000 ) ( SUBPARTITION s0, SUBPARTITION s1 ),
		PARTITION p1
	VALUES
		LESS THAN ( 2000 ) ( SUBPARTITION s2, SUBPARTITION s3 ),
		PARTITION p2
	VALUES
	LESS THAN MAXVALUE ( SUBPARTITION s4, SUBPARTITION s5 ) 
);

-- Run a query statement containing a level-2 partitioned table.
m_db=# EXPLAIN SELECT * FROM range_hash WHERE purchased  = '1';
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Partition Iterator  (cost=0.00..36.86 rows=11 width=8)
   Iterations: 3, Sub Iterations: 6
   ->  Partitioned Seq Scan on range_hash  (cost=0.00..36.86 rows=11 width=8)
         Filter: (purchased = '2001'::year)
         Selected Partitions:  1..3
         Selected Subpartitions:  ALL
(6 rows)

-- Drop the tpcds.customer_address_p1 table.
m_db=# DROP TABLE tpcds.customer_address_p1;

-- Drop the tpcds.customer_address table.
m_db=# DROP TABLE tpcds.customer_address;

-- Drop the range_list table.
m_db=# DROP TABLE range_hash;

-- Drop the schema.
m_db=# DROP SCHEMA tpcds;

Example 2

-- Create the tb_a and tb_b tables.
m_db=# CREATE TABLE tb_a(c1 int);
m_db=# INSERT INTO tb_a VALUES(1),(2),(3);
m_db=# CREATE TABLE tb_b AS SELECT * FROM tb_a;
m_db=# EXPLAIN (OPTEVAL on )SELECT * FROM tb_a a, tb_b b WHERE a.c1=b.c1 AND a.c1=1;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Nested Loop  (cost=0.00..81.88 rows=144 width=8)
   ->  Seq Scan on tb_a a  (cost=0.00..40.03 rows=12 width=4)
         Filter: (c1 = 1)
   ->  Materialize  (cost=0.00..40.09 rows=12 width=4)
         ->  Seq Scan on tb_b b  (cost=0.00..40.03 rows=12 width=4)
               Filter: (c1 = 1)
(6 rows)

-- Drop the tb_a and tb_b tables.
m_db=# DROP TABLE tb_a;
m_db=# DROP TABLE tb_b;

For the Cost Evaluation Info (identified by plan id) plan block:

The rows 2 -- and 4 -- represent the current winning operator, and the indentation plan blocks under each row represent the operators directly eliminated by the current winning operator. For example, operator 2 eliminates the Seq Scan and Bitmap Heap Scan operators.
The preceding key parameters are described as follows:
1. id indicates that the current operator is converted from a path with a specified ID. The value is used to locate a path in debug2 logs.
2. rpage indicates the number of pages in the base table used during the cost model calculation.
3. ipage indicates the number of pages in the index used during the cost model calculation.
4. tuples indicates the number of tuples used in the cost model calculation.
5. selec indicates the index selectivity used during cost model calculation. The value -1 indicates that the index selectivity of the current operator is invalid.
6. ml indicates the ML model event that is currently triggered during cost model calculation. The value 1 indicates that the cache specified by effective_cache_size is sufficient, the value 2 indicates that the cache specified by effective_cache_size is insufficient, and the value 3 indicates that the cache specified by effective_cache_size is severely insufficient.
7. iscost specifies whether an event of ignoring the startup cost has occurred during model cost calculation.
8. lossy specifies whether to trigger the lossy mechanism of bitmap heap scan during cost model calculation.
9. uidx specifies whether to trigger the unique index first rule during cost model calculation.