Updated on 2024-08-20 GMT+08:00

EXPLAIN

Description

Shows the execution plan of an SQL statement.

The execution plan shows how the tables referenced by the SQL statement will be scanned, for example, by a plain sequential scan or an index scan. If multiple tables are referenced, the execution plan also shows which join algorithms will be used to bring together the required rows from each input table.
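The following sketch shows how a join appears in a plan. It joins the two tables created in the Examples section (tpcds.customer_address and tpcds.customer_address_p1). The planner chooses the join algorithm based on statistics and configuration, so no output is shown here.

```sql
-- Assumes tpcds.customer_address and tpcds.customer_address_p1 exist,
-- as created in the Examples section.
EXPLAIN
SELECT a.ca_address_id, p.ca_address_sk
FROM tpcds.customer_address a
JOIN tpcds.customer_address_p1 p ON a.ca_address_sk = p.ca_address_sk;
```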

The most critical part of the display is the estimated statement execution cost, which is the planner's guess at how long it will take to run the statement.

The ANALYZE option causes the statement to be actually executed, not only planned. The total elapsed time spent in each plan node (in milliseconds) and the total number of rows it actually returned are added to the display. This is useful for determining whether the planner's estimates are close to reality.

Precautions

The statement is actually executed when ANALYZE is used. To use EXPLAIN ANALYZE on an INSERT, UPDATE, DELETE, CREATE TABLE AS, or EXECUTE statement without affecting your data, run it inside a transaction and roll the transaction back:

START TRANSACTION;
EXPLAIN ANALYZE ...;
ROLLBACK;
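As a concrete instance of this pattern, the following sketch measures a DELETE on the tpcds.customer_address_p1 table from the Examples section. The DELETE really runs inside the transaction, and ROLLBACK undoes it.

```sql
START TRANSACTION;
-- The DELETE is actually executed so that real timings can be reported.
EXPLAIN ANALYZE DELETE FROM tpcds.customer_address_p1 WHERE ca_address_sk = 5000;
-- Undo the deletion; the table keeps both of its rows.
ROLLBACK;
```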

Syntax

  • Display the execution plan of an SQL statement. Multiple options are supported, and they can be specified in any order.
    EXPLAIN [ ( option [, ...] ) ] statement;
    

    The syntax of the option clause is as follows:

    ANALYZE [ boolean ] |
        ANALYSE [ boolean ] |
        VERBOSE [ boolean ] |
        COSTS [ boolean ] |
        CPU [ boolean ] |
        DETAIL [ boolean ] |
        NODES [ boolean ] |
        NUM_NODES [ boolean ] |
        BUFFERS [ boolean ] |
        TIMING [ boolean ] |
        PLAN [ boolean ] |
        BLOCKNAME [ boolean ] |
        FORMAT { TEXT | XML | JSON | YAML }
    
  • Display the execution plan of an SQL statement. The options must be specified in the order shown.
    EXPLAIN { [ ANALYZE | ANALYSE ] [ VERBOSE ] | PERFORMANCE } statement;
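The following statements sketch both syntax forms against the tpcds.customer_address_p1 table from the Examples section. In the first form, parenthesized options can be listed in any order, so the first two statements are equivalent. In the second form, the keywords must appear in the fixed order shown, and PERFORMANCE is used on its own.

```sql
-- Form 1: parenthesized options, any order.
EXPLAIN (COSTS FALSE, FORMAT YAML) SELECT * FROM tpcds.customer_address_p1;
EXPLAIN (FORMAT YAML, COSTS FALSE) SELECT * FROM tpcds.customer_address_p1;

-- Form 2: unparenthesized keywords, ANALYZE before VERBOSE.
EXPLAIN ANALYZE VERBOSE SELECT * FROM tpcds.customer_address_p1;

-- Form 2: PERFORMANCE cannot be combined with the other keywords.
EXPLAIN PERFORMANCE SELECT * FROM tpcds.customer_address_p1;
```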
    

Parameters

  • statement

    Specifies the SQL statement to explain.

  • ANALYZE boolean | ANALYSE boolean

    Specifies whether to display actual run times and other statistics. ANALYZE and ANALYSE are equivalent. If both are specified, the one that appears later takes effect.

    Value range:

    • TRUE (default): displays them.
    • FALSE: does not display them.
  • VERBOSE boolean

    Specifies whether to display additional information regarding the plan.

    Value range:

    • TRUE (default): displays it.
    • FALSE: does not display it.
  • COSTS boolean

    Specifies whether to display the estimated total cost of each plan node, the estimated number of rows, and the estimated width of each row.

    Value range:

    • TRUE (default): displays them.
    • FALSE: does not display them.
  • CPU boolean

    Specifies whether to display CPU usage. This parameter must be used together with ANALYZE or ANALYSE.

    Value range:

    • TRUE (default): displays it.
    • FALSE: does not display it.
  • DETAIL boolean

    Specifies whether to display information about each DN (data node). This parameter must be used together with ANALYZE or ANALYSE.

    Value range:

    • TRUE (default): displays it.
    • FALSE: does not display it.
  • NODES boolean

    Specifies whether to display information about the nodes on which the query is executed.

    Value range:

    • TRUE (default): displays it.
    • FALSE: does not display it.
  • NUM_NODES boolean

    Specifies whether to display the number of executing nodes.

    Value range:

    • TRUE (default): displays it.
    • FALSE: does not display it.
  • BUFFERS boolean

    Specifies whether to display buffer usage. This parameter must be used together with ANALYZE or ANALYSE.

    Value range:

    • TRUE: displays it.
    • FALSE (default): does not display it.
  • TIMING boolean

    Specifies whether to display the actual startup time and the time spent in each plan node. This parameter must be used together with ANALYZE or ANALYSE.

    Value range:

    • TRUE (default): displays them.
    • FALSE: does not display them.
  • PLAN boolean

    Specifies whether to store the execution plan in PLAN_TABLE. If this parameter is set to on, the execution plan is stored in PLAN_TABLE and is not displayed on the screen. Therefore, when this parameter is set to on, it cannot be combined with any other option.

    Value range:

    • TRUE (default): The execution plan is stored in PLAN_TABLE and not displayed on the screen. If the plan is stored successfully, "EXPLAIN SUCCESS" is returned.
    • FALSE: The execution plan is not stored but is printed on the screen.
  • BLOCKNAME boolean

    Specifies whether to display the query block in which each operation of the plan is located. When this option is enabled, the name of the query block of each operation is displayed in the Query Block column. This helps you obtain query block names and use hints to modify the execution plan.

    Value range:

    • TRUE (default): The name of the query block in which each operation is located is displayed in the Query Block column. This option must be used in pretty mode (explain_perf_mode set to pretty). For details, see Hint Specifying the Query Block Where the Hint Is Located.
    • FALSE: The plan display is not affected.
  • FORMAT

    Specifies the output format.

    Value range: TEXT, XML, JSON, and YAML

    Default value: TEXT

  • PERFORMANCE

    Prints all relevant information about the execution. Some of the fields are described as follows:

    • ex c/r: indicates the average number of CPU cycles used by each row, which is equal to (ex cyc) / (ex row).
    • ex row: indicates the number of executed rows.
    • ex cyc: indicates the number of used CPU cycles.
    • inc cyc: indicates the total number of CPU cycles used by subnodes.
    • shared hit: indicates the shared buffer hits of the operator.
    • loops: indicates the number of times the operator was executed.
    • total_calls: indicates the total number of generated elements.
    • remote query poll time stream gather: indicates the network poll time that the stream gather operator spends listening while data from each DN arrives at the CN.
    • deserialize time: indicates the time required for deserialization.
    • estimated time: indicates the estimated time.
    • Network Poll Time: indicates the duration for the libcomm receiver to wait for data during distributed stream network communication.
    • Stream Send time: indicates the time consumed by libcomm or libpq to send data during distributed stream network communication.
    • OS Kernel Send time: indicates the time required for the OS layer to send data during distributed stream network communication. This parameter is displayed only when the value is greater than 0.
    • Wait Quota time: indicates the duration for libcomm to wait for the peer end to send the quota traffic control size during distributed stream network communication. This parameter is displayed only when the value is greater than 0.
    • Data Serialize time: indicates the data serialization time during distributed stream network communication.
    • Data Copy time: indicates the time spent copying data during distributed stream network communication. This parameter is displayed only when the value is greater than 0.
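Several of the options described above are not used in the Examples section. The following sketch shows how they might be combined, assuming the tpcds.customer_address_p1 table from that section exists. Output is omitted because it depends on the cluster and on the explain_perf_mode setting.

```sql
-- BUFFERS and DETAIL must be used together with ANALYZE (or ANALYSE).
EXPLAIN (ANALYZE TRUE, BUFFERS TRUE, DETAIL TRUE) SELECT * FROM tpcds.customer_address_p1;

-- TIMING FALSE suppresses the actual startup time and per-node time.
EXPLAIN (ANALYZE TRUE, TIMING FALSE) SELECT * FROM tpcds.customer_address_p1;

-- PLAN TRUE stores the plan in PLAN_TABLE instead of printing it, and it
-- cannot be combined with other options. "EXPLAIN SUCCESS" is returned on success.
EXPLAIN (PLAN TRUE) SELECT * FROM tpcds.customer_address_p1;
-- Read the stored plan back.
SELECT * FROM PLAN_TABLE;
```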

Examples

-- Create a schema.
gaussdb=# CREATE SCHEMA tpcds;

-- Create the tpcds.customer_address table.
gaussdb=# CREATE TABLE tpcds.customer_address
(
ca_address_sk         INTEGER           NOT NULL,
ca_address_id         CHARACTER(16)     NOT NULL
);
 
-- Insert multiple records into the table.
gaussdb=# INSERT INTO tpcds.customer_address VALUES (5000, 'AAAAAAAABAAAAAAA'),(10000, 'AAAAAAAACAAAAAAA');

-- Create the tpcds.customer_address_p1 table.
gaussdb=# CREATE TABLE tpcds.customer_address_p1 AS TABLE tpcds.customer_address;

-- Change the value of explain_perf_mode to normal.
gaussdb=# SET explain_perf_mode=normal;

-- Display an execution plan for simple queries in the table.
gaussdb=# EXPLAIN SELECT * FROM tpcds.customer_address_p1;
QUERY PLAN
--------------------------------------------------
Data Node Scan  (cost=0.00..0.00 rows=0 width=0)
Node/s: All DNs
(2 rows)

-- Use the ANALYZE option to add runtime statistics to the output.
gaussdb=# EXPLAIN ANALYZE SELECT * FROM tpcds.customer_address_p1;
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Data Node Scan  (cost=0.00..0.00 rows=0 width=0) (actual time=1.754..3.218 rows=2 loops=1)
   Node/s: All DNs
 Total runtime: 3.272 ms
(3 rows)

-- Use the ANALYZE and CPU options to output the CPU usage information.
gaussdb=# EXPLAIN (ANALYZE, CPU) SELECT * FROM tpcds.customer_address_p1;
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Data Node Scan  (cost=0.00..0.00 rows=0 width=0) (actual time=1.996..2.214 rows=2 loops=1)
   Node/s: All DNs
   (CPU: ex c/r=25694795469106248, ex row=2, ex cyc=51389590938212496, inc cyc=51389590938212496)
 Total runtime: 2.251 ms
(4 rows)
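
-- In the CPU line above, ex c/r = ex cyc / ex row = 51389590938212496 / 2 = 25694795469106248.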

-- Generate an execution plan in JSON format (with explain_perf_mode being normal).
gaussdb=# EXPLAIN (FORMAT JSON) SELECT * FROM tpcds.customer_address_p1;
              QUERY PLAN              
--------------------------------------
 [                                   +
   {                                 +
     "Plan": {                       +
       "Node Type": "Data Node Scan",+
       "Startup Cost": 0.00,         +
       "Total Cost": 0.00,           +
       "Plan Rows": 0,               +
       "Plan Width": 0,              +
       "Node/s": "All DNs"     +
     }                               +
   }                                 +
 ]
(1 row)


-- Generate an execution plan in YAML format (with explain_perf_mode being normal).
gaussdb=# EXPLAIN (FORMAT YAML) SELECT * FROM tpcds.customer_address_p1 WHERE ca_address_sk=10000;
           QUERY PLAN            
---------------------------------
 - Plan:                        +
     Node Type: "Data Node Scan"+
     Startup Cost: 0.00         +
     Total Cost: 0.00           +
     Plan Rows: 0               +
     Plan Width: 0              +
     Node/s: "dn_6005_6006"
(1 row)

-- Display the execution plan without cost estimates.
gaussdb=# EXPLAIN (COSTS FALSE) SELECT * FROM tpcds.customer_address_p1 WHERE ca_address_sk=10000;
       QUERY PLAN       
------------------------
 Data Node Scan
   Node/s: dn_6005_6006
(2 rows)

-- Display the execution plan of a query that uses an aggregate function.
gaussdb=# EXPLAIN SELECT SUM(ca_address_sk) FROM tpcds.customer_address_p1 WHERE ca_address_sk<10000;
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Aggregate  (cost=18.19..14.32 rows=1 width=4)
   ->  Streaming (type: GATHER)  (cost=18.19..14.32 rows=3 width=4)
         Node/s: All DNs
         ->  Aggregate  (cost=14.19..14.20 rows=3 width=4)
               ->  Seq Scan on customer_address_p1  (cost=0.00..14.18 rows=10 width=4)
                     Filter: (ca_address_sk < 10000)
(6 rows)
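
-- Check the planner's estimates against reality by adding ANALYZE to the same query.
-- Output omitted: compare each node's estimated rows= with its actual rows=.
-- The plan above estimates rows=10 for the scan, but only one of the two
-- inserted rows (ca_address_sk=5000) satisfies the filter.
gaussdb=# EXPLAIN ANALYZE SELECT SUM(ca_address_sk) FROM tpcds.customer_address_p1 WHERE ca_address_sk<10000;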

-- Delete the tpcds.customer_address_p1 table.
gaussdb=# DROP TABLE tpcds.customer_address_p1;

-- Delete the tpcds.customer_address table.
gaussdb=# DROP TABLE tpcds.customer_address;

-- Delete a schema.
gaussdb=# DROP SCHEMA tpcds CASCADE;

Helpful Links

ANALYZE | ANALYSE