Updated on 2024-10-25 GMT+08:00

EXPLAIN

Syntax

EXPLAIN [ ( option [, ...] ) ] statement

The option can be the following:

FORMAT { TEXT | GRAPHVIZ | JSON }

TYPE { LOGICAL | DISTRIBUTED | VALIDATE | IO}

Description

This statement is used to display the logical or distributed execution plan of a statement. It can also be used to verify an SQL statement or analyze I/Os.

The TYPE DISTRIBUTED parameter is used to display the fragmented plan. Each fragment is executed by one or more nodes. Fragments separation indicates that data is exchanged between two nodes. Fragment type indicates how a fragment is executed and how data is distributed among different fragments.

  • SINGLE

    Fragments are executed on a single node.

  • HASH

    Fragments are executed on a fixed number of nodes, and the input data is distributed using the hash function.

  • ROUND_ROBIN

    Fragments are executed on a fixed number of nodes, and input data is distributed in round-robin mode.

  • BROADCAST

    Fragments are executed on a fixed number of nodes, and the input data is broadcast to all nodes.

  • SOURCE

    Fragments are executed on the node that accesses the input fragments.

Example

  • LOGICAL:
    CREATE TABLE testTable (regionkey int, name varchar);
    EXPLAIN SELECT regionkey, count(*) FROM testTable GROUP BY 1;
                                                                 Query Plan                                                              
    -------------------------------------------------------------------------------------------------------------------------------------
     Output[regionkey, _col1]                                                                                                            
     │   Layout: [regionkey:integer, count:bigint]                                                                                       
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                                         
     │   _col1 := count                                                                                                                  
     └─ RemoteExchange[GATHER]                                                                                                           
        │   Layout: [regionkey:integer, count:bigint]                                                                                    
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                                      
        └─ Project[]                                                                                                                     
           │   Layout: [regionkey:integer, count:bigint]                                                                                 
           │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                                   
           └─ Aggregate(FINAL)[regionkey][$hashvalue]                                                                                    
              │   Layout: [regionkey:integer, $hashvalue:bigint, count:bigint]                                                           
              │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                                
              │   count := count("count_8")                                                                                              
              └─ LocalExchange[HASH][$hashvalue] ("regionkey")                                                                           
                 │   Layout: [regionkey:integer, count_8:bigint, $hashvalue:bigint]                                                      
                 │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                             
                 └─ RemoteExchange[REPARTITION][$hashvalue_9]                                                                            
                    │   Layout: [regionkey:integer, count_8:bigint, $hashvalue_9:bigint]                                                 
                    │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                          
                    └─ Aggregate(PARTIAL)[regionkey][$hashvalue_10]                                                                      
                       │   Layout: [regionkey:integer, $hashvalue_10:bigint, count_8:bigint]                                             
                       │   count_8 := count(*)                                                                                           
                       └─ ScanProject[table = hive:default:testtable]                                                                    
                              Layout: [regionkey:integer, $hashvalue_10:bigint]                                                          
                              Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}/{rows: 0 (0B), cpu: 0, memory: 0B, network: 0B} 
                              $hashvalue_10 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("regionkey"), 0))               
                              regionkey := regionkey:int:0:REGULAR
  • DISTRIBUTED:
    EXPLAIN (type DISTRIBUTED) SELECT regionkey, count(*) FROM testTable GROUP BY 1;
                                                          Query Plan                                                       
    -----------------------------------------------------------------------------------------------------------------------
     Fragment 0 [SINGLE]                                                                                                   
         Output layout: [regionkey, count]                                                                                 
         Output partitioning: SINGLE []                                                                                    
         Stage Execution Strategy: UNGROUPED_EXECUTION                                                                     
         Output[regionkey, _col1]                                                                                          
         │   Layout: [regionkey:integer, count:bigint]                                                                     
         │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                       
         │   _col1 := count                                                                                                
         └─ RemoteSource[1]                                                                                                
                Layout: [regionkey:integer, count:bigint]                                                                  
    
     Fragment 1 [HASH]                                                                                                     
         Output layout: [regionkey, count]                                                                                 
         Output partitioning: SINGLE []                                                                                    
         Stage Execution Strategy: UNGROUPED_EXECUTION                                                                     
         Project[]                                                                                                         
         │   Layout: [regionkey:integer, count:bigint]                                                                     
         │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                       
         └─ Aggregate(FINAL)[regionkey][$hashvalue]                                                                        
            │   Layout: [regionkey:integer, $hashvalue:bigint, count:bigint]                                               
            │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                    
            │   count := count("count_8")                                                                                  
            └─ LocalExchange[HASH][$hashvalue] ("regionkey")                                                               
               │   Layout: [regionkey:integer, count_8:bigint, $hashvalue:bigint]                                          
               │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                 
               └─ RemoteSource[2]                                                                                          
                      Layout: [regionkey:integer, count_8:bigint, $hashvalue_9:bigint]                                     
    
     Fragment 2 [SOURCE]                                                                                                   
         Output layout: [regionkey, count_8, $hashvalue_10]                                                                
         Output partitioning: HASH [regionkey][$hashvalue_10]                                                              
         Stage Execution Strategy: UNGROUPED_EXECUTION                                                                     
         Aggregate(PARTIAL)[regionkey][$hashvalue_10]                                                                      
         │   Layout: [regionkey:integer, $hashvalue_10:bigint, count_8:bigint]                                             
         │   count_8 := count(*)                                                                                           
         └─ ScanProject[table = hive:default:testtable, grouped = false]                                                   
                Layout: [regionkey:integer, $hashvalue_10:bigint]                                                          
                Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}/{rows: 0 (0B), cpu: 0, memory: 0B, network: 0B} 
                $hashvalue_10 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("regionkey"), 0))               
                regionkey := regionkey:int:0:REGULAR 
  • VALIDATE:
    EXPLAIN (TYPE VALIDATE) SELECT id, count(*) FROM testTable GROUP BY 1;
     Valid
    -------
     true
  • I/O:
    EXPLAIN (TYPE IO, FORMAT JSON) SELECT regionkey , count(*) FROM testTable GROUP BY 1;
    
               Query Plan            
    ---------------------------------
     {                               
       "inputTableColumnInfos" : [ { 
         "table" : {                 
           "catalog" : "hive",       
           "schemaTable" : {         
             "schema" : "default",   
             "table" : "testtable"   
           }                         
         },                          
         "columnConstraints" : [ ]   
       } ]                           
     }