更新时间:2024-12-31 GMT+08:00

EXPLAIN

语法

EXPLAIN [ ( option [, ...] ) ] statement

其中选项可以是以下选项之一:

FORMAT { TEXT | GRAPHVIZ | JSON }

TYPE { LOGICAL | DISTRIBUTED | VALIDATE | IO }

描述

显示一条语句的逻辑的或者分布式的执行计划,也可以用于校验一条SQL语句,或者是分析IO。

参数TYPE DISTRIBUTED用于显示分片后的计划(fragmented plan)。每一个fragment都会被一个或者多个节点执行。Fragments separation表示数据在两个节点之间进行交换。Fragment type表示一个fragment如何被执行以及数据在不同fragment之间怎样分布。

  • SINGLE

    Fragment会在单个节点上执行。

  • HASH

    Fragment会在固定数量的节点上执行,输入数据通过哈希函数进行分布。

  • ROUND_ROBIN

    Fragment会在固定数量的节点上执行,片段在固定数量的节点上执行,输入数据以轮循方式进行分布。

  • BROADCAST

    Fragment会在固定数量的节点上执行,输入数据被广播到所有的节点。

  • SOURCE

    Fragment在访问输入分段的节点上执行。

示例

  • LOGICAL:
    CREATE TABLE testTable (regionkey int, name varchar);
    EXPLAIN SELECT regionkey, count(*) FROM testTable GROUP BY 1;
                                                                 Query Plan                                                              
    -------------------------------------------------------------------------------------------------------------------------------------
     Output[regionkey, _col1]                                                                                                            
     │   Layout: [regionkey:integer, count:bigint]                                                                                       
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                                         
     │   _col1 := count                                                                                                                  
     └─ RemoteExchange[GATHER]                                                                                                           
        │   Layout: [regionkey:integer, count:bigint]                                                                                    
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                                      
        └─ Project[]                                                                                                                     
           │   Layout: [regionkey:integer, count:bigint]                                                                                 
           │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                                   
           └─ Aggregate(FINAL)[regionkey][$hashvalue]                                                                                    
              │   Layout: [regionkey:integer, $hashvalue:bigint, count:bigint]                                                           
              │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                                
              │   count := count("count_8")                                                                                              
              └─ LocalExchange[HASH][$hashvalue] ("regionkey")                                                                           
                 │   Layout: [regionkey:integer, count_8:bigint, $hashvalue:bigint]                                                      
                 │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                             
                 └─ RemoteExchange[REPARTITION][$hashvalue_9]                                                                            
                    │   Layout: [regionkey:integer, count_8:bigint, $hashvalue_9:bigint]                                                 
                    │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                          
                    └─ Aggregate(PARTIAL)[regionkey][$hashvalue_10]                                                                      
                       │   Layout: [regionkey:integer, $hashvalue_10:bigint, count_8:bigint]                                             
                       │   count_8 := count(*)                                                                                           
                       └─ ScanProject[table = hive:default:testtable]                                                                    
                              Layout: [regionkey:integer, $hashvalue_10:bigint]                                                          
                              Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}/{rows: 0 (0B), cpu: 0, memory: 0B, network: 0B} 
                              $hashvalue_10 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("regionkey"), 0))               
                              regionkey := regionkey:int:0:REGULAR
  • DISTRIBUTED:
    EXPLAIN (type DISTRIBUTED) SELECT regionkey, count(*) FROM testTable GROUP BY 1;
                                                          Query Plan                                                       
    -----------------------------------------------------------------------------------------------------------------------
     Fragment 0 [SINGLE]                                                                                                   
         Output layout: [regionkey, count]                                                                                 
         Output partitioning: SINGLE []                                                                                    
         Stage Execution Strategy: UNGROUPED_EXECUTION                                                                     
         Output[regionkey, _col1]                                                                                          
         │   Layout: [regionkey:integer, count:bigint]                                                                     
         │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                       
         │   _col1 := count                                                                                                
         └─ RemoteSource[1]                                                                                                
                Layout: [regionkey:integer, count:bigint]                                                                  
    
     Fragment 1 [HASH]                                                                                                     
         Output layout: [regionkey, count]                                                                                 
         Output partitioning: SINGLE []                                                                                    
         Stage Execution Strategy: UNGROUPED_EXECUTION                                                                     
         Project[]                                                                                                         
         │   Layout: [regionkey:integer, count:bigint]                                                                     
         │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                       
         └─ Aggregate(FINAL)[regionkey][$hashvalue]                                                                        
            │   Layout: [regionkey:integer, $hashvalue:bigint, count:bigint]                                               
            │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                    
            │   count := count("count_8")                                                                                  
            └─ LocalExchange[HASH][$hashvalue] ("regionkey")                                                               
               │   Layout: [regionkey:integer, count_8:bigint, $hashvalue:bigint]                                          
               │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}                                                 
               └─ RemoteSource[2]                                                                                          
                      Layout: [regionkey:integer, count_8:bigint, $hashvalue_9:bigint]                                     
    
     Fragment 2 [SOURCE]                                                                                                   
         Output layout: [regionkey, count_8, $hashvalue_10]                                                                
         Output partitioning: HASH [regionkey][$hashvalue_10]                                                              
         Stage Execution Strategy: UNGROUPED_EXECUTION                                                                     
         Aggregate(PARTIAL)[regionkey][$hashvalue_10]                                                                      
         │   Layout: [regionkey:integer, $hashvalue_10:bigint, count_8:bigint]                                             
         │   count_8 := count(*)                                                                                           
         └─ ScanProject[table = hive:default:testtable, grouped = false]                                                   
                Layout: [regionkey:integer, $hashvalue_10:bigint]                                                          
                Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}/{rows: 0 (0B), cpu: 0, memory: 0B, network: 0B} 
                $hashvalue_10 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("regionkey"), 0))               
                regionkey := regionkey:int:0:REGULAR 
  • VALIDATE:
    EXPLAIN (TYPE VALIDATE) SELECT regionkey, count(*) FROM testTable GROUP BY 1;
     Valid
    -------
     true
  • IO:
    EXPLAIN (TYPE IO, FORMAT JSON) SELECT regionkey , count(*) FROM testTable GROUP BY 1;
    
               Query Plan            
    ---------------------------------
     {                               
       "inputTableColumnInfos" : [ { 
         "table" : {                 
           "catalog" : "hive",       
           "schemaTable" : {         
             "schema" : "default",   
             "table" : "testtable"   
           }                         
         },                          
         "columnConstraints" : [ ]   
       } ]                           
     }