EXPLAIN ANALYZE
语法
EXPLAIN ANALYZE [VERBOSE] statement
描述
执行一条SQL语句,并显示分布式执行计划,以及过程中每个操作的代价。
VERBOSE可选参数,带上这个参数意味着会显示更多详细信息和底层统计数据。这个统计信息不能保证完全正确,特别是对于一些快速执行完成的语句。
限制
Explain analyze不支持DDL语句。
示例
下面这个例子,你可以看到每个阶段(Stage)的CPU时间消耗,每个计划节点相应的代价。
这个代价是基于现实时间(wall time),而非CPU 的相关时间。
对每一个计划节点,都可以看到额外的统计信息,例如每个节点实例的输入平均值,哈希碰撞(hash collisions)的平均次数。这些统计信息对于分析一条SQL语句中的数据异常情况(skewness数据倾斜,abnormal hash collisions)非常有用。
EXPLAIN ANALYZE SELECT count(*),sum(totalprice) FROM new_orders GROUP BY orderstatus; Query Plan ------------------------------------------------------------------------------------------------------------------------------- Fragment 1 [HASH] CPU: 29.19ms, Scheduled: 134.78ms, Input: 2 rows (77B); per task: avg.: 1.00 std.dev.: 1.00, Output: 2 rows (36B) Output layout: [count, sum] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION Project[] │ Layout: [count:bigint, sum:double] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ CPU: 4.00ms (2.34%), Scheduled: 10.00ms (33.33%), Output: 2 rows (36B) │ Input avg.: 0.06 rows, Input std.dev.: 387.30% └─ Aggregate(FINAL)[orderstatus][$hashvalue] │ Layout: [orderstatus:varchar, $hashvalue:bigint, count:bigint, sum:double] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ CPU: 6.00ms (3.51%), Scheduled: 17.00ms (56.67%), Output: 2 rows (77B) │ Input avg.: 0.06 rows, Input std.dev.: 387.30% │ count := count("count_9") │ sum := sum("sum_10") └─ LocalExchange[HASH][$hashvalue] ("orderstatus") │ Layout: [orderstatus:varchar, sum_10:double, count_9:bigint, $hashvalue:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ CPU: 2.00ms (1.17%), Scheduled: 3.00ms (10.00%), Output: 2 rows (77B) │ Input avg.: 0.06 rows, Input std.dev.: 556.78% └─ RemoteSource[2] Layout: [orderstatus:varchar, sum_10:double, count_9:bigint, $hashvalue_11:bigint] CPU: 1.00ms (0.58%), Scheduled: 3.00ms (10.00%), Output: 2 rows (77B) Input avg.: 0.06 rows, Input std.dev.: 556.78% Fragment 2 [SOURCE] CPU: 17.35ms, Scheduled: 80.04ms, Input: 4 rows (81B); per task: avg.: 4.00 std.dev.: 0.00, Output: 2 rows (77B) Output layout: [orderstatus, sum_10, count_9, $hashvalue_12] Output partitioning: HASH [orderstatus][$hashvalue_12] Stage Execution Strategy: UNGROUPED_EXECUTION Aggregate(PARTIAL)[orderstatus][$hashvalue_12] │ Layout: [orderstatus:varchar, $hashvalue_12:bigint, sum_10:double, count_9:bigint] │ CPU: 1.00ms (0.58%), Scheduled: 6.00ms (20.00%), Output: 2 rows (77B) │ Input avg.: 4.00 rows, Input std.dev.: 0.00% │ sum_10 := sum("totalprice") │ count_9 := count(*) └─ ScanProject[table = hive:default:new_orders, grouped = false] Layout: [orderstatus:varchar, totalprice:double, $hashvalue_12:bigint] Estimates: {rows: 4 (292B), cpu: 256, memory: 0B, network: 0B}/{rows: 4 (292B), cpu: 548, memory: 0B, network: 0B} CPU: 16.00ms (9.36%), Scheduled: 132.00ms (440.00%), Output: 4 rows (117B) Input avg.: 4.00 rows, Input std.dev.: 0.00% $hashvalue_12 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("orderstatus"), 0)) orderstatus := orderstatus:string:1:REGULAR totalprice := totalprice:double:2:REGULAR Input: 4 rows (81B), Filtered: 0.00% (1 row)