Updated: 2025-09-08 GMT+08:00

CoreDNS Dashboard Template

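CoreDNS serves as the DNS server inside a cluster. You can use the CoreDNS dashboard template to view CoreDNS logs and analyze issues such as slow CoreDNS resolution and requests to high-risk domain names.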

Prerequisites

Viewing CoreDNS Log Analysis

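  1. Log in to the cloud log service console and go to the "Log Management" page.
  2. In the navigation pane on the left, choose "Dashboards".
  3. Under the dashboard templates, choose "CoreDNS Dashboard Template > CoreDNS Log Analysis" to view the chart details.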

    • The domain name filter uses the following query statement:
      * | select name group by name limit 10000
    • The client IP filter uses the following query statement:
      * | select remote group by remote limit 10000
    • The status code filter uses the following query statement:
      * | select rcode group by rcode limit 10000
    • The total requests chart uses the following query statement:
      *| select diff[1] as nowCount, diff[1] - diff[2] as delta from (select compare(total, 86400) as diff from( select count(1) as total from log ) )
    • The NXDOMAIN count chart uses the following query statement:
      (* and  rcode: NXDOMAIN) | select diff[1] as nowCount, diff[1] - diff[2] as delta from  (select compare(total, 86400) as diff from(  select count(1) as total from log ) )
    • The request success rate chart uses the following query statement:
      *| select diff[1] as nowCount, round(diff[1] - diff[2], 2) as delta from  (select compare(total, 86400) as diff from(  select round( count_if(rcode = 'NOERROR' or rcode = 'NXDOMAIN') * 100.0 / (case when count(1) = 0 then 1 else count(1) end), 2 ) as total from log ) )
    • The domain name count chart uses the following query statement:
      *| select diff[1] as nowCount, diff[1] - diff[2] as delta from  (select compare(total, 86400) as diff from(  select approx_distinct(name) as total from log ) )
    • The average latency chart uses the following query statement:
      *| select diff[1] as nowCount, round(diff[1] - diff[2],3 ) as delta from  (select compare(total, 86400) as diff from(  select round(avg(duration * 1000), 3) as total from log ) )
    • The timeout count chart uses the following query statement:
      (level : ERROR) | select diff[1] as nowCount, round(diff[1] - diff[2], 3)  as delta from  (select compare(total, 86400) as diff from( select count(1) as "total" from log where errmsg like '%timeout%' ) )
    • The P95 latency chart uses the following query statement:
      *| select diff[1] as nowCount, round(diff[1] - diff[2], 3) as delta from  (select compare(total, 86400) as diff from(  select round(approx_percentile(duration * 1000, 0.95), 3) as total from log ) ) 
    • The P99 latency chart uses the following query statement:
      *| select diff[1] as nowCount, round(diff[1] - diff[2], 3) as delta from  (select compare(total, 86400) as diff from(  select round(approx_percentile(duration * 1000, 0.99), 3) as total from log ) ) 
    • The request QPS chart uses the following query statement:
      *| select date_format(t, '%Y-%m-%d %H:%i') as time , d[1] as "今天", d[2] as "昨天", d[3] as "上周" from( select t, ts_compare(pv, 86400,604800 ) as d from(select from_unixtime(__time - __time % 60000) as t, round(count(1)/ 60.0, 2 ) as pv from log group by t order by t ) group by t ) where date_format(t, '%Y-%m-%d %H:%i') is not null order by time limit 10000
    • The average request latency (ms) chart uses the following query statement:
      *| select date_format(t, '%Y-%m-%d %H:%i') as time , d[1] as "今天", d[2] as "昨天", d[3] as "上周" from( select t, ts_compare(pv, 86400,604800 ) as d from(select from_unixtime(__time - __time % 60000) as t, round(avg(duration * 1000.0), 3 ) as pv from log where level = 'INFO' group by t order by t ) group by t ) where date_format(t, '%Y-%m-%d %H:%i') is not null order by time limit 10000
    • The success rate (%) chart uses the following query statement:
      *| select date_format(t, '%Y-%m-%d %H:%i') as time , d[1] as "今天", d[2] as "昨天", d[3] as "上周" from( select t, ts_compare(pv, 86400,604800 ) as d from(select from_unixtime(__time - __time % 60000) as t, round( count_if(rcode = 'NOERROR') * 100.0 / (case when count(1) = 0 then 1 else count(1) end), 2 )  as pv from log group by t order by t ) group by t ) where  date_format(t, '%Y-%m-%d %H:%i') is not null order by time limit 10000
    • The P99 latency (ms) chart uses the following query statement:
      *| select date_format(t, '%Y-%m-%d %H:%i') as time , d[1] as "今天", d[2] as "昨天", d[3] as "上周" from( select t, ts_compare(pv, 86400,604800 ) as d from(select from_unixtime(__time - __time % 60000) as t, round(approx_percentile(duration * 1000.0, 0.99), 3 ) as pv from log where level = 'INFO' group by t order by t ) group by t ) where date_format(t, '%Y-%m-%d %H:%i') is not null order by time limit 10000
    • The top requested domain names chart uses the following query statement:
      *| SELECT name, COUNT(*) as total where name  is not null group by name order by total desc
    • The status code distribution chart uses the following query statement:
      *| SELECT rcode, COUNT(*) as total group by rcode order by total desc
    • The TOP domain names chart uses the following query statement:
      *| SELECT name as "域名", rcode as "状态码", round(sum(size/ 1024.0), 3) as "请求流量KB", round(sum(rsize / 1024.0),3 ) as  "返回流量KB", COUNT(*) as "请求次数", round(avg(duration * 1000), 3) as "平均延迟(ms)" where name  is not null group by name, rcode order by "请求次数" desc
    • The TOP NXDOMAIN domain names chart uses the following query statement:
      (* and rcode: NXDOMAIN)| SELECT name as "域名", rcode as "状态码", round(sum(size/ 1024.0), 3) as "请求流量KB", round(sum(rsize / 1024.0),3 ) as  "返回流量KB", COUNT(*) as "请求次数", round(avg(duration * 1000), 3) as "平均延迟(ms)" group by name, rcode order by "请求次数" desc
    • The TOP error domain names (non-NXDOMAIN) chart uses the following query statement:
      (* not  rcode: NXDOMAIN not rcode : NOERROR)| SELECT name as "域名", rcode as "状态码", round(sum(size/ 1024.0), 3) as "请求流量KB", round(sum(rsize / 1024.0),3 ) as  "返回流量KB", COUNT(*) as "请求次数", round(avg(duration * 1000), 3) as "平均延迟(ms)" where name  is not null group by name, rcode order by "请求次数" desc
    • The TOP requesting Pods chart uses the following query statement:
      *| SELECT remote as "IP", arbitrary(name) as "请求域名采样", round(sum(size/ 1024.0), 3) as "请求流量KB", round(sum(rsize / 1024.0),3 ) as  "返回流量KB", COUNT(*) as "请求次数", round(avg(duration * 1000), 3) as "平均延迟(ms)" where remote  is not null group by remote order by "请求次数" desc
    • The TOP NXDOMAIN Pods chart uses the following query statement:
      (* and rcode : NXDOMAIN)| SELECT remote as "IP", arbitrary(name) as "请求域名采样", round(sum(size/ 1024.0), 3) as "请求流量KB", round(sum(rsize / 1024.0),3 ) as  "返回流量KB", COUNT(*) as "请求次数", round(avg(duration * 1000), 3) as "平均延迟(ms)" where remote  is not null group by remote order by "请求次数" desc
    • The TOP error Pods (non-NXDOMAIN) chart uses the following query statement:
      (* not rcode : NXDOMAIN not rcode : NOERROR)| SELECT remote as "IP", arbitrary(name) as "请求域名采样", round(sum(size/ 1024.0), 3) as "请求流量KB", round(sum(rsize / 1024.0),3 ) as  "返回流量KB", COUNT(*) as "请求次数", round(avg(duration * 1000), 3) as "平均延迟(ms)" where remote  is not null group by remote order by "请求次数" desc
    • The timeout domain names chart uses the following query statement:
      (level : ERROR)| select name as "域名", count(1) as "次数" from log where errmsg like '%timeout%'  group by name order by "次数" desc limit 100
    • The chart of the latest 100 errors in requests to the APIServer uses the following query statement:
      (content : reflector.go)| select date_format(from_unixtime(__time), '%Y-%m-%d %H:%i:%S') as "时间",  errmsg as "错误"  from log order by __time desc limit 100
    • The resolution logs chart uses the following query statement:
      (* and level: INFO and duration > 0.010)| SELECT name as "域名", round( duration * 1000.0, 3) as "延迟(ms)", remote as "请求者", port as "端口", type as "类型", rcode as "结果" from log order by duration desc  limit 100
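
The statistic cards above share one pattern: compute a metric over the selected time range, then compare it with the same range 86400 seconds (one day) earlier. The QPS trend chart additionally groups requests into one-minute buckets. The Python sketch below mirrors that logic on in-memory records to make the queries easier to read. It is illustrative only and is not part of the dashboard: the `DnsLog` class, the function names, and the sample data are hypothetical, and it assumes that `diff[1]` is the current value, that `diff[2]` is the value one day earlier, and that timestamps are in seconds.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DnsLog:
    """Hypothetical in-memory form of one parsed CoreDNS log record."""
    ts: int          # event time in Unix seconds (assumed unit for this sketch)
    rcode: str       # DNS response code, e.g. "NOERROR", "NXDOMAIN", "SERVFAIL"
    duration: float  # resolution time in seconds (the queries multiply by 1000 for ms)


def success_rate(logs):
    """Percentage of requests answered with NOERROR or NXDOMAIN, rounded to 2 places."""
    total = len(logs) or 1  # same role as `case when count(1) = 0 then 1 ...`
    ok = sum(1 for r in logs if r.rcode in ("NOERROR", "NXDOMAIN"))
    return round(ok * 100.0 / total, 2)


def stat_card(logs, start, end, metric, offset=86400):
    """Return (nowCount, delta) for `metric` over [start, end).

    delta is the current value minus the value of the same metric over the
    window shifted `offset` seconds into the past.
    """
    now = metric([r for r in logs if start <= r.ts < end])
    before = metric([r for r in logs if start - offset <= r.ts < end - offset])
    return now, round(now - before, 3)


def qps_per_minute(logs):
    """Group requests into 60-second buckets and return {bucket_start: requests per second}."""
    counts = Counter(r.ts - r.ts % 60 for r in logs)
    return {bucket: round(n / 60.0, 2) for bucket, n in sorted(counts.items())}


# Made-up sample: two requests "yesterday", four requests "today".
logs = [
    DnsLog(ts=100, rcode="NOERROR", duration=0.002),
    DnsLog(ts=130, rcode="SERVFAIL", duration=0.050),
    DnsLog(ts=86_500, rcode="NOERROR", duration=0.001),
    DnsLog(ts=86_510, rcode="NXDOMAIN", duration=0.003),
    DnsLog(ts=86_520, rcode="SERVFAIL", duration=0.040),
    DnsLog(ts=86_530, rcode="NOERROR", duration=0.002),
]

if __name__ == "__main__":
    print(stat_card(logs, 86_400, 90_000, len))           # total requests card
    print(stat_card(logs, 86_400, 90_000, success_rate))  # success rate card
    print(qps_per_minute(logs))                           # QPS trend points
```

Passing `len` as the metric reproduces the shape of the total requests card, and passing `success_rate` reproduces the request success rate card. Note that the card treats both NOERROR and NXDOMAIN as successful, whereas the success rate (%) trend chart in the list counts only NOERROR.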
