Updated: 2024-05-11 GMT+08:00

CCE Log K8s Event Center

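The CCE Log K8s Event Center dashboard mainly displays insufficient node file descriptors (FDs), insufficient node disk space, time synchronization exceptions, event distribution, and related charts.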

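Prerequisites

Background

Cloud Container Engine (CCE) provides highly scalable, high-performance, enterprise-grade Kubernetes clusters. With CCE, you can easily deploy, manage, and scale containerized applications on Huawei Cloud.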

Viewing the Dashboard

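  1. Log in to the Log Tank Service (LTS) console.
  2. In the left navigation pane, choose "Dashboards".
  3. Under the dashboard templates, select the "CCE Log K8s Event Center" dashboard to view the chart details.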

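The filters in the CCE Log K8s Event Center dashboard are described below:

  • Event level: Warning or Normal.
  • Event type. The associated query and analysis statement is as follows:
    select distinct("name")
  • Cluster ID. The associated query and analysis statement is as follows:
    select distinct("cluster_id")
  • Namespace. The associated query and analysis statement is as follows:
    select distinct("namespace")
  • Name. The associated query and analysis statement is as follows:
    select distinct("resource_name")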
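
Each filter query above returns the distinct values of one field, which become the options a user can pick from. As a rough illustration of that behavior outside the log service, the Python sketch below collects distinct field values from a few sample events. The sample records, the helper name filter_options, and the sorting step are assumptions made for this example; only the field names come from the queries above.

```python
from typing import Iterable, List, Mapping

# Hypothetical sample events. The field names follow the filter queries above;
# the values are invented for illustration only.
SAMPLE_EVENTS = [
    {"name": "NodeHasDiskPressure", "cluster_id": "cluster-a",
     "namespace": "default", "resource_name": "node-1"},
    {"name": "Failed", "cluster_id": "cluster-a",
     "namespace": "kube-system", "resource_name": "coredns-0"},
    {"name": "Failed", "cluster_id": "cluster-b",
     "namespace": "default", "resource_name": "web-0"},
]


def filter_options(events: Iterable[Mapping[str, str]], field: str) -> List[str]:
    """Return the distinct non-empty values of `field`.

    Sorting is added here only to get a stable order for display;
    a plain select distinct does not promise any ordering.
    """
    return sorted({e[field] for e in events if e.get(field)})


if __name__ == "__main__":
    for field in ("name", "cluster_id", "namespace", "resource_name"):
        print(field, filter_options(SAMPLE_EVENTS, field))
```

Duplicate values such as the two "Failed" events collapse into a single option, which is the point of using distinct in the filter queries.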

Key Charts

The key charts in the CCE Log K8s Event Center dashboard are described below:

  • Conntrack Full. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name"= 'ConntrackFull'  ) )
  • Time synchronization exception. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name"= 'NTPIsDown') )
  • Insufficient node PIDs. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name" in ('PIDPressure','NodeHasPIDPressure') ) )
  • Insufficient node FDs. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name"= 'NodeHasFDPressure') )
  • Insufficient node disk space. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name"= 'NodeHasDiskPressure') )
  • Pod OOM. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where   "reason" in ('OOMKilling','PodOOMKilling')) )
  • DockerHung. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name"= 'Failed' and "reason" = 'DockerHung') )
  • Node restart. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name"= 'NodeRebooted') )
  • Image pull failure. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name"= 'Failed' and "reason" = 'ImagePullBackOff') )
  • Node OOM. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name" = 'SystemOOM') )
  • Pod startup failure. The associated query and analysis statement is as follows:
    select diff[1] as "total", round((diff[1] - diff[2]) / diff[2] * 100 , 2 ) as "inc" from (select compare( "total", 3600) as diff from( select count(1) as "total" from log where  "name"= 'Failed' and "resource_kind" = 'Pod' and  "reason" = 'ImagePullBackOff') )
  • Event distribution. The associated query and analysis statement is as follows:
    select "type", count(*) as "事件数" group by "type"
  • Warning event trend. The associated query and analysis statement is as follows:
    select time_series(__time, 'PT1H', 'yyyy-MM-dd HH', '0') as "dt",count(1) as "count"  from log  where "type" = 'Warning'  group by "dt" order by "dt"
  • Error event trend. The associated query and analysis statement is as follows:
    select time_series(__time, 'PT1H', 'yyyy-MM-dd HH', '0') as "dt",count(1) as "count" from log  where "type" = 'Error' group by "dt" order by "dt"
  • Pod OOM event list. The associated query and analysis statement is as follows:
    select TIME_FORMAT( __time, 'yyyy-MM-dd HH:mm:ss', '+08:00') as "Time", "resource_kind" as "事件目标", "name" as "类型", "resource_name" as "目标名", "reason" as "详细内容" from log where "name" in ('OOMKilling','PodOOMKilling') order by __time desc limit 100
  • Pod eviction event list. The associated query and analysis statement is as follows:
    select TIME_FORMAT( __time, 'yyyy-MM-dd HH:mm:ss', '+08:00' ) as "Time", "resource_kind" as "事件目标", "name" as "类型", "resource_name" as "目标名", "reason" as "详细内容" from log where "name" = 'NodeControllerEviction' order by __time desc limit 100
  • Important event list. The associated query and analysis statement is as follows:
    select TIME_FORMAT( __time, 'yyyy-MM-dd HH:mm:ss', '+08:00' ) as "Time", "type" as "等级", "resource_kind" as "事件目标", "name" as "类型", "resource_name" as "目标名", "reason" as "详细内容" from log where "type" in ('Warning','Error') order by __time desc limit 100
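
Two calculations recur in the chart queries above: the percentage change round((diff[1] - diff[2]) / diff[2] * 100, 2) used by the count charts, and the hourly bucketing with zero fill used by the trend charts. The Python sketch below reproduces both outside the log service to make the arithmetic concrete. It assumes that diff[1] is the event count for the current query time range and diff[2] is the count for the same range 3600 seconds earlier, as the "total" alias suggests. The function names change_rate and hourly_counts, the None result for a zero baseline, and the sample timestamps are assumptions of this example, not documented service behavior.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


def change_rate(current: int, previous: int) -> Optional[float]:
    """Percentage change of `current` relative to `previous`, to 2 decimals.

    Mirrors round((diff[1] - diff[2]) / diff[2] * 100, 2).
    Returns None when `previous` is 0; how the service treats a zero
    baseline is not stated in this document, so None is an assumption.
    """
    if previous == 0:
        return None
    return round((current - previous) / previous * 100, 2)


def hourly_counts(
    timestamps: Iterable[datetime], start: datetime, end: datetime
) -> List[Tuple[str, int]]:
    """Count events per one-hour bucket in [start, end), filling gaps with 0.

    Approximates time_series(__time, 'PT1H', 'yyyy-MM-dd HH', '0')
    combined with count(1) ... group by "dt" order by "dt".
    """
    fmt = "%Y-%m-%d %H"
    counts = Counter(ts.strftime(fmt) for ts in timestamps)
    buckets = []
    cursor = start.replace(minute=0, second=0, microsecond=0)
    while cursor < end:
        key = cursor.strftime(fmt)
        buckets.append((key, counts.get(key, 0)))
        cursor += timedelta(hours=1)
    return buckets


if __name__ == "__main__":
    # 15 events in the current range versus 12 one hour earlier.
    print(change_rate(15, 12))

    # Invented Warning event times; the 09:00 hour has no events.
    warnings = [
        datetime(2024, 5, 11, 8, 5),
        datetime(2024, 5, 11, 8, 40),
        datetime(2024, 5, 11, 10, 1),
    ]
    print(hourly_counts(warnings, datetime(2024, 5, 11, 8), datetime(2024, 5, 11, 11)))
```

A positive result means more events than in the earlier range and a negative result means fewer. The zero fill keeps hours with no events on the trend line instead of leaving gaps.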