Updated: 2024-10-25 GMT+08:00

Parsing Nginx Logs

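Nginx access logs record detailed information about each user request, so parsing them is important for service operations and maintenance. This section describes how to parse Nginx access logs with regular expression functions.

The following example uses one access log entry for a successful Nginx request to show how to parse such entries with regular expressions.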

  • Raw log
    {"source":"192.168.0.1",
    "client_ip":"192.168.254.254",
    "receive_time":"1563443076",
    "content":"192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\""
    }
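  • Parsing requirements
    • Requirement 1: Extract the code, ip, datetime, protocol, request, sendbytes, referer, useragent, and verb fields from the Nginx log.
    • Requirement 2: Further parse request to extract uri_proto, uri_domain, and uri_param.
    • Requirement 3: Further parse the extracted uri_param to extract uri_path and uri_query.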
  • Processing rules
    • Combined rules
      """Step 1: Perform an initial parse of the Nginx log"""
      e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<referer>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
      """Step 2: Parse the request field obtained in step 1"""
      e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
      """Step 3: Parse the uri_param field obtained in step 2"""
      e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)')
    • Result of the combined rules
      {
      	"request": "http://example.test.com/_astats?application=&inf.name=eth0",
      	"referer": "-",
      	"uri_proto": "http",
      	"code": 200,
      	"ip": "192.168.0.2",
      	"sendbytes": 273932,
      	"receive_time": 1563443076,
      	"verb": "GET",
      	"useragent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)",
      	"source": "192.168.0.1",
      	"content": "192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\"",
      	"datetime": "04/Jan/2019:16:06:38 +0800",
      	"protocol": "HTTP/1.1",
      	"uri_path": "/_astats",
      	"uri_query": "application=&inf.name=eth0",
      	"uri_param": "/_astats?application=&inf.name=eth0",
      	"client_ip": "192.168.254.254",
      	"uri_domain": "example.test.com"
      }
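To try the three steps outside the service, the chain can be approximated in plain Python. This is a minimal sketch, not the e_regex implementation: the helper `e_regex_like` is hypothetical and assumes that e_regex behaves like `re.search`, writing each named group of the first match into a field of the same name and leaving the event unchanged when nothing matches. The patterns are copied from the rules above, with the referrer group named `referer` as listed in requirement 1. In this sketch every extracted value stays a string.

```python
import re

# Patterns copied from the three rules above, with the referrer group named "referer".
STEPS = [
    ("content", r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<referer>[\S]*)["] ["](?P<useragent>[\S\s]+)["]'),
    ("request", r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)'),
    ("uri_param", r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)'),
]


def e_regex_like(event, field, pattern):
    """Hypothetical stand-in for e_regex: copy the named groups of the first match into the event."""
    value = event.get(field)
    if value is None:
        return event
    match = re.search(pattern, value)
    if match:
        for name, group in match.groupdict().items():
            if group is not None:
                event[name] = group
    return event


# Raw log from the example above.
event = {
    "source": "192.168.0.1",
    "client_ip": "192.168.254.254",
    "receive_time": "1563443076",
    "content": '192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"',
}

# Apply the steps in order; later steps read fields created by earlier ones.
for field, pattern in STEPS:
    e_regex_like(event, field, pattern)

for key in sorted(event):
    print(f"{key}: {event[key]}")
```

The printed fields can be compared with the result above. Unnamed groups such as `( - - \[)` and `(\w+)` are not added, because `groupdict()` only returns named groups.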
    • Individual rules and their results
      • Rule for requirement 1:
        e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<referer>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')

        Result:

        {
        	"request": "http://example.test.com/_astats?application=&inf.name=eth0",
        	"referer": "-",
        	"code": 200,
        	"ip": "192.168.0.2",
        	"sendbytes": 273932,
        	"receive_time": 1563443076,
        	"verb": "GET",
        	"useragent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)",
        	"source": "192.168.0.1",
        	"content": "192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\"",
        	"datetime": "04/Jan/2019:16:06:38 +0800",
        	"protocol": "HTTP/1.1",
        	"client_ip": "192.168.254.254"
        }
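The requirement-1 pattern is long, so it can help to read it one field at a time. The sketch below rewrites it with Python's `re.VERBOSE` flag and a comment per field. The character classes are unchanged, literal spaces are written as `[ ]` because verbose mode ignores bare whitespace, and the referrer group is named `referer`. It assumes the service accepts Python regex syntax and is only a reading aid; the final assertion checks that the annotated form extracts exactly the same fields as the one-line form.

```python
import re

LINE = ('192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.test.com/_astats?'
        'application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"')

# One-line requirement-1 pattern, with the referrer group named "referer".
COMPACT = r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<referer>[\S]*)["] ["](?P<useragent>[\S\s]+)["]'

# Same pattern in verbose form: bare whitespace and "#" comments are ignored,
# so literal spaces are written as [ ].
ANNOTATED = re.compile(r"""
    (?P<ip>\d+\.\d+\.\d+\.\d+)     # client IP address
    ([ ]-[ ]-[ ]\[)                # literal " - - [" before the time stamp
    (?P<datetime>[\s\S]+)          # time stamp (greedy, backtracks to the bracket below)
    \][ ]"                         # closing bracket, a space, and the opening quote
    (?P<verb>[A-Z]+)[ ]            # HTTP method, e.g. GET
    (?P<request>[\S]*)[ ]          # request target, here a full URL
    (?P<protocol>[\S]+)"[ ]        # protocol, e.g. HTTP/1.1, then the closing quote
    (?P<code>\d+)[ ]               # status code
    (?P<sendbytes>\d+)[ ]          # bytes sent
    "(?P<referer>[\S]*)"[ ]        # quoted referrer, "-" when absent
    "(?P<useragent>[\S\s]+)"       # quoted user agent
""", re.VERBOSE)

match = ANNOTATED.search(LINE)
for name, value in match.groupdict().items():
    print(f"{name:>10}: {value}")

# Both forms should extract identical named fields.
assert match.groupdict() == re.search(COMPACT, LINE).groupdict()
```

Because `(?P<datetime>[\s\S]+)` is greedy, it first consumes the rest of the line and then backtracks until the remainder of the pattern matches; that is harmless here because the line contains only one `]`.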
      • Rule for requirement 2, which parses request:
        e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
        Result:
        {
        	"uri_proto": "http",
        	"uri_param": "/_astats?application=&inf.name=eth0",
        	"uri_domain": "example.test.com"
        }
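One way to sanity-check the requirement-2 result is to compare it with Python's standard URL parser, `urllib.parse.urlsplit`. This is a hedged cross-check, not part of the processing rules: `uri_proto` should equal the scheme, `uri_domain` the network location, and `uri_param` the path followed by `?` and the query.

```python
import re
from urllib.parse import urlsplit

request = "http://example.test.com/_astats?application=&inf.name=eth0"

# Requirement-2 pattern, copied from the rule above.
STEP2 = r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)'
fields = re.search(STEP2, request).groupdict()

# Independent cross-check with the standard library URL parser.
parts = urlsplit(request)
assert fields["uri_proto"] == parts.scheme
assert fields["uri_domain"] == parts.netloc
# uri_param is everything after the host: the path, "?", and the query.
assert fields["uri_param"] == f"{parts.path}?{parts.query}"
print(fields)
```

The domain class `[a-z0-9.]*[^\/]` assumes a lower-case host made of letters, digits, and dots. For a hypothetical host such as `my-host.example.com`, the pattern stops at the hyphen and extracts `my-` as `uri_domain`, so a wider class such as `[^\/]+` may be needed for other logs.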
      • Rule for requirement 3, which parses uri_param:
        e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)')

        Result:

        {
        	"uri_path": "/_astats",
        	"uri_query": "application=&inf.name=eth0"
        }
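The requirement-3 pattern only matches paths that begin with `/_`, as `/_astats` does in this log; for any other path it finds no match. The sketch below contrasts it with a hypothetical looser pattern that takes everything before the first `?` as `uri_path` and everything after it as `uri_query`. Both the looser pattern and the second sample path, `/index.html?id=1`, are illustrative assumptions, not part of the original rules.

```python
import re

# Requirement-3 pattern, copied from the rule above: the path must start with "/_".
STEP3 = r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)'
# Hypothetical looser variant: any path before the first "?", any query after it.
STEP3_GENERAL = r'(?P<uri_path>[^?]+)\?(?P<uri_query>.*)'


def split_uri_param(uri_param, pattern):
    """Return {"uri_path": ..., "uri_query": ...}, or None when the pattern does not match."""
    match = re.search(pattern, uri_param)
    return match.groupdict() if match else None


# The first sample comes from the log above; the second is hypothetical.
for sample in ["/_astats?application=&inf.name=eth0", "/index.html?id=1"]:
    print(sample)
    print("  original:", split_uri_param(sample, STEP3))
    print("  general: ", split_uri_param(sample, STEP3_GENERAL))
```

Neither pattern matches a `uri_param` that contains no `?`, so requests without a query string would need a separate rule.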
