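Updated: 2024-10-25 GMT+08:00
Parsing Nginx Logs
Nginx access logs record detailed information about each user request, and parsing them is valuable for service O&M. This topic describes how to parse Nginx access logs with regular expression functions.
The following uses one log of a successful Nginx request as an example.
- Raw log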
{"source":"192.168.0.1", "client_ip":"192.168.254.254", "receive_time":"1563443076", "content":"192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\"" }
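- Parsing requirements
- Requirement 1: Extract the code, ip, datetime, protocol, request, sendbytes, referer, useragent, and verb fields from the Nginx log. The rules below capture the referer in a field named refere.
- Requirement 2: Parse request further to extract uri_proto, uri_domain, and uri_param.
- Requirement 3: Parse the extracted uri_param further to extract uri_path and uri_query.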
- Processing rules
- Overall orchestration
"""Step 1: Perform an initial parse of the Nginx log."""
e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
"""Step 2: Parse the request field obtained in step 1."""
e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
"""Step 3: Parse the uri_param field obtained in step 2."""
e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)')
- Overall orchestration result
{ "request": "http://example.test.com/_astats?application=&inf.name=eth0", "refere": "-", "uri_proto": "http", "code": 200, "ip": "192.168.0.2", "sendbytes": 273932, "receive_time": 1563443076, "verb": "GET", "useragent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)", "source": "192.168.0.1", "content": "192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\"", "datetime": "04/Jan/2019:16:06:38 +0800", "protocol": "HTTP/1.1", "uri_path": "/_astats", "uri_query": "application=&inf.name=eth0", "uri_param": "/_astats?application=&inf.name=eth0", "client_ip": "192.168.254.254", "uri_domain": "example.test.com" }
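To check the three rules outside the processing environment, the chain can be approximated with Python's standard `re` module. This is a minimal sketch, not the DSL's implementation: the helper `apply_regex` is hypothetical, and it assumes that `e_regex` behaves like `re.search` and writes each named group back to the event as a string field.

```python
import re

# Patterns copied unchanged from the three steps above.
STEP1 = r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]'
STEP2 = r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)'
STEP3 = r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)'


def apply_regex(event, field, pattern):
    """Hypothetical stand-in for e_regex: copy named groups into the event."""
    match = re.search(pattern, event.get(field, ""))
    if match:
        event.update(match.groupdict())
    return event


# Only the content field of the sample log is needed for this sketch.
event = {
    "content": '192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] '
               '"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1" '
               '200 273932 "-" '
               '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"'
}

# Each step reads a field produced by the previous one.
for field, pattern in (("content", STEP1), ("request", STEP2), ("uri_param", STEP3)):
    apply_regex(event, field, pattern)

for key in sorted(event):
    if key != "content":
        print(f"{key} = {event[key]}")
```

In this sketch every extracted value, including `code` and `sendbytes`, is a string, because `re` named groups always return text.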
- Step-by-step orchestration and results
- Processing rule for requirement 1:
e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
Result:
{ "request": "http://example.test.com/_astats?application=&inf.name=eth0", "refere": "-", "code": 200, "ip": "192.168.0.2", "sendbytes": 273932, "receive_time": 1563443076, "verb": "GET", "useragent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)", "source": "192.168.0.1", "content": "192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\"", "datetime": "04/Jan/2019:16:06:38 +0800", "protocol": "HTTP/1.1", "client_ip": "192.168.254.254" }
- Processing rule for requirement 2, which parses request:
e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
Result:
{ "uri_proto": "http", "uri_param": "/_astats?application=&inf.name=eth0", "uri_domain": "example.test.com" }
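The split between domain and path in this rule relies on backtracking: `[a-z0-9.]*` first consumes the whole host name, then gives back one character so that `[^\/]` can match the last character before the slash. The sketch below reproduces this with Python's `re` module, which is only assumed to behave like the DSL's regex engine. The helper `split_request` and the URL with a port are hypothetical and do not come from the sample log.

```python
import re

# Step-2 pattern, copied unchanged from the rule above.
STEP2 = re.compile(
    r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)'
)


def split_request(url):
    """Return the three named groups, or None when the pattern does not match."""
    match = STEP2.search(url)
    return match.groupdict() if match else None


# The request from the sample log.
sample = split_request("http://example.test.com/_astats?application=&inf.name=eth0")

# Hypothetical request with an explicit port, to show where the split moves.
with_port = split_request("http://example.test.com:8080/_astats?application=")

print(sample)
print(with_port)
```

With these inputs the sample request splits as in the result above, while the port variant yields a `uri_domain` of `example.test.com:` and a `uri_param` that starts with `8080`. The pattern as written therefore suits lowercase host names without a port.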
- Processing rule for requirement 3, which parses uri_param:
e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)')
Result:
{ "uri_path": "/_astats", "uri_query": "application=&inf.name=eth0" }
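The step-3 pattern only matches when the path begins with `/_` and a `?` separates the path from the query. A short sketch with Python's `re` module, assumed here to be a reasonable stand-in for the DSL's regex engine, makes these limits visible. The helper `split_param` and the second and third inputs are hypothetical.

```python
import re

# Step-3 pattern, copied unchanged from the rule above.
STEP3 = re.compile(r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)')


def split_param(uri_param):
    """Return uri_path and uri_query, or None when the pattern does not match."""
    match = STEP3.search(uri_param)
    return match.groupdict() if match else None


matched = split_param("/_astats?application=&inf.name=eth0")
no_query = split_param("/_astats")              # hypothetical: no "?" present
no_underscore = split_param("/index.html?a=1")  # hypothetical: path lacks "/_"

print(matched)
print(no_query)
print(no_underscore)
```

In this sketch only the first input matches; the other two return `None`. Logs whose paths follow a different shape would need a more general pattern for `uri_path`.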