Updated on 2025-12-04 GMT+08:00

Parsing Nginx Logs

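Nginx access logs record detailed information about each client request, such as the client IP address, access time, request method and URL, status code, response size, referer, and user agent. Parsing these logs into structured fields is valuable for service O&M. This section describes how to use regular expression functions to parse Nginx access logs.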

The following example uses regular expressions to parse the Nginx access log of a successful request (status code 200).

  • Raw log
    {"source":"192.168.0.1",
    "client_ip":"192.168.254.254",
    "receive_time":"1563443076",
    "content":"192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\""
    }
  • Parsing requirements
    • Requirement 1: Extract the code, ip, datetime, protocol, request, sendbytes, referer, useragent, and verb fields from the Nginx log.
    • Requirement 2: Extract the uri_proto, uri_domain, and uri_param fields from request.
    • Requirement 3: Extract the uri_path and uri_query fields from the uri_param field parsed in requirement 2.
  • Processing rule
    • General orchestration
      """Step 1: Parse the raw Nginx log."""
      e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<referer>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
      """Step 2: Parse the request field obtained in step 1."""
      e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
      """Step 3: Parse the uri_param field obtained in step 2."""
      e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)')
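To make the chaining of the three steps explicit, here is a minimal Python sketch. It assumes that e_regex() searches the value of the given field with the pattern and adds every named group to the log as a new field, which is what the results in this example show. e_regex_sketch() is a hypothetical helper written for illustration, not part of the DSL, and Python's re module stands in for the service's regex engine: the (?P<name>...) syntax is the same, but other details may differ.

```python
import re

# Hypothetical stand-in for the DSL's e_regex(): search the value of `field`
# with `pattern` and merge the named groups into the event. This only mimics
# the behavior shown in this example; it is not the service's implementation.
def e_regex_sketch(event: dict, field: str, pattern: str) -> dict:
    match = re.search(pattern, event.get(field, ""))
    if match:
        event.update(match.groupdict())
    return event

# The three patterns from the orchestration, copied unchanged.
STEP1 = (r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) '
         r'(?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) '
         r'["](?P<referer>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
STEP2 = r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)'
STEP3 = r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)'

# The raw log from this example, with the JSON escaping of quotes removed.
event = {
    "source": "192.168.0.1",
    "client_ip": "192.168.254.254",
    "receive_time": "1563443076",
    "content": ('192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.test.com/_astats'
                '?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" '
                '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"'),
}

# Each step reads a field produced by the previous one.
for field, pattern in (("content", STEP1), ("request", STEP2), ("uri_param", STEP3)):
    e_regex_sketch(event, field, pattern)

print(event["uri_domain"], event["uri_path"], event["uri_query"])
# example.test.com /_astats application=&inf.name=eth0
```

Because re captures text, every extracted value in this sketch is a string, whereas the processing result for this example shows code and sendbytes as numbers.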
    • Processing result of the overall orchestration
      {
      	"request": "http://example.test.com/_astats?application=&inf.name=eth0",
      	"referer": "-",
      	"uri_proto": "http",
      	"code": 200,
      	"ip": "192.168.0.2",
      	"sendbytes": 273932,
      	"receive_time": 1563443076,
      	"verb": "GET",
      	"useragent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)",
      	"source": "192.168.0.1",
      	"content": "192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\"",
      	"datetime": "04/Jan/2019:16:06:38 +0800",
      	"protocol": "HTTP/1.1",
      	"uri_path": "/_astats",
      	"uri_query": "application=&inf.name=eth0",
      	"uri_param": "/_astats?application=&inf.name=eth0",
      	"client_ip": "192.168.254.254",
      	"uri_domain": "example.test.com"
      }
    • Step-by-step orchestration and processing results
      • The processing orchestration for requirement 1 is as follows:
        e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<referer>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')

        Processing result:

        {
        	"request": "http://example.test.com/_astats?application=&inf.name=eth0",
        	"referer": "-",
        	"code": 200,
        	"ip": "192.168.0.2",
        	"sendbytes": 273932,
        	"receive_time": 1563443076,
        	"verb": "GET",
        	"useragent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)",
        	"source": "192.168.0.1",
        	"content": "192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] \"GET http://example.test.com/_astats?application=&inf.name=eth0 HTTP/1.1\" 200 273932 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)\"",
        	"datetime": "04/Jan/2019:16:06:38 +0800",
        	"protocol": "HTTP/1.1",
        	"client_ip": "192.168.254.254"
        }
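The requirement-1 pattern is long enough that its structure is easy to miss. The following sketch, assuming Python's re module as a stand-in for the DSL's regex engine, rewrites the same pattern in re.VERBOSE mode with one comment per field. It drops the unnamed ( - - \[) group, which changes only group numbering, not what the named groups capture. NGINX_VERBOSE is an illustrative name.

```python
import re

# Requirement-1 pattern in verbose mode. Whitespace is ignored in this mode,
# so literal spaces are written as "[ ]".
NGINX_VERBOSE = re.compile(r'''
    (?P<ip>\d+\.\d+\.\d+\.\d+)        # client IP address
    [ ]-[ ]-[ ]\[                     # the literal " - - [" before the timestamp
    (?P<datetime>[\s\S]+)\][ ]"       # timestamp, up to the closing bracket
    (?P<verb>[A-Z]+)[ ]               # HTTP method, e.g. GET
    (?P<request>\S*)[ ]               # request target (an absolute URL here)
    (?P<protocol>\S+)"[ ]             # protocol version, e.g. HTTP/1.1
    (?P<code>\d+)[ ]                  # status code
    (?P<sendbytes>\d+)[ ]             # response size in bytes
    "(?P<referer>\S*)"[ ]             # Referer header
    "(?P<useragent>[\s\S]+)"          # User-Agent header
''', re.VERBOSE)

line = ('192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.test.com/_astats'
        '?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"')

fields = NGINX_VERBOSE.search(line).groupdict()
print(fields["verb"], fields["code"], fields["sendbytes"])  # GET 200 273932
```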
      • The processing orchestration for requirement 2, which parses the request field, is as follows:
        e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')

        Processing result:

        {
        	"uri_proto": "http",
        	"uri_param": "/_astats?application=&inf.name=eth0",
        	"uri_domain": "example.test.com"
        }
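The domain part of the requirement-2 pattern, [a-z0-9.]*[^\/], relies on backtracking: [a-z0-9.]* first consumes example.test.com and then gives back the final m so that [^\/] can match it. The sketch below assumes Python's re module behaves like the DSL's regex engine for this pattern. It cross-checks the result against the standard-library parser urllib.parse.urlsplit() and shows a hypothetical host name for which the character class is too narrow. split_request() is an illustrative helper.

```python
import re
from urllib.parse import urlsplit

# The requirement-2 pattern, copied unchanged.
REQUEST = re.compile(r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')

def split_request(url: str) -> dict:
    """Return the named groups of the requirement-2 pattern, or {} if it does not match."""
    match = REQUEST.search(url)
    return match.groupdict() if match else {}

request = "http://example.test.com/_astats?application=&inf.name=eth0"
fields = split_request(request)
parts = urlsplit(request)  # standard-library URL parser, used only as a cross-check

print(fields["uri_proto"] == parts.scheme, fields["uri_domain"] == parts.netloc)  # True True

# Hypothetical host containing a hyphen and a port: [a-z0-9.]* stops at "-",
# [^\/] then consumes the hyphen, and the rest of the URL falls into uri_param.
print(split_request("http://my-host.example.com:8080/x"))
# {'uri_proto': 'http', 'uri_domain': 'my-', 'uri_param': 'host.example.com:8080/x'}
```

For the request in this example both parsers agree; hosts with hyphens, uppercase letters, or ports would need a wider character class.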
      • The processing orchestration for requirement 3, which parses the uri_param field, is as follows:
        e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)')

        Processing result:

        {
        	"uri_path": "/_astats",
        	"uri_query": "application=&inf.name=eth0"
        }
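The requirement-3 pattern is tailored to this sample: it only matches a uri_param whose path starts with /_ and that contains a ? followed by a query string. The quick check below uses Python's re module, assumed here to behave like the DSL's regex engine for this pattern; split_uri_param() and the two extra inputs are hypothetical.

```python
import re

# The requirement-3 pattern, copied unchanged.
URI_PARAM = re.compile(r'(?P<uri_path>\/\_[a-z]+[^?]+)\?(?P<uri_query>[^?]+)')

def split_uri_param(uri_param: str) -> dict:
    """Return uri_path and uri_query, or {} if the pattern does not match."""
    match = URI_PARAM.search(uri_param)
    return match.groupdict() if match else {}

print(split_uri_param("/_astats?application=&inf.name=eth0"))
# {'uri_path': '/_astats', 'uri_query': 'application=&inf.name=eth0'}

# Hypothetical inputs that do not match: the first path does not start
# with "/_", and the second has no "?" followed by a query string.
print(split_uri_param("/index.html?page=1"))  # {}
print(split_uri_param("/_astats"))            # {}
```

If your request paths do not follow this shape, the pattern needs to be adjusted before it is reused.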