Processing Complex JSON Data

This section describes how to use LTS's data processing feature to process complex JSON data.

Processing Complex JSON Data with Multiple Sub-keys as Arrays

Programs often write logs in JSON format to carry statistical information. Typically, such a log contains basic information plus multiple sub-keys whose values are arrays. For example, a server writes a log every minute that includes its own current status and the statistical status of related server and client nodes.

  • Example log
    {
        "content": {
            "service": "search_service",
            "overall_status": "yellow",
            "servers": [
                {
                    "host": "192.0.2.1",
                    "status": "green"
                },
                {
                    "host": "192.0.2.2",
                    "status": "green"
                }
            ],
            "clients": [
                {
                    "host": "192.0.2.3",
                    "status": "green"
                },
                {
                    "host": "192.0.2.4",
                    "status": "red"
                }
            ]
        }
    }
  • Processing requirements
    1. Split the original log into three logs by topic: overall_type, client_status, and server_status.
    2. Save different information for different topics.
      • overall_type: retains the number of servers and clients, overall_status color, and service information.
      • client_status: retains the host address, status, and service information.
      • server_status: retains the host address, status, and service information.
  • Solution: The following walks through the processing syntax step by step. Steps 1 to 7 must be used together; step 8 shows the combined script and its result.
    1. Set the topic field to three comma-separated values and split the log on that field. After splitting, three logs are generated that are identical except for their topic values.
      e_set("topic", "server_status,client_status,overall_type")
      e_split("topic")

      The log format after processing is as follows:

      topic: server_status         // The other two logs are client_status and overall_type.
      content:  {
          ...Same as above...
      }
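
      By default, e_split splits the field value on commas. If the topic list used a different delimiter, the separator could be passed explicitly. The following is a minimal sketch; it assumes e_split accepts a sep parameter, which is not shown elsewhere in this section.

      # Sketch: the same split with an explicit separator (sep is an assumption)
      e_set("topic", "server_status|client_status|overall_type")
      e_split("topic", sep="|")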
    2. Expand the JSON in the content field at the first layer, then delete the content field.
      e_json('content', depth=1)
      e_drop_fields("content")

      The log format after processing is as follows:

      topic: overall_type              // The other two logs are server_status and client_status. The remaining fields are the same.
      clients:  [{"host": "192.0.2.3", "status": "green"}, {"host": "192.0.2.4", "status": "red"}]
      overall_status:  yellow
      servers:  [{"host": "192.0.2.1", "status": "green"}, {"host": "192.0.2.2", "status": "green"}]
      service:  search_service
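
      If some logs might lack the content field, the expansion and cleanup can be guarded by a condition. The following is a sketch; it assumes that e_search supports the content: * existence query and that e_json and e_drop_fields can be grouped under e_compose.

      # Sketch: expand and drop content only when the field exists ("content: *" is an assumption)
      e_if(e_search("content: *"),
           e_compose(
               e_json('content', depth=1),
               e_drop_fields("content")
           ))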
    3. For logs whose topic is overall_type, compute client_count and server_count from the array lengths.
      e_if(e_search("topic==overall_type"),
           e_compose(
               e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
               e_set("server_count", json_select(v("servers"), "length([*])", default=0))
           ))

      The overall_type log after processing is as follows (the remaining fields are unchanged):

      topic:  overall_type
      server_count:  2
      client_count:  2
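
      json_select evaluates a JMESPath expression against the JSON value of a field, so it can extract single values as well as lengths. For example, picking the first server's host from the same data could look like the following sketch (the target field name first_server_host is hypothetical).

      # Sketch: extract one nested value instead of a count (first_server_host is hypothetical)
      e_set("first_server_host", json_select(v("servers"), "[0].host", default=""))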
    4. Discard the array fields that are no longer needed.
      e_if(e_search("topic==overall_type"), e_drop_fields("clients", "servers"))
    5. Further split the logs whose topic is server_status.
      e_if(e_search("topic==server_status"), e_split("servers"))
      e_if(e_search("topic==server_status"), e_json("servers", depth=1))

      The first log after processing is as follows:

      topic:  server_status
      servers:  {"host": "192.0.2.1", "status": "green"}
      host: 192.0.2.1
      status: green

      The second log after processing is as follows:

      topic:  server_status
      servers:  {"host": "192.0.2.2", "status": "green"}
      host: 192.0.2.2
      status: green
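
      e_split can also select the array through a JMESPath expression and write each element to a new field, as the second example in this section does. Applied here, an equivalent form might look like the following sketch (the element field name server is hypothetical).

      # Sketch: the same split, selecting the array via JMESPath and naming the element field
      e_if(e_search("topic==server_status"), e_split("servers", jmes="[*]", output="server"))
      e_if(e_search("topic==server_status"), e_json("server", depth=1))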
    6. Delete the array fields that are no longer needed:
      e_if(e_search("topic==server_status"), e_drop_fields("servers", "clients"))
    7. Further split the logs whose topic is client_status and delete unnecessary fields.
      e_if(e_search("topic==client_status"), e_split("clients"))
      e_if(e_search("topic==client_status"), e_json("clients", depth=1))

      The first log after processing is as follows:

      topic:  client_status
      host: 192.0.2.3
      status: green

      The second log after processing is as follows:

      topic:  client_status
      host: 192.0.2.4
      status: red
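
      If the expanded status field needs a topic-specific name downstream, it could be renamed after the split. The following sketch assumes the DSL provides an e_rename function, which this section does not otherwise use.

      # Sketch: rename the generic status field on client logs (e_rename is an assumption)
      e_if(e_search("topic==client_status"), e_rename("status", "client_node_status"))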
    8. Combine the preceding syntax as follows:
      # Overall splitting
      e_set("topic", "server_status,client_status,overall_type")
      e_split("topic")
      e_json('content', depth=1)
      e_drop_fields("content")
      # Process overall_type logs.
      e_if(e_search("topic==overall_type"), 
           e_compose(
              e_set("client_count", json_select(v("clients"), "length([*])", default=0)), 
              e_set("server_count", json_select(v("servers"), "length([*])", default=0))
        ))
      e_if(e_search("topic==overall_type"), e_drop_fields("clients", "servers"))
      # Process server_status logs.
      e_if(e_search("topic==server_status"), e_split("servers"))
      e_if(e_search("topic==server_status"), e_json("servers", depth=1))
      e_if(e_search("topic==server_status"), e_compose(e_drop_fields("servers"),e_drop_fields("clients")))
      # Process client_status logs.
      e_if(e_search("topic==client_status"), e_split("clients"))
      e_if(e_search("topic==client_status"), e_json("clients", depth=1))
      e_if(e_search("topic==client_status"), e_compose(e_drop_fields("servers"),e_drop_fields("clients")))

      Processing result (five logs in total):

      topic:  overall_type
      service:  search_service
      overall_status:  yellow
      client_count:  2
      server_count:  2

      topic:  server_status
      service:  search_service
      overall_status:  yellow
      host:  192.0.2.1
      status:  green

      topic:  server_status
      service:  search_service
      overall_status:  yellow
      host:  192.0.2.2
      status:  green

      topic:  client_status
      service:  search_service
      overall_status:  yellow
      host:  192.0.2.3
      status:  green

      topic:  client_status
      service:  search_service
      overall_status:  yellow
      host:  192.0.2.4
      status:  red

Processing Complex JSON Data with Multi-Layer Array Object Nesting

Take a complex JSON object with multiple layers of nested arrays as an example. The goal is to split each login record in the login_histories array of every object under users into a separate login event.

  • Raw log
    {
        "content": {
            "users": [
                {
                    "name": "user1",
                    "login_histories": [
                        {
                            "date": "2019-10-10 0:0:0",
                            "login_ip": "192.0.2.6"
                        },
                        {
                            "date": "2019-10-10 1:0:0",
                            "login_ip": "192.0.2.6"
                        },
                        {
                            ...More login information...
                        }
                    ]
                },
                {
                    "name": "user2",
                    "login_histories": [
                        {
                            "date": "2019-10-11 0:0:0",
                            "login_ip": "192.0.2.7"
                        },
                        {
                            "date": "2019-10-11 1:0:0",
                            "login_ip": "192.0.2.9"
                        },
                        {
                            ...More login information...
                        }
                    ]
                },
                {
                    ...More users...
                }
            ]
        }
    }
  • Expected split log
    name:  user1
    date:  2019-10-10 1:0:0
    login_ip:  192.0.2.6
    
    name:  user1
    date:  2019-10-10 0:0:0
    login_ip:  192.0.2.6
    
    name:  user2
    date:  2019-10-11 0:0:0
    login_ip:  192.0.2.7
    
    name:  user2
    date:  2019-10-11 1:0:0
    login_ip:  192.0.2.9  
    
    ...More logs...
  • Solution
    1. Split content on the users array and expand each element.
      e_split("content", jmes='users[*]', output='item')
      e_json("item", depth=1)

      Logs returned after processing:

      content:{...Same as above...}
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]
      name:  user1
      
      content:{...Same as above...}
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]
      name:  user2
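
      The jmes parameter selects the array to split, and output names the field that receives each element. If the user name were not needed, the two-stage split could be collapsed into a single pass with a flattening projection. The following is a sketch; users[*].login_histories[] drops the association with name, so it only suits cases where the user name can be discarded.

      # Sketch: split directly on the flattened login records (loses the name field)
      e_split("content", jmes='users[*].login_histories[]', output='login')
      e_json("login", depth=1)
      e_drop_fields("content", "login")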
    2. Split and then expand login_histories.
      e_split("login_histories")
      e_json("login_histories", depth=1)

      Logs returned after processing:

      content: {...Same as above...}
      date:  2019-10-11 0:0:0
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}
      login_ip:  192.0.2.7
      name:  user2
      
      content: {...Same as above...}
      date:  2019-10-11 1:0:0
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}
      login_ip:  192.0.2.9
      name:  user2
      
      content: {...Same as above...}
      date:  2019-10-10 1:0:0
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}
      login_ip:  192.0.2.6
      name:  user1
      
      content: {...Same as above...}
      date:  2019-10-10 0:0:0
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}
      login_ip:  192.0.2.6
      name:  user1
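
      Note that e_split overwrites login_histories with one record per log here. To keep the original array for later use, each record could instead be written to a separate field through the output parameter, as in the following sketch (login_record is a hypothetical field name).

      # Sketch: keep login_histories intact and expand each record from a new field
      e_split("login_histories", output="login_record")
      e_json("login_record", depth=1)
      e_drop_fields("login_record")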
    3. Delete irrelevant fields.
      e_drop_fields("content", "item", "login_histories")

      Logs returned after processing:

      {
      	"date": "2019-10-10 0:0:0",
      	"name": "user1",
      	"login_ip": "192.0.2.6"
      }
      {
      	"date": "2019-10-10 1:0:0",
      	"name": "user1",
      	"login_ip": "192.0.2.6"
      }
      {
      	"date": "2019-10-11 0:0:0",
      	"name": "user2",
      	"login_ip": "192.0.2.7"
      }
      {
      	"date": "2019-10-11 1:0:0",
      	"name": "user2",
      	"login_ip": "192.0.2.9"
      }
    4. The complete DSL rules are as follows:
      e_split("content", jmes='users[*]', output='item')
      e_json("item", depth=1)
      e_split("login_histories")
      e_json("login_histories", depth=1)
      e_drop_fields("content", "item", "login_histories")

      Summary: To meet the preceding requirements, split the logs, expand the JSON fields, and then delete the irrelevant fields.