Processing Complex JSON Data

This section describes how to use LTS's data processing feature to process complex JSON data.

Processing Complex JSON Data with Multiple Sub-keys as Arrays

Programs often write logs in JSON format to carry statistical information. Typically, such a log contains basic information plus multiple sub-keys whose values are arrays. For example, a server writes a log every minute that includes its own current status and the statistical status of related server and client nodes.

  • Example log
    {
        "content": {
            "service": "search_service",
            "overall_status": "yellow",
            "servers": [
                {
                    "host": "192.0.2.1",
                    "status": "green"
                },
                {
                    "host": "192.0.2.2",
                    "status": "green"
                }
            ],
            "clients": [
                {
                    "host": "192.0.2.3",
                    "status": "green"
                },
                {
                    "host": "192.0.2.4",
                    "status": "red"
                }
            ]
        }
    }
  • Processing requirements
    1. Split the original log into three logs by topic: overall_type, client_status, and server_status.
    2. Save different information for different topics.
      • overall_type: retains the number of servers and clients, overall_status color, and service information.
      • client_status: retains the host address, status, and service information.
      • server_status: retains the host address, status, and service information.
  • Solution: The following walks through the processing syntax step by step. Steps 1 to 7 must be used together; step 8 shows the combined script and its result.
    1. Set the topic field to three comma-separated values and split the log on that field. After splitting, three logs are generated that are identical except for their topic values.
      e_set("topic", "server_status,client_status,overall_type")
      e_split("topic")

      The log format after processing is as follows:

      topic: server_status         // The other two logs are client_status and overall_type.
      content:  {
          ...Same as above...
      }
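
      By default, e_split splits the field value on commas. If the topic list used a different delimiter, the separator could be passed explicitly. The following is a minimal sketch; it assumes e_split accepts a sep parameter, which is not shown elsewhere in this section.

      # Sketch: the same split with an explicit separator (sep is an assumption)
      e_set("topic", "server_status|client_status|overall_type")
      e_split("topic", sep="|")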
    2. Expand the JSON in the content field at the first layer, then delete the content field.
      e_json('content', depth=1)
      e_drop_fields("content")

      The log format after processing is as follows:

      topic: overall_type              // The other two logs are server_status and client_status. The remaining fields are the same.
      clients:  [{"host": "192.0.2.3", "status": "green"}, {"host": "192.0.2.4", "status": "red"}]
      overall_status:  yellow
      servers:  [{"host": "192.0.2.1", "status": "green"}, {"host": "192.0.2.2", "status": "green"}]
      service:  search_service
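
      If some logs might lack the content field, the expansion and cleanup can be guarded by a condition. The following is a sketch; it assumes that e_search supports the content: * existence query and that e_json and e_drop_fields can be grouped under e_compose.

      # Sketch: expand and drop content only when the field exists ("content: *" is an assumption)
      e_if(e_search("content: *"),
           e_compose(
               e_json('content', depth=1),
               e_drop_fields("content")
           ))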
    3. For logs whose topic is overall_type, compute client_count and server_count from the array lengths.
      e_if(e_search("topic==overall_type"),
           e_compose(
               e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
               e_set("server_count", json_select(v("servers"), "length([*])", default=0))
           ))

      The overall_type log after processing is as follows (the remaining fields are unchanged):

      topic:  overall_type
      server_count:  2
      client_count:  2
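
      json_select evaluates a JMESPath expression against the JSON value of a field, so it can extract single values as well as lengths. For example, picking the first server's host from the same data could look like the following sketch (the target field name first_server_host is hypothetical).

      # Sketch: extract one nested value instead of a count (first_server_host is hypothetical)
      e_set("first_server_host", json_select(v("servers"), "[0].host", default=""))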
    4. Discard the array fields that are no longer needed.
      e_if(e_search("topic==overall_type"), e_drop_fields("clients", "servers"))
    5. Further split the logs whose topic is server_status.
      e_if(e_search("topic==server_status"), e_split("servers"))
      e_if(e_search("topic==server_status"), e_json("servers", depth=1))

      The first log after processing is as follows:

      topic:  server_status
      servers:  {"host": "192.0.2.1", "status": "green"}
      host: 192.0.2.1
      status: green

      The second log after processing is as follows:

      topic:  server_status
      servers:  {"host": "192.0.2.2", "status": "green"}
      host: 192.0.2.2
      status: green
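
      e_split can also select the array through a JMESPath expression and write each element to a new field, as the second example in this section does. Applied here, an equivalent form might look like the following sketch (the element field name server is hypothetical).

      # Sketch: the same split, selecting the array via JMESPath and naming the element field
      e_if(e_search("topic==server_status"), e_split("servers", jmes="[*]", output="server"))
      e_if(e_search("topic==server_status"), e_json("server", depth=1))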
    6. Delete the array fields that are no longer needed:
      e_if(e_search("topic==server_status"), e_drop_fields("servers", "clients"))
    7. Further split the logs whose topic is client_status and delete unnecessary fields.
      e_if(e_search("topic==client_status"), e_split("clients"))
      e_if(e_search("topic==client_status"), e_json("clients", depth=1))

      The first log after processing is as follows:

      topic:  client_status
      host: 192.0.2.3
      status: green

      The second log after processing is as follows:

      topic:  client_status
      host: 192.0.2.4
      status: red
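
      If the expanded status field needs a topic-specific name downstream, it could be renamed after the split. The following sketch assumes the DSL provides an e_rename function, which this section does not otherwise use.

      # Sketch: rename the generic status field on client logs (e_rename is an assumption)
      e_if(e_search("topic==client_status"), e_rename("status", "client_node_status"))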
    8. Combine the preceding syntax as follows:
      # Overall splitting
      e_set("topic", "server_status,client_status,overall_type")
      e_split("topic")
      e_json('content', depth=1)
      e_drop_fields("content")
      # Process overall_type logs.
      e_if(e_search("topic==overall_type"), 
           e_compose(
              e_set("client_count", json_select(v("clients"), "length([*])", default=0)), 
              e_set("server_count", json_select(v("servers"), "length([*])", default=0))
        ))
      e_if(e_search("topic==overall_type"), e_drop_fields("clients", "servers"))
      # Process server_status logs.
      e_if(e_search("topic==server_status"), e_split("servers"))
      e_if(e_search("topic==server_status"), e_json("servers", depth=1))
      e_if(e_search("topic==server_status"), e_compose(e_drop_fields("servers"),e_drop_fields("clients")))
      # Process client_status logs.
      e_if(e_search("topic==client_status"), e_split("clients"))
      e_if(e_search("topic==client_status"), e_json("clients", depth=1))
      e_if(e_search("topic==client_status"), e_compose(e_drop_fields("servers"),e_drop_fields("clients")))

      Processing result (five logs in total):

      topic:  overall_type
      service:  search_service
      overall_status:  yellow
      client_count:  2
      server_count:  2

      topic:  server_status
      service:  search_service
      overall_status:  yellow
      host:  192.0.2.1
      status:  green

      topic:  server_status
      service:  search_service
      overall_status:  yellow
      host:  192.0.2.2
      status:  green

      topic:  client_status
      service:  search_service
      overall_status:  yellow
      host:  192.0.2.3
      status:  green

      topic:  client_status
      service:  search_service
      overall_status:  yellow
      host:  192.0.2.4
      status:  red

Processing Complex JSON Data with Multi-Layer Array Object Nesting

Take a complex JSON object with multiple layers of nested arrays as an example. The goal is to split each login record in the login_histories array of every object under users into a separate login event.

  • Raw log
    {
        "content": {
            "users": [
                {
                    "name": "user1",
                    "login_histories": [
                        {
                            "date": "2019-10-10 0:0:0",
                            "login_ip": "192.0.2.6"
                        },
                        {
                            "date": "2019-10-10 1:0:0",
                            "login_ip": "192.0.2.6"
                        },
                        {
                            ...More login information...
                        }
                    ]
                },
                {
                    "name": "user2",
                    "login_histories": [
                        {
                            "date": "2019-10-11 0:0:0",
                            "login_ip": "192.0.2.7"
                        },
                        {
                            "date": "2019-10-11 1:0:0",
                            "login_ip": "192.0.2.9"
                        },
                        {
                            ...More login information...
                        }
                    ]
                },
                {
                    ...More users...
                }
            ]
        }
    }
  • Expected split log
    name:  user1
    date:  2019-10-10 1:0:0
    login_ip:  192.0.2.6
    
    name:  user1
    date:  2019-10-10 0:0:0
    login_ip:  192.0.2.6
    
    name:  user2
    date:  2019-10-11 0:0:0
    login_ip:  192.0.2.7
    
    name:  user2
    date:  2019-10-11 1:0:0
    login_ip:  192.0.2.9  
    
    ...More logs...
  • Solution
    1. Split content on the users array and expand each element.
      e_split("content", jmes='users[*]', output='item')
      e_json("item", depth=1)

      Logs returned after processing:

      content:{...Same as above...}
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]
      name:  user1
      
      content:{...Same as above...}
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]
      name:  user2
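
      The jmes parameter selects the array to split, and output names the field that receives each element. If the user name were not needed, the two-stage split could be collapsed into a single pass with a flattening projection. The following is a sketch; users[*].login_histories[] drops the association with name, so it only suits cases where the user name can be discarded.

      # Sketch: split directly on the flattened login records (loses the name field)
      e_split("content", jmes='users[*].login_histories[]', output='login')
      e_json("login", depth=1)
      e_drop_fields("content", "login")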
    2. Split and then expand login_histories.
      e_split("login_histories")
      e_json("login_histories", depth=1)

      Logs returned after processing:

      content: {...Same as above...}
      date:  2019-10-11 0:0:0
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}
      login_ip:  192.0.2.7
      name:  user2
      
      content: {...Same as above...}
      date:  2019-10-11 1:0:0
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}
      login_ip:  192.0.2.9
      name:  user2
      
      content: {...Same as above...}
      date:  2019-10-10 1:0:0
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}
      login_ip:  192.0.2.6
      name:  user1
      
      content: {...Same as above...}
      date:  2019-10-10 0:0:0
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}
      login_ip:  192.0.2.6
      name:  user1
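
      Note that e_split overwrites login_histories with one record per log here. To keep the original array for later use, each record could instead be written to a separate field through the output parameter, as in the following sketch (login_record is a hypothetical field name).

      # Sketch: keep login_histories intact and expand each record from a new field
      e_split("login_histories", output="login_record")
      e_json("login_record", depth=1)
      e_drop_fields("login_record")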
    3. Delete irrelevant fields.
      e_drop_fields("content", "item", "login_histories")

      Logs returned after processing:

      {
      	"date": "2019-10-10 0:0:0",
      	"name": "user1",
      	"login_ip": "192.0.2.6"
      }
      {
      	"date": "2019-10-10 1:0:0",
      	"name": "user1",
      	"login_ip": "192.0.2.6"
      }
      {
      	"date": "2019-10-11 0:0:0",
      	"name": "user2",
      	"login_ip": "192.0.2.7"
      }
      {
      	"date": "2019-10-11 1:0:0",
      	"name": "user2",
      	"login_ip": "192.0.2.9"
      }
    4. The complete DSL rules are as follows:
      e_split("content", jmes='users[*]', output='item')
      e_json("item", depth=1)
      e_split("login_histories")
      e_json("login_histories", depth=1)
      e_drop_fields("content", "item", "login_histories")

      Summary: To meet the preceding requirements, split the logs, expand the JSON fields, and then delete the irrelevant fields.