Updated: 2024-11-07 GMT+08:00

Processing Complex JSON Data

This document describes how to use the data processing feature of the cloud log service to process complex JSON data.

Processing Complex JSON Data with Multiple Array Subkeys

Programs often write logs in a statistics-style JSON format that contains some base information plus multiple subkeys whose values are arrays. For example, a server writes one log entry per minute containing its current status together with status statistics for the related server and client nodes.

  • Log sample
    {
        "content":{
         "service": "search_service",
         "overal_status": "yellow",
         "servers": [
             {
                 "host": "192.0.2.1",
                 "status": "green"
             },
             {
                 "host": "192.0.2.2",
                 "status": "green"
             }
         ],
         "clients": [
             {
                 "host": "192.0.2.3",
                 "status": "green"
             },
             {
                 "host": "192.0.2.4",
                 "status": "red"
             }
         ]
    }
    }
  • Processing requirements
    1. Split the raw log into three topics: overall_type, client_status, and server_status.
    2. Retain different information for each topic.
      • overall_type: retain the server count, the client count, the overall status color, and the service information.
      • client_status: retain the host address, the status, and the service information.
      • server_status: retain the host address, the status, and the service information.
  • Solution: the processing statements are explained one by one below; statements 1 to 7 must be used in combination.
    1. Split one log into three: assign three different values to the topic field and then split on it. The result is three logs that are identical except for the value of topic.
      e_set("topic", "server_status,client_status,overall_type")
      e_split("topic")

      The processed log format is as follows:

      topic:  server_status         // the other two logs have topic client_status and overall_type; everything else is the same
      content:  {
          ...as above...
      }
    2. Expand the JSON in the content field at the first level and delete the content field.
      e_json('content',depth=1)
      e_drop_fields("content")

      The processed log format is as follows:

      topic:  overall_type              // the other two logs have topic client_status and server_status; everything else is the same
      clients:  [{"host": "192.0.2.3", "status": "green"}, {"host": "192.0.2.4", "status": "red"}]
      overal_status:  yellow
      servers:  [{"host": "192.0.2.1", "status": "green"}, {"host": "192.0.2.2", "status": "green"}]
      service:  search_service
    3. For logs whose topic is overall_type, compute client_count and server_count.
      e_if(e_search("topic==overall_type"), 
           e_compose(
              e_set("client_count", json_select(v("clients"), "length([*])", default=0)), 
              e_set("server_count", json_select(v("servers"), "length([*])", default=0))
        ))

      The processed log is:

      topic:  overall_type
      server_count:  2
      client_count:  2
    4. Drop the fields that are no longer needed:
      e_if(e_search("topic==overall_type"), e_drop_fields("clients", "servers"))
    5. Further split logs whose topic is server_status.
      e_if(e_search("topic==server_status"), e_split("servers"))
      e_if(e_search("topic==server_status"), e_json("servers", depth=1))

      The first processed log is as follows:

      topic:  server_status
      servers:  {"host": "192.0.2.1", "status": "green"}
      host: 192.0.2.1
      status: green

      The second processed log is as follows:

      topic:  server_status
      servers:  {"host": "192.0.2.2", "status": "green"}
      host: 192.0.2.2
      status: green
    6. Keep only the relevant fields:
      e_if(e_search("topic==server_status"), e_compose(e_drop_fields("servers"),e_drop_fields("clients")))
    7. Further split logs whose topic is client_status, then delete the redundant fields.
      e_if(e_search("topic==client_status"), e_split("clients"))
      e_if(e_search("topic==client_status"), e_json("clients", depth=1))

      The first processed log is as follows:

      topic:  client_status
      host: 192.0.2.3
      status: green

      The second processed log is as follows:

      topic:  client_status
      host: 192.0.2.4
      status: red
    8. The statements above combine into the following rule:
      # Overall split
      e_set("topic", "server_status,client_status,overall_type")
      e_split("topic")
      e_json('content',depth=1)
      e_drop_fields("content")
      # Process overall_type logs
      e_if(e_search("topic==overall_type"), 
           e_compose(
              e_set("client_count", json_select(v("clients"), "length([*])", default=0)), 
              e_set("server_count", json_select(v("servers"), "length([*])", default=0))
        ))
      e_if(e_search("topic==overall_type"), e_drop_fields("clients", "servers"))
      # Process server_status logs
      e_if(e_search("topic==server_status"), e_split("servers"))
      e_if(e_search("topic==server_status"), e_json("servers", depth=1))
      e_if(e_search("topic==server_status"), e_compose(e_drop_fields("servers"),e_drop_fields("clients")))
      # Process client_status logs
      e_if(e_search("topic==client_status"), e_split("clients"))
      e_if(e_search("topic==client_status"), e_json("clients", depth=1))
      e_if(e_search("topic==client_status"), e_compose(e_drop_fields("servers"),e_drop_fields("clients")))

      Running this combined rule on the raw log sample at the top of this section produces the five per-topic logs shown in steps 1 to 7: one overall_type log, two server_status logs, and two client_status logs.
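As a cross-check, the effect of the combined rule on this sample can be sketched in plain Python. This is a simulation of the DSL semantics, not the DSL runtime itself, and the `process` helper is a name introduced here for illustration:

```python
# Plain-Python simulation of the combined rule applied to the sample log.
import json

raw = {
    "content": {
        "service": "search_service",
        "overal_status": "yellow",
        "servers": [{"host": "192.0.2.1", "status": "green"},
                    {"host": "192.0.2.2", "status": "green"}],
        "clients": [{"host": "192.0.2.3", "status": "green"},
                    {"host": "192.0.2.4", "status": "red"}],
    }
}

def process(log):
    # e_json('content', depth=1) + e_drop_fields("content"): lift the first level
    base = dict(log["content"])
    out = []
    # e_set("topic", ...) + e_split("topic"): one copy of the log per topic
    for topic in ["server_status", "client_status", "overall_type"]:
        if topic == "overall_type":
            # json_select(..., "length([*])"): count the array elements
            out.append({"topic": topic, "service": base["service"],
                        "overal_status": base["overal_status"],
                        "server_count": len(base.get("servers", [])),
                        "client_count": len(base.get("clients", []))})
        else:
            key = "servers" if topic == "server_status" else "clients"
            # e_split("servers"/"clients") + e_json(..., depth=1): one log per node
            for node in base.get(key, []):
                out.append({"topic": topic, "service": base["service"],
                            "host": node["host"], "status": node["status"]})
    return out

for event in process(raw):
    print(json.dumps(event))
```

Running the sketch prints five events (one overall_type log with both counts, plus two server_status and two client_status logs), matching the per-topic logs of steps 1 to 7.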

Processing Complex JSON Data with Multi-Level Nested Arrays

Take a complex object containing multi-level nested arrays as an example: split every login record in the login_histories field of each object under users into a separate login event.

  • Raw log
    {
    "content":{
      "users": [
        {
            "name": "user1",
            "login_histories": [
              {
                "date": "2019-10-10 0:0:0",
                "login_ip": "192.0.2.6"
              },
              {
                "date": "2019-10-10 1:0:0",
                "login_ip": "192.0.2.6"
              },
          {
          ...more login records...
          }
           ]
        },
        {
            "name": "user2",
            "login_histories": [
              {
                "date": "2019-10-11 0:0:0",
                "login_ip": "192.0.2.7"
              },
              {
                "date": "2019-10-11 1:0:0",
                "login_ip": "192.0.2.9"
              },
          {
          ...more login records...
          }     
            ]
        },
      {
        ....more users....
      }
      ]
    }
    }
  • Expected split logs
    name:  user1
    date:  2019-10-10 1:0:0
    login_ip:  192.0.2.6
    
    name:  user1
    date:  2019-10-10 0:0:0
    login_ip:  192.0.2.6
    
    name:  user2
    date:  2019-10-11 0:0:0
    login_ip:  192.0.2.7
    
    name:  user2
    date:  2019-10-11 1:0:0
    login_ip:  192.0.2.9  
    
    ....more logs....
  • Solution
    1. Split and expand users in content.
      e_split("content", jmes='users[*]', output='item')
      e_json("item",depth=1)
      The processed logs are:
      content: {...as before...}
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]
      name:  user1
      
      content: {...as before...}
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]
      name:  user2
    2. Split login_histories and then expand it.
      e_split("login_histories")
      e_json("login_histories", depth=1)

      The processed logs are:

      content: {...as before...}
      date:  2019-10-11 0:0:0
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}
      login_ip:  192.0.2.7
      name:  user2
      
      content: {...as before...}
      date:  2019-10-11 1:0:0
      item:  {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
      login_histories:  {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}
      login_ip:  192.0.2.9
      name:  user2
      
      content: {...as before...}
      date:  2019-10-10 1:0:0
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}
      login_ip:  192.0.2.6
      name:  user1
      
      content: {...as before...}
      date:  2019-10-10 0:0:0
      item:  {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
      login_histories:  {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}
      login_ip:  192.0.2.6
      name:  user1
    3. Delete the irrelevant fields.
      e_drop_fields("content", "item", "login_histories")

      The processed logs are:

      {
      	"date": "2019-10-10 0:0:0",
      	"name": "user1",
      	"login_ip": "192.0.2.6"
      }
      {
      	"date": "2019-10-10 1:0:0",
      	"name": "user1",
      	"login_ip": "192.0.2.6"
      }
      {
      	"date": "2019-10-11 0:0:0",
      	"name": "user2",
      	"login_ip": "192.0.2.7"
      }
      {
      	"date": "2019-10-11 1:0:0",
      	"name": "user2",
      	"login_ip": "192.0.2.9"
      }
    4. The complete DSL rule is as follows:
      e_split("content", jmes='users[*]', output='item')
      e_json("item",depth=1)
      e_split("login_histories")
      e_json("login_histories", depth=1)
      e_drop_fields("content", "item", "login_histories")

      Summary: for requirements like this, split first, then expand, and finally delete the irrelevant fields.
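The split, expand, drop sequence can likewise be sketched in plain Python. This simulates the DSL semantics for this sample only; `split_logins` is a hypothetical helper name, not part of the DSL:

```python
# Plain-Python simulation of the nested-array rule: split users[*], then split
# each user's login_histories, keeping only name, date, and login_ip.
import json

raw = {
    "content": {
        "users": [
            {"name": "user1", "login_histories": [
                {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"},
                {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]},
            {"name": "user2", "login_histories": [
                {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"},
                {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]},
        ]
    }
}

def split_logins(log):
    events = []
    # e_split("content", jmes='users[*]', output='item'): one log per user
    for user in log["content"]["users"]:
        # e_split("login_histories") + e_json("login_histories", depth=1)
        for entry in user["login_histories"]:
            # e_drop_fields("content", "item", "login_histories") leaves these three
            events.append({"name": user["name"],
                           "date": entry["date"],
                           "login_ip": entry["login_ip"]})
    return events

for ev in split_logins(raw):
    print(json.dumps(ev))
```

This yields four login events, one per user/login pair, matching the logs shown in step 3.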

Related Documents