Process complex JSON data
This topic describes how to use the data transformation feature of Cloud Log Service to process complex JSON data.
Complex JSON data with multiple array-valued sub-keys
Programs often write logs in a statistics-style JSON format that combines base information with multiple sub-keys whose values are arrays. For example, a server writes one log entry per minute that contains its current status plus status statistics for the related server and client nodes.
- Log sample
{
  "content": {
    "service": "search_service",
    "overal_status": "yellow",
    "servers": [
      {"host": "192.0.2.1", "status": "green"},
      {"host": "192.0.2.2", "status": "green"}
    ],
    "clients": [
      {"host": "192.0.2.3", "status": "green"},
      {"host": "192.0.2.4", "status": "red"}
    ]
  }
}
- Transformation requirements
  - Split the original log by topic into three logs: overall_type, client_status, and server_status.
  - Keep different information for each topic:
    - overall_type: keep the server count, client count, overal_status color, and service information.
    - client_status: keep the host address, status, and service information.
    - server_status: keep the host address, status, and service information.
- Solution: the following explains each transformation statement in turn; the statements in steps 1-7 below must be used together to form the complete transformation.
- Split one log into three: assign three different values to the topic field and then split on it. After the split there are three logs that are identical except for topic; a plain-Python sketch of this fan-out follows the sample output.
e_set("topic", "server_status,client_status,overall_type")
e_split("topic")
The processed log format is as follows:
topic: server_status  // the other two logs carry client_status and overall_type; all other fields are identical
content: {...as above...}
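For readers who want to reason about this step outside the log service, here is a minimal plain-Python sketch of the fan-out (an emulation of the behavior shown above, not the DSL runtime):

```python
# Emulate e_set("topic", ...) + e_split("topic"): the comma-separated value
# yields one copy of the event per topic.
event = {"content": "{...}"}                                   # original event
event["topic"] = "server_status,client_status,overall_type"    # e_set("topic", ...)
split_events = [{**event, "topic": t} for t in event["topic"].split(",")]  # e_split("topic")
print([e["topic"] for e in split_events])
# ['server_status', 'client_status', 'overall_type']
```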
- Expand the JSON in content at its first level, then delete the content field (a sketch of this expansion follows the sample output).
e_json('content', depth=1)
e_drop_fields("content")
The processed log format is as follows:
topic: overall_type  // the other two logs carry client_status and server_status; all other fields are identical
clients: [{"host": "192.0.2.3", "status": "green"}, {"host": "192.0.2.4", "status": "red"}]
overal_status: yellow
servers: [{"host": "192.0.2.1", "status": "green"}, {"host": "192.0.2.2", "status": "green"}]
service: search_service
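The expansion itself can be pictured with ordinary JSON parsing. The sketch below emulates the behavior shown above; note that in the real runtime nested values such as the servers array are stored back as JSON strings, while this sketch keeps parsed objects for brevity:

```python
import json

# Emulate e_json('content', depth=1) + e_drop_fields("content"): each
# top-level key of the parsed JSON becomes its own field on the event.
event = {
    "topic": "overall_type",
    "content": '{"service": "search_service", "overal_status": "yellow", '
               '"servers": [], "clients": []}',
}
event.update(json.loads(event.pop("content")))  # expand first level, drop content
print(list(event))
# ['topic', 'service', 'overal_status', 'servers', 'clients']
```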
- For logs whose topic is overall_type, compute client_count and server_count; a standalone JMESPath example follows the sample output.
e_if(e_search("topic==overall_type"),
     e_compose(
         e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
         e_set("server_count", json_select(v("servers"), "length([*])", default=0))
     ))
The processed log is:
topic: overall_type
server_count: 2
client_count: 2
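json_select evaluates a JMESPath expression against the JSON value of a field. The same count can be reproduced with the open-source jmespath Python package; the call below is an illustration outside the DSL, using the exact expression from the rule above:

```python
import json
import jmespath  # pip install jmespath

clients = ('[{"host": "192.0.2.3", "status": "green"}, '
           '{"host": "192.0.2.4", "status": "red"}]')
# "length([*])" counts the elements of the top-level array.
print(jmespath.search("length([*])", json.loads(clients)))  # 2
```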
- Drop the fields that are no longer needed:
e_if(e_search("topic==overall_type"), e_drop_fields("clients", "servers"))
- For logs whose topic is server_status, split them further; a sketch of this array-based split follows the sample logs.
e_if(e_search("topic==server_status"), e_split("servers"))
e_if(e_search("topic==server_status"), e_json("servers", depth=1))
The first processed log is as follows:
topic: server_status
servers: {"host": "192.0.2.1", "status": "green"}
host: 192.0.2.1
status: green
The second processed log is as follows:
topic: server_status
servers: {"host": "192.0.2.2", "status": "green"}
host: 192.0.2.2
status: green
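Unlike the comma-separated split in step 1, e_split("servers") splits on a field whose value is a JSON array: each array element becomes its own event, and the following e_json("servers", depth=1) lifts host and status out of that element. A minimal plain-Python sketch of this fan-out, assuming only the behavior shown above:

```python
import json

event = {"topic": "server_status",
         "servers": '[{"host": "192.0.2.1", "status": "green"}, '
                    '{"host": "192.0.2.2", "status": "green"}]'}
children = []
for element in json.loads(event["servers"]):               # e_split("servers")
    child = {**event, "servers": json.dumps(element)}
    child.update(element)                                   # e_json("servers", depth=1)
    children.append(child)
print(len(children), children[0]["host"], children[1]["host"])
# 2 192.0.2.1 192.0.2.2
```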
- Keep only the relevant fields by dropping the rest:
e_if(e_search("topic==server_status"), e_compose(e_drop_fields("servers"), e_drop_fields("clients")))
- For logs whose topic is client_status, split them further in the same way, then delete the redundant fields.
e_if(e_search("topic==client_status"), e_split("clients"))
e_if(e_search("topic==client_status"), e_json("clients", depth=1))
The first processed log is as follows:
topic: client_status
host: 192.0.2.3
status: green
The second processed log is as follows:
topic: client_status
host: 192.0.2.4
status: red
- Putting steps 1-7 together, the complete transformation is as follows:
# Overall split
e_set("topic", "server_status,client_status,overall_type")
e_split("topic")
e_json('content', depth=1)
e_drop_fields("content")
# Process overall_type logs
e_if(e_search("topic==overall_type"),
     e_compose(
         e_set("client_count", json_select(v("clients"), "length([*])", default=0)),
         e_set("server_count", json_select(v("servers"), "length([*])", default=0))
     ))
e_if(e_search("topic==overall_type"), e_drop_fields("clients", "servers"))
# Process server_status logs
e_if(e_search("topic==server_status"), e_split("servers"))
e_if(e_search("topic==server_status"), e_json("servers", depth=1))
e_if(e_search("topic==server_status"), e_compose(e_drop_fields("servers"), e_drop_fields("clients")))
# Process client_status logs
e_if(e_search("topic==client_status"), e_split("clients"))
e_if(e_search("topic==client_status"), e_json("clients", depth=1))
e_if(e_search("topic==client_status"), e_compose(e_drop_fields("servers"), e_drop_fields("clients")))
After execution, the single sample log is split into five output logs: one overall_type log carrying service, overal_status, client_count, and server_count; two server_status logs; and two client_status logs, each matching the per-step results shown above.
Complex JSON data with multi-level nested arrays
Take a complex object containing multiple levels of nested arrays as an example: split every login record in login_histories, under each object in users, into its own login event.
- Original log
{
  "content": {
    "users": [
      {
        "name": "user1",
        "login_histories": [
          {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"},
          {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"},
          {...more login records...}
        ]
      },
      {
        "name": "user2",
        "login_histories": [
          {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"},
          {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"},
          {...more login records...}
        ]
      },
      {...more users...}
    ]
  }
}
- Expected split logs
name: user1
date: 2019-10-10 1:0:0
login_ip: 192.0.2.6

name: user1
date: 2019-10-10 0:0:0
login_ip: 192.0.2.6

name: user2
date: 2019-10-11 0:0:0
login_ip: 192.0.2.7

name: user2
date: 2019-10-11 1:0:0
login_ip: 192.0.2.9

...more logs...
- Solution
- Split and expand the users array under content; a standalone sketch of this jmes-based split follows the sample output.
e_split("content", jmes='users[*]', output='item')
e_json("item", depth=1)
The processed logs are:
content: {...as before...}
item: {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
login_histories: [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]
name: user1

content: {...as before...}
item: {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
login_histories: [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]
name: user2
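The jmes parameter drives this split: the JMESPath expression selects the list to split on, and each element is written to the output field (item) of its own copy of the event. The sketch below mirrors that behavior with the open-source jmespath package; it is an illustration of the behavior shown above, not the DSL runtime:

```python
import json
import jmespath  # pip install jmespath

event = {"content": json.dumps({"users": [
    {"name": "user1", "login_histories": []},
    {"name": "user2", "login_histories": []},
]})}
children = []
for user in jmespath.search("users[*]", json.loads(event["content"])):
    child = {**event, "item": json.dumps(user)}   # e_split(..., output='item')
    child.update(user)                            # e_json("item", depth=1)
    children.append(child)
print([c["name"] for c in children])  # ['user1', 'user2']
```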
- Split login_histories first, then expand it.
e_split("login_histories")
e_json("login_histories", depth=1)
The processed logs are:
content: {...as before...}
date: 2019-10-11 0:0:0
item: {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
login_histories: {"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}
login_ip: 192.0.2.7
name: user2

content: {...as before...}
date: 2019-10-11 1:0:0
item: {"name": "user2", "login_histories": [{"date": "2019-10-11 0:0:0", "login_ip": "192.0.2.7"}, {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}]}
login_histories: {"date": "2019-10-11 1:0:0", "login_ip": "192.0.2.9"}
login_ip: 192.0.2.9
name: user2

content: {...as before...}
date: 2019-10-10 1:0:0
item: {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
login_histories: {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}
login_ip: 192.0.2.6
name: user1

content: {...as before...}
date: 2019-10-10 0:0:0
item: {"name": "user1", "login_histories": [{"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}, {"date": "2019-10-10 1:0:0", "login_ip": "192.0.2.6"}]}
login_histories: {"date": "2019-10-10 0:0:0", "login_ip": "192.0.2.6"}
login_ip: 192.0.2.6
name: user1
- Delete the irrelevant fields.
e_drop_fields("content", "item", "login_histories")
The processed logs are:
{"date": "2019-10-10 0:0:0", "name": "user1", "login_ip": "192.0.2.6"}
{"date": "2019-10-10 1:0:0", "name": "user1", "login_ip": "192.0.2.6"}
{"date": "2019-10-11 0:0:0", "name": "user2", "login_ip": "192.0.2.7"}
{"date": "2019-10-11 1:0:0", "name": "user2", "login_ip": "192.0.2.9"}
- In summary, the complete DSL rule is as follows:
e_split("content", jmes='users[*]', output='item')
e_json("item", depth=1)
e_split("login_histories")
e_json("login_histories", depth=1)
e_drop_fields("content", "item", "login_histories")
Summary: for requirements like these, first split the data, then expand it, and finally delete the irrelevant fields.