Parsing CSV-Format Logs
This document describes how to parse data whose fields are separated by special characters, as encountered when parsing Syslog or other text formats.
Parsing Well-Formed CSV Logs
Raw log:
{ "program":"access", "severity":6, "priority":14, "facility":1, "content":"198.51.100.1|10/Jun/2019:11:32:16 +0800|example.com|GET /zf/11874.html HTTP/1.1|200|0.077|6404|198.51.100.10:8001|200|0.060|https://example.com/s?q=%25%24%23%40%21&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei|-|Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-A00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36|-|-" }
Requirements:
- When the program field is access, parse the content field once as PSV (pipe-separated values), then drop the content field.
- Split the request field (GET /zf/11874.html HTTP/1.1) into request_method, http_version, and request.
- URL-decode the http_referer field.
- Format the time field.
Solution:
- If the program field equals access, parse the content field with the e_psv function and delete the original content field (a plain-Python sketch of this step follows the returned log below).
e_if(e_search("program==access"), e_compose(e_psv("content", "remote_addr, time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status, upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid", restrict=True), e_drop_fields("content")))
The returned log is:
{ "severity": 6, "remote_addr": "198.51.100.1", "request": "GET /zf/11874.html HTTP/1.1", "upstream_addr": "198.51.100.10:8001", "body_bytes_sent": 6404, "session_id": "-", "program": "access", "priority": 14, "http_user_agent": "Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-A00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36", "upstream_status": 200, "request_time": "0.077", "http_referer": "https://example.com/s?q=%3A%2F%3A&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei", "upstream_response_time": "0.060", "host": "example.com", "http_x_forwarded_for": "-", "guid": "-", "facility": 1, "time_local": "10/Jun/2019:11:32:16 +0800", "status": 200 }
- Use the e_regex function to split the request field into request_method, request, and http_version (a Python comparison follows the returned log below).
e_regex("request",r"^(?P<request_method>\w+) (?P<request>.+) (?P<http_version>\w+/[\d\.]+)$")
The returned log is:
"request": "GET /zf/11874.html HTTP/1.1", "request_method": "GET", "http_version": "HTTP/1.1"
- URL-decode the http_referer field (a plain-Python equivalent follows the returned log below).
e_set("http",url_decoding(v("http_referer")))
The returned log is:
"http": "https://example.com/s?q=:/:&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei",
- Putting the above together, the complete solution is:
e_if(e_search("program==access"), e_compose(e_psv("content", "remote_addr, time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status, upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid", restrict=True), e_drop_fields("content"))) e_regex("request",r"^(?P<request_method>\w+) (?P<request>.+) (?P<http_version>\w+/[\d\.]+)$") e_set("http",url_decoding(v("http_referer")))
The output log is:
{ "severity": 6, "remote_addr": "198.51.100.1", "request": "GET /zf/11874.html HTTP/1.1", "upstream_addr": "198.51.100.10:8001", "body_bytes_sent": 6404, "session_id": "-", "http_version": "HTTP/1.1", "program": "access", "request_method": "GET", "priority": 14, "http_user_agent": "Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-A00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36", "upstream_status": 200, "request_time": "0.077", "http_referer": "https://example.com/s?q=%3A%2F%3A&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei", " upstream_response_time": "0.060", "host": "example.com", "http_x_forwarded_for": "-", "guid": "-", "http": "https://example.com/s?q=:/:&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei", "facility": 1, " time_local": "10/Jun/2019:11:32:16 +0800", "status": 200 }
Parsing Malformed CSV Logs
The log below contains an anomalous record, and the goal is to parse its content field.
- Raw log
{ "content":"192.168.0.1|07/Aug/2019:11:10:37 +0800|www.learn.example.com|GET /index/htsw/?ad=5|8|6|11| HTTP/1.1|200|6.729|14559|192.168.0.1:8001|200|6.716|-|-|Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))||" }
- Solution
The segment GET /index/htsw/?ad=5|8|6|11| HTTP/1.1 inside content contains the pipe delimiter itself, so e_csv cannot split the fields correctly. First extract this segment with a regular expression, then remove it from content, and only then run e_psv on the remainder (a plain-Python sketch follows the output log below).
e_regex("content", r"[^\|]+\|[^\|]+\|[^\|]+\|(?P<request>(.+)HTTP/\d.\d)") e_set("content", regex_replace(v("content"), r"([^\|]+\|[^\|]+\|[^\|]+)\|((.+)HTTP/\d.\d)\|(.+)",replace= r"\1||\4")) e_psv("content", "remote_addr,time_local,host,status,request_time,body_bytes_sent,upstream_addr,upstream_status, upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid", restrict=True)
- Output log
{ "request": "GET /index/htsw/?ad=5|8|6|11| HTTP/1.1", "remote_addr": "192.168.0.1", "upstream_addr": 14559, "body_bytes_sent": "6.729", "time_local": "07/Aug/2019:11:10:37 +0800", "session_id": "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))", "content": "192.168.0.1|07/Aug/2019:11:10:37 +0800|www.learn.example.com||200|6.729|14559|192.168.0.1:8001|200|6.716|-|-|Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))||", "http_user_agent": "-", "upstream_status": "192.168.0.1:8001", "request_time": 200, "http_referer": "6.716", " upstream_response_time": 200, "host": "www.learn.example.com", "http_x_forwarded_for": "-", "guid": "", "status": "" }
Parent topic: Text Parsing