Field Value Extraction Function
This section describes field value extraction functions, including their syntax, parameters, and usage examples.
Function List
Type | Function | Description
Regular expression extraction | e_regex | Extracts the value of a field based on a regular expression and assigns the value to other fields. This function can be used together with other functions.
JSON extraction | e_json | Performs JSON operations on JSON objects in specified fields, including JSON expansion, JMES extraction, and JMES extraction followed by expansion. This function can be used together with other functions.
Delimiter extraction | e_csv, e_psv, e_tsv | Extracts multiple fields from a specified field using user-defined delimiters and predefined field names. This function can be used together with other functions.
KV mode extraction | e_kv | Extracts key-value pairs from multiple source fields using quote characters. This function can be used together with other functions.
KV mode extraction | e_kv_delimit | Extracts key-value pairs from the source field using delimiters.
e_regex
This function extracts the value of a field based on the regular expression and assigns the value to other fields.
- Function format
e_regex(key,regular_expression,fields_info,mode="fill-auto",pack_json=None)
- Parameter description
Parameter | Type | Mandatory | Description
key | Any | Yes | Source field name. If the field does not exist, no operation is performed. For details about how to set special field names, see section "Event Type."
regular_expression | String | Yes | Regular expression for extracting fields. Both capture groups and non-capture groups are supported. A non-capture group uses the ?: prefix and groups part of the pattern without producing a value of its own. Example: \w+@\w+\.\w(?:\.cn)? For details about non-capture groups, see "Non-capture Group."
fields_info | String/List/Dict | No | Name of the target field that receives the matched value. This parameter is mandatory when the regular expression does not contain named capture groups.
mode | String | No | Field overwrite mode. The default value is fill-auto. For details about the field values and meanings, see "Field Extraction Check and Overwriting Mode."
pack_json | String | No | Name of the field into which all matching results of the regular expression are packed as a JSON object. The default value is None, indicating that the matching results are not packed.
- Returned result
Logs with new field values.
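The difference between capture and non-capture groups can be illustrated with Python's standard re module. This is only a sketch: it assumes the regular_expression parameter follows Python-style syntax, and the sample address is hypothetical.

```python
import re

text = "contact: user@example.com.cn"  # hypothetical sample value

# Capturing group: the optional ".cn" suffix becomes a group of its own,
# so findall() returns the group content instead of the whole match.
with_capture = re.findall(r"\w+@\w+\.\w+(\.cn)?", text)

# Non-capturing group: (?:...) only groups the pattern for the "?" quantifier,
# so findall() returns the full matched address.
without_capture = re.findall(r"\w+@\w+\.\w+(?:\.cn)?", text)

print(with_capture)     # ['.cn']
print(without_capture)  # ['user@example.com.cn']
```

When the whole match should become the field value, the non-capturing form avoids introducing an unintended group.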
- Function example
- Example 1: Extract values that meet the expression from a field.
- Test data
{ "msg": "192.168.0.1 http://... 127.0.0.0" }
- Processing rule
# Extract the first IP address from the msg field.
e_regex("msg",r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}","ip")
- Processing result
msg: 192.168.0.1 http://... 127.0.0.0
ip: 192.168.0.1
- Example 2: Extract multiple values that meet the regular expression from a field.
- Test data
{ "msg": "192.168.0.1 http://... 127.0.0.0" }
- Processing rule
# Extract the two IP addresses in the msg field and assign them to server_ip and client_ip, respectively.
e_regex("msg",r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",["server_ip","client_ip"])
- Processing result
msg: 192.168.0.1 http://... 127.0.0.0
server_ip: 192.168.0.1
client_ip: 127.0.0.0
- Example 3: Extract values that meet the expression through the capture group.
- Test data
{ "content": "start sys version: deficience, err: 2" }
- Processing rule
# Use a regular expression to capture the version and error values in content.
e_regex("content",r"start sys version: (\w+),\s*err: (\d+)",["version","error"])
- Processing result
content: start sys version: deficience, err: 2
error: 2
version: deficience
- Example 4: Extract field values through the named capture group.
- Test data
{ "content": "start sys version: deficience, err: 2" }
- Processing rule
e_regex("content",r"start sys version: (?P<version>\w+),\s*err: (?P<error>\d+)")
- Processing result
content: start sys version: deficience, err: 2
error: 2
version: deficience
- Example 5: Use regular expressions to capture the value in the dict field and dynamically name the field and assign value.
- Test data
{ "dict": "verify:123" }
- Processing rule
e_regex("dict",r"(\w+):(\d+)",{r"k_\1": r"v_\2"})
- Processing result
dict: verify:123
k_verify: v_123
- Example 6: Extract values that match the expression from the field, package them, and assign them to the name field.
- Test data
{ "msg": "192.168.0.1 http://... 127.0.0.0" }
- Processing rule
e_regex("msg", r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "ip", pack_json="name")
- Processing result
msg: 192.168.0.1 http://... 127.0.0.0
name: {"ip": "192.168.0.1"}
- Example 7: Use regular expressions to extract values from the dict field, dynamically name the field and its value, and pack and assign it to the name field.
- Test data
{ "dict": "x:123, y:456, z:789" }
- Processing rule
e_regex("dict", r"(\w+):(\d+)", {r"k_\1": r"v_\2"}, pack_json="name")
- Processing result
dict: x:123, y:456, z:789
name: {"k_x": "v_123", "k_y": "v_456", "k_z": "v_789"}
- Example 8: Extract the values that match the expression using a capture group and assign them to the name field.
- Test data
{ "content": "start sys version: deficience, err: 2" }
- Processing rule
e_regex( "content", r"start sys version: (\w+),\s*err: (\d+)", ["version", "error"],pack_json="name")
- Processing result
content: start sys version: deficience, err: 2
name: {"version": "deficience", "error": "2"}
- More
This function can be used together with other functions.
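The way fields_info maps matches to field names in the examples above can be approximated in plain Python. The helper regex_extract below is hypothetical and is not the e_regex implementation; it ignores the mode and pack_json parameters and assumes that existing fields are simply overwritten.

```python
import re

def regex_extract(event, key, pattern, fields_info=None):
    """Return a copy of event with fields extracted from event[key]."""
    result = dict(event)
    value = event.get(key)
    if value is None:
        return result  # source field missing: no operation
    regex = re.compile(pattern)
    if isinstance(fields_info, dict):
        # Dynamic naming: expand templates such as k_\1 against every match.
        for match in regex.finditer(value):
            for name_tpl, value_tpl in fields_info.items():
                result[match.expand(name_tpl)] = match.expand(value_tpl)
        return result
    if fields_info is None:
        # Named capture groups supply the target field names.
        match = regex.search(value)
        if match:
            result.update(match.groupdict())
        return result
    names = [fields_info] if isinstance(fields_info, str) else list(fields_info)
    if regex.groups:
        # Capture groups: the groups of the first match map to the names.
        match = regex.search(value)
        values = match.groups() if match else ()
    else:
        # No groups: successive whole matches map to the names.
        values = regex.findall(value)
    result.update(zip(names, values))
    return result

ip_pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
two_ips = regex_extract({"msg": "192.168.0.1 http://... 127.0.0.0"},
                        "msg", ip_pattern, ["server_ip", "client_ip"])
dynamic = regex_extract({"dict": "verify:123"},
                        "dict", r"(\w+):(\d+)", {r"k_\1": r"v_\2"})
print(two_ips)
print(dynamic)
```

The sketch mirrors Examples 2 and 5: a list of names consumes successive matches, while a dict builds both the field name and its value from the capture groups.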
e_json
This function performs JSON operations on JSON objects in specified fields, including JSON expansion, JMES extraction, and JMES extraction and then expansion.
- Function format
e_json(key, expand=None, depth=100, prefix="__", suffix="__", fmt="simple", sep=".", expand_array=true, fmt_array="{parent}_{index}", include_node=r"[\u4e00-\u9fa5\u0800-\u4e00a-zA-Z][\w\-\.]*", exclude_node="", include_path="", exclude_path="", jmes="", output="", jmes_ignore_none=true, mode='fill-auto')
- Parameter description
Parameter | Type | Mandatory | Description
key | String | Yes | Source field name. If the field does not exist, no operation is performed.
expand | Boolean | No | Whether to expand the field.
- If the jmes parameter is not set, the default value is true, indicating that the field is expanded.
- If the jmes parameter is set, the default value is false, indicating that the field is not expanded.
depth | Number | No | Field expansion depth. The value ranges from 1 to 2,000. The value 1 indicates that only the first layer is expanded. The default value is 100.
prefix | String | No | Prefix added to the field name during expansion.
suffix | String | No | Suffix added to the field name during expansion.
fmt | String | No | Formatting mode. Options:
- simple (default value): The node name is used as the field name. The display format is {prefix}{current}{suffix}.
- full: All parent nodes and the current node are combined as the field name, that is, the complete path. The display format is {parent_list_str}{sep}{prefix}{current}{suffix}.
- parent: The direct parent node and the current node are combined as the field name. The display format is {parent}{sep}{prefix}{current}{suffix}.
- root: The root node and the current node are combined as the field name. The display format is {parent_list[0]}{sep}{prefix}{current}{suffix}.
For full, parent, and root, the delimiter is specified by the sep parameter. The default value is a period (.).
sep | String | No | Delimiter placed between parent and child nodes. This parameter takes effect when fmt is set to full, parent, or root. The default value is a period (.).
expand_array | Boolean | No | Whether to expand an array. The default value is true, indicating that the array is expanded.
fmt_array | String | No | Format for expanding an array. The format is {parent_rlist[0]}_{index}. You can also use a maximum of five placeholders to customize a format string: parent_list, current, sep, prefix, and suffix.
include_node | String/Number | No | Allowlist of nodes, indicating the node names included during filtering. By default, only nodes whose names consist of Chinese characters, digits, letters, underscores (_), periods (.), and hyphens (-) are automatically expanded.
exclude_node | String | No | Blocklist of nodes, indicating the node names excluded during filtering.
include_path | String | No | Allowlist of node paths, indicating the node paths included during filtering.
exclude_path | String | No | Blocklist of node paths, indicating the node paths excluded during filtering.
jmes | String | No | JMES expression. The field value is converted to a JSON object, and a specific value is extracted from it using JMES.
output | String | No | Name of the output field when a specific value is extracted using JMES.
jmes_ignore_none | Boolean | No | Whether to ignore the result when JMES extracts no value. The default value is true, indicating that no field is output. If this parameter is set to false, an empty string is output.
mode | String | No | Field overwrite mode. The default value is fill-auto.
- JSON expansion and filtering
- If a node allowlist (include_node) is set, only nodes whose names match the allowlist appear in the result. Example of a node allowlist regular expression: e_json("json_data_filed", ...., include_node=r'key\d+')
- If a node blocklist (exclude_node) is set, nodes whose names match the blocklist are excluded from the result. Example of a node blocklist regular expression: e_json("json_data_filed", ...., exclude_node=r'key\d+')
- Node path filtering: The include_path and exclude_path regular expressions are matched against the node path from its beginning. The components of the path are separated by periods (.).
- JMES filtering
Use JMES to select and calculate.
- Select the element attribute list under a specific JSON path: e_json(..., jmes="cve.vendors[*].product",output="product")
- Concatenate element attributes under a specific JSON path with commas (,): e_json(..., jmes="join(',', cve.vendors[*].name)",output="vendors")
- Calculate the maximum attribute value of elements under a specific JSON path: e_json(..., jmes="max(words[*].score)",output="hot_word")
- Return an empty string if a specific path does not exist or is empty: e_json(..., jmes="max(words[*].score)",output="hot_word", jmes_ignore_none=false)
- The following shows how to use parent_list and parent_rlist.
Test data:
{ "data": { "k1": 100,"k2": {"k3": 200,"k4": {"k5": 300}}} }
parent_list arranges the parent nodes from left to right.
e_json("data", fmt='{parent_list[0]}-{parent_list[1]}#{current}')
Obtained logs:
data:{ "k1": 100,"k2": {"k3": 200,"k4": {"k5": 300}}} data-k2#k3:200 data-k2#k5:300
parent_rlist arranges the parent nodes from right to left.
e_json("data", fmt='{parent_rlist[0]}-{parent_rlist[1]}#{current}')
Obtained logs:
data:{ "k1": 100,"k2": {"k3": 200,"k4": {"k5": 300}}} k2-data#k3:200 k4-k2#k5:300
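The two orderings can be reproduced with a short Python sketch. The helper expand_with_format is hypothetical and only imitates the naming step; in particular, skipping leaves that have too few parents for the format string (such as k1 here) is an assumption made to match the logs shown above.

```python
import json

def expand_with_format(field, raw, fmt):
    """Flatten a JSON object, naming each leaf with a format string."""
    out = {}

    def walk(node, parents):
        for name, child in node.items():
            if isinstance(child, dict):
                walk(child, parents + [name])
                continue
            try:
                key = fmt.format(
                    parent_list=parents,         # root first (left to right)
                    parent_rlist=parents[::-1],  # nearest parent first (right to left)
                    current=name,
                )
            except IndexError:
                continue  # assumption: skip leaves with too few parents
            out[key] = child

    walk(json.loads(raw), [field])
    return out

raw = '{"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300}}}'
left_to_right = expand_with_format("data", raw, "{parent_list[0]}-{parent_list[1]}#{current}")
right_to_left = expand_with_format("data", raw, "{parent_rlist[0]}-{parent_rlist[1]}#{current}")
print(left_to_right)   # {'data-k2#k3': 200, 'data-k2#k5': 300}
print(right_to_left)   # {'k2-data#k3': 200, 'k4-k2#k5': 300}
```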
- Returned result
Logs with new field values.
- Function example
- Example 1: Expand fields.
- Test data
{ "data": {"k1": 100, "k2": 200} }
- Processing rule
e_json("data",depth=1)
- Processing result
data: {"k1": 100, "k2": 200}
k1: 100
k2: 200
- Example 2: Add prefixes and suffixes to field names.
- Test data
{ "data": {"k1": 100, "k2": 200} }
- Processing rule
e_json("data", prefix="data_", suffix="_end")
- Processing result
data: {"k1": 100, "k2": 200}
data_k1_end: 100
data_k2_end: 200
- Example 3: Expand fields in different formats.
- Test data
{ "data": {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } } }
- fmt=full format
e_json("data", fmt='full')
data: {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } }
data.k1: 100
data.k2.k3: 200
data.k2.k4.k5: 300
- fmt=parent format
e_json("data", fmt='parent')
data: {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } }
data.k1: 100
k2.k3: 200
k4.k5: 300
- fmt=root format
e_json("data", fmt='root')
data: {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } }
data.k1: 100
data.k3: 200
data.k5: 300
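The naming rules of the four fmt modes can be sketched in Python as follows. The flatten helper is hypothetical and not the e_json implementation; it handles nested objects only (no arrays or filters) and, as an assumption, uses an empty prefix and suffix by default.

```python
import json

def flatten(field, raw, fmt="simple", sep=".", prefix="", suffix=""):
    """Flatten nested JSON objects and name each leaf according to fmt."""
    out = {}

    def walk(node, parents):
        for name, child in node.items():
            if isinstance(child, dict):
                walk(child, parents + [name])
                continue
            leaf = f"{prefix}{name}{suffix}"
            if fmt == "full":      # complete path
                key = sep.join(parents) + sep + leaf
            elif fmt == "parent":  # direct parent + current node
                key = parents[-1] + sep + leaf
            elif fmt == "root":    # root node + current node
                key = parents[0] + sep + leaf
            else:                  # simple: current node only
                key = leaf
            out[key] = child

    walk(json.loads(raw), [field])
    return out

raw = '{"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300}}}'
for mode in ("simple", "full", "parent", "root"):
    print(mode, flatten("data", raw, fmt=mode))
```

Running the loop yields the same field names as the three outputs above, plus k1, k3, and k5 for the simple mode.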
- Example 4: Extract JSON using the specified delimiter, field name prefix, and field name suffix
- Test data
{ "data": {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } } }
- Processing rule
e_json("data", fmt='parent', sep="@", prefix="__", suffix="__")
- Processing result
data: {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } }
data@__k1__: 100
k2@__k3__: 200
k4@__k5__: 300
- Example 5: Specify the fmt_array parameter and extract JSON in array mode.
- Test data
{ "people": [{"name": "xm", "gender": "boy"}, {"name": "xz", "gender": "boy"}, {"name": "xt", "gender": "girl"}] }
- Processing rule
e_json("people", fmt='parent', fmt_array="{parent_rlist[0]}-{index}")
- Processing result
people: [{"name": "xm", "gender": "boy"}, {"name": "xz", "gender": "boy"}, {"name": "xt", "gender": "girl"}]
people-0.name: xm
people-0.gender: boy
people-1.name: xz
people-1.gender: boy
people-2.name: xt
people-2.gender: girl
- Example 6: Use JMES to extract JSON objects.
- Test data
{ "data": { "people": [{"first": "James", "last": "d"},{"first": "Jacob", "last": "e"}],"foo": {"bar": "baz"}} }
- Processing rule
e_json("data", jmes='foo', output='jmes_output0')
e_json("data", jmes='foo.bar', output='jmes_output1')
e_json("data", jmes='people[0].last', output='jmes_output2')
e_json("data", jmes='people[*].first', output='jmes_output3')
- Processing result
data: { "people": [{"first": "James", "last": "d"},{"first": "Jacob", "last": "e"}],"foo": {"bar": "baz"}}
jmes_output0: {"bar": "baz"}
jmes_output1: baz
jmes_output2: d
jmes_output3: ["James", "Jacob"]
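For readers unfamiliar with JMES paths, the four expressions in this example correspond roughly to the plain Python lookups below. This is an illustration using only the standard json module, not a JMES evaluator, and the dictionary name outputs is chosen for this sketch.

```python
import json

raw = ('{"people": [{"first": "James", "last": "d"}, '
       '{"first": "Jacob", "last": "e"}], "foo": {"bar": "baz"}}')
data = json.loads(raw)

outputs = {
    "jmes_output0": data["foo"],                # foo
    "jmes_output1": data["foo"]["bar"],         # foo.bar
    "jmes_output2": data["people"][0]["last"],  # people[0].last
    # people[*].first: project "first" over every element that has it
    "jmes_output3": [p["first"] for p in data["people"] if "first" in p],
}

for name, value in outputs.items():
    # Non-string results are serialized back to JSON text.
    print(name, value if isinstance(value, str) else json.dumps(value))
```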
- More
This function can be used together with other functions.
e_csv, e_psv, and e_tsv
These functions extract multiple fields from a specified field using user-defined delimiters and predefined field names.
- e_csv: The default delimiter is a comma (,).
- e_psv: The default delimiter is a vertical bar (|).
- e_tsv: The default delimiter is a tab character (\t).
- Function format
e_csv(Source field name, Target field list, sep=",", quote='"', restrict=true, mode="fill-auto")
e_psv(Source field name, Target field list, sep="|", quote='"', restrict=true, mode="fill-auto")
e_tsv(Source field name, Target field list, sep="\t", quote='"', restrict=true, mode="fill-auto")
- Parameter description
Parameter | Type | Mandatory | Description
Source field name | Any | Yes | Source field name. If the field does not exist, no operation is performed.
Target field list | Any | Yes | Field name corresponding to each value after delimiter separation. The value can be a list of strings, for example, ["error", "message", "result"]. If no field name contains a comma (,), the names can also be given as a single comma-separated string, for example, "error, message, result".
sep | String | No | Delimiter, which can only be a single character.
quote | String | No | Quote character used to wrap values. This parameter is required when a value contains the delimiter.
restrict | Boolean | No | Whether to use the strict mode. The default value is false, indicating the non-strict mode. When the number of delimited values differs from the number of target fields:
- In strict mode, no operation is performed.
- In non-strict mode, values are assigned to the first several fields that can be paired.
mode | String | No | Field overwrite mode. The default value is fill-auto.
- Returned result
Logs with new field values.
- Function example
The following example uses e_csv. The e_psv and e_tsv functions work in the same way.
- Test data
{ "content": "192.168.0.100,10/Jun/2019:11:32:16 +0800,example.aadoc.com,GET /zf/11874.html HTTP/1.1,200,0.077,6404,192.168.0.100:8001,200,0.060,https://image.developer.aadoc.com/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei,-,Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-AL00) AppleWebKit/537.36,-,-" }
- Processing rule
e_csv("content", "remote_addr, time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status, upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid")
- Processing result
content: 192.168.0.100,10/Jun/2019:11:32:16 +0800,example.aadoc.com,GET /zf/11874.html HTTP/1.1,200,0.077,6404,192.168.0.100:8001,200,0.060,https://image.developer.aadoc.com/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei,-,Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-AL00) AppleWebKit/537.36,-,-
body_bytes_sent: 6404
guid: -
host: example.aadoc.com
http_referer: https://image.developer.aadoc.com/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei
http_user_agent: Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-AL00) AppleWebKit/537.36
http_x_forwarded_for: -
remote_addr: 192.168.0.100
request: GET /zf/11874.html HTTP/1.1
request_time: 0.077
session_id: -
status: 200
time_local: 10/Jun/2019:11:32:16 +0800
upstream_addr: 192.168.0.100:8001
upstream_response_time: 0.060
upstream_status: 200
- More
This function can be used together with other functions.
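How quoting and the strict and non-strict modes interact can be sketched with Python's standard csv module. The helper split_fields and the sample log value are hypothetical; the sketch is not the e_csv implementation and ignores the mode parameter.

```python
import csv

def split_fields(event, key, names, sep=",", quote='"', restrict=False):
    """Split event[key] on sep and pair the values with the target field names."""
    result = dict(event)
    if key not in event:
        return result  # source field missing: no operation
    if isinstance(names, str):
        names = [n.strip() for n in names.split(",")]
    values = next(csv.reader([event[key]], delimiter=sep, quotechar=quote))
    if restrict and len(values) != len(names):
        return result  # strict mode: counts differ, so nothing is assigned
    # Non-strict mode: zip() pairs only as many fields as both sides provide.
    result.update(zip(names, values))
    return result

# The request value contains the delimiter, so it is wrapped in quotes.
event = {"content": '200,"GET /a,b HTTP/1.1",0.077'}
loose = split_fields(event, "content", "status, request, request_time")
strict = split_fields(event, "content", ["status", "request"], restrict=True)
print(loose)
print(strict)
```

In the second call, three values meet two field names, so the strict mode leaves the event unchanged.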
e_kv
This function extracts key-value pairs from one or more source fields, using a quote character to delimit values.
- Function format
e_kv(source field or source field list, sep="=", quote='"', escape=false, prefix="", suffix="", mode="fill-auto")
- Parameter description
Parameter
Type
Mandatory
Description
Source field or source field list
String or string list
Yes
Field name or a list of multiple field names.
sep
String
No
Delimiter of the regular expression of the keyword and value. The default value is =. It is not limited to a single character.
Note: Non-capturing groups can be used, but capturing groups cannot be used.
quote
String
No
Quotation mark, which is used to enclose values. The default value is ".
Note: The values of the extracted dynamic key-value pairs need to be enclosed by quote, for example, a="abc" and b="xyz". If the extraction object does not contain, only the values of the following character sets are extracted: Chinese characters, letters, digits, underscores (_), hyphens (-), periods (.), percent signs (%), and tildes (~). For example, if a=Chinese ab12_-.%~|abc b=123, a: Chinese ab12_-.%~ and b: 123 can be extracted.
escape
Boolean
No
Whether to automatically extract the value of the reverse character. The default value is false, meaning the value of the reverse character is not automatically extracted. For example, for key="abc\"xyz", the value abc\ is extracted from key by default. If escape is set to true, abc"xyz is extracted.
prefix
String
No
Prefix added to the extracted field name.
suffix
String
No
Suffix added to the extracted field name.
mode
String
No
Field overwrite mode. The default value is fill-auto.
- Returned result
Logs with new field values.
- Function example
- Example 1: Use the default delimiter = to extract key-value pairs.
- Test data
{ "http_refer": "https://video.developer.aadoc.com/s?q=asd&a=1&b=2" }
If the test data is request_uri: a1=1&a2=&a3=3, the value of a2 is empty. The e_kv() function cannot extract a2. You can use the e_regex() function to extract it, for example, e_regex("request_uri",r'(\w+)=([^=&]*)',{r"\1":r"\2"},mode="overwrite").
- Processing rule
e_kv("http_refer")
- Processing result
http_refer: https://video.developer.aadoc.com/s?q=asd&a=1&b=2
q: asd
a: 1
b: 2
- Example 2: Add prefixes and suffixes to field names.
- Test data
{ "http_refer": "https://video.developer.aadoc.com/s?q=asd&a=1&b=2" }
- Processing rule
e_kv( "http_refer", sep="=", quote='"', escape=false, prefix="data_", suffix="_end", mode="fill-auto", )
- Processing result
http_refer: https://video.developer.aadoc.com/s?q=asd&a=1&b=2
data_q_end: asd
data_a_end: 1
data_b_end: 2
- Example 3: Extract key-value pairs from the content2 field and use the escape parameter to unescape escaped characters in the values.
- Test data
{ "content2": "k1:\"v1\\\"abc\", k2:\"v2\", k3: \"v3\"" }
- Processing rule
e_kv("content2", sep=":", escape=true)
- Processing result
content2: k1:"v1\"abc", k2:"v2", k3: "v3"
k1: v1"abc
k2: v2
k3: v3
- More
This function can be used together with other functions.
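The quoting and escape rules can be approximated with Python's re module. The helper kv_extract is hypothetical and not the e_kv implementation. It rests on two assumptions: an unquoted value is limited to the character set listed for the quote parameter, and whitespace between the separator and the value is tolerated (as Example 3 suggests).

```python
import re

def kv_extract(text, sep="=", quote='"', escape=False):
    """Extract key-value pairs; values are quoted or drawn from a safe charset."""
    q = re.escape(quote)
    if escape:
        # Allow backslash-escaped characters (including the quote) inside the value.
        quoted = rf"{q}((?:\\.|[^{q}\\])*){q}"
    else:
        # Stop at the first quote character, escaped or not.
        quoted = rf"{q}([^{q}]*){q}"
    unquoted = r"([\w\-.%~]+)"  # \w also covers Chinese characters in Python 3
    pattern = re.compile(rf"(\w+)(?:{sep})\s*(?:{quoted}|{unquoted})")
    pairs = {}
    for match in pattern.finditer(text):
        key, quoted_value, plain_value = match.groups()
        if quoted_value is None:
            pairs[key] = plain_value
        elif escape:
            pairs[key] = re.sub(r"\\(.)", r"\1", quoted_value)  # drop escaping backslashes
        else:
            pairs[key] = quoted_value
    return pairs

print(kv_extract("https://video.developer.aadoc.com/s?q=asd&a=1&b=2"))
print(kv_extract('k1:"v1\\"abc", k2:"v2", k3: "v3"', sep=":", escape=True))
print(kv_extract("a1=1&a2=&a3=3"))  # a2 has an empty value and is not extracted
```

Because an unquoted value must contain at least one character, the last call skips a2, which matches the limitation noted in Example 1.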
e_kv_delimit
This function extracts key-value pairs from the source field using delimiters.
- Function format
e_kv_delimit(Source field or source field list, pair_sep=r"\s", kv_sep="=", prefix="", suffix="", mode="fill-auto")
- Parameter description
Parameter | Type | Mandatory | Description
Source field or source field list | String or string list | Yes | Field name or a list of multiple field names.
pair_sep | String | No | Regular expression character set used to separate key-value pairs from one another. The default value is \s. Other examples: \s\w and abc\s.
Note: If you need to use a multi-character string to separate pairs, you are advised to use str_replace or regex_replace to convert the string into a single delimiter character first, and then use the e_kv_delimit function to separate the pairs.
kv_sep | String | No | Regular expression used to separate a key from its value. The default value is =. It is not limited to a single character. Non-capturing groups can be used, but capturing groups cannot be used.
prefix | String | No | Prefix added to the extracted field name.
suffix | String | No | Suffix added to the extracted field name.
mode | String | No | Field overwrite mode. The default value is fill-auto.
- Returned result
Logs with new field values.
- Function example
- Example 1: Use the default delimiter = to extract key-value pairs.
- Test data
{ "data": "i=c1 k1=v1 k2=v2 k3=v3" }
If the test data is request_uri: a1=1&a2=&a3=3, the value of a2 is empty. The e_kv_delimit() function cannot extract a2. You can use the e_regex() function to extract the value, for example, e_regex("request_uri",r'(\w+)=([^=&]*)',{r"\1":r"\2"}, mode="overwrite").
- Processing rule
e_kv_delimit("data")
- Processing result
data: i=c1 k1=v1 k2=v2 k3=v3
i: c1
k2: v2
k1: v1
k3: v3
- Example 2: Use the characters & and ? as delimiters between key-value pairs.
- Test data
{ "data": "k1=v1&k2=v2?k3=v3" }
- Processing rule
e_kv_delimit("data",pair_sep=r"&?")
- Processing result
data: k1=v1&k2=v2?k3=v3
k2: v2
k1: v1
k3: v3
- Example 3: Use regular expressions to extract key-value pairs.
- Test data
{ "data": "k1=v1 k2:v2 k3=v3" }
- Processing rule
e_kv_delimit("data", kv_sep=r"(?:=|:)")
- Processing result
data: k1=v1 k2:v2 k3=v3
k2: v2
k1: v1
k3: v3
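The two-level splitting shown in these examples can be sketched with re.split. The helper kv_delimit is hypothetical and not the e_kv_delimit implementation. It treats pair_sep as the content of a character class and kv_sep as a regular expression, and it skips pairs with an empty key or value, which is an assumption based on the a2 case described in Example 1.

```python
import re

def kv_delimit(text, pair_sep=r"\s", kv_sep="="):
    """Split text into pairs on pair_sep, then split each pair once on kv_sep."""
    pairs = {}
    for chunk in re.split(rf"[{pair_sep}]+", text):
        parts = re.split(kv_sep, chunk, maxsplit=1)
        # A capturing group in kv_sep would add extra items to parts,
        # which is one reason only non-capturing groups are usable here.
        if len(parts) == 2 and parts[0] and parts[1]:
            pairs[parts[0]] = parts[1]
    return pairs

print(kv_delimit("i=c1 k1=v1 k2=v2 k3=v3"))
print(kv_delimit("k1=v1&k2=v2?k3=v3", pair_sep=r"&?"))
print(kv_delimit("k1=v1 k2:v2 k3=v3", kv_sep=r"(?:=|:)"))

# Empty values are skipped by the sketch; the regular expression suggested
# in Example 1 for e_regex keeps them.
print(kv_delimit("a1=1&a2=&a3=3", pair_sep="&"))
print(re.findall(r"(\w+)=([^=&]*)", "a1=1&a2=&a3=3"))
```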