Field Value Extraction Functions

Function List

Type	Function	Description
Regular expression extraction	e_regex	Extracts the value of a field based on the regular expression and assigns the value to other fields. This function can be used together with other functions.
JSON extraction	e_json	Performs JSON operations on JSON objects in specified fields, including JSON expansion, JMES extraction, and JMES extraction and then expansion. This function can be used together with other functions.
Delimiter extraction	e_csv, e_psv, and e_tsv	Extracts multiple fields from a specified field using user-defined delimiters and predefined field names. e_csv: The default delimiter is a comma (,). e_psv: The default delimiter is a vertical bar (|). e_tsv: The default delimiter is a tab (\t). These functions can be used together with other functions.
KV mode extraction	e_kv	Extracts key-value pairs from one or more source fields, using a quote character to enclose values. This function can be used together with other functions.
KV mode extraction	e_kv_delimit	Extracts key-value pairs from the source field using delimiters.

e_regex

This function extracts the value of a field based on the regular expression and assigns the value to other fields.

Function format

e_regex(key,regular_expression,fields_info,mode="fill-auto",pack_json=None)

Parameter description

Parameter	Type	Mandatory	Description
key	Any	Yes	Source field name. If the field does not exist, no operation is performed. For details about how to set special field names, see section "Event Type."
regular_expression	String	Yes	Regular expression for extracting fields. Both capture groups and non-capture groups are supported. Where grouping is needed without capturing, use a non-capture group with the ?: prefix, for example \w+@\w+\.\w+(?:\.cn)? For details about non-capture groups, see "Non-capture Group."
fields_info	String/ List/ Dict	No	Names of the target fields that receive the matched values. This parameter is mandatory when the regular expression does not use named capture groups.
mode	String	No	Field overwrite mode. The default value is fill-auto. For details about the field values and meanings, see "Field Extraction Check and Overwriting Mode."
pack_json	String	No	Pack all matching results of the regular expression into the field specified by pack_json. The default value is None, indicating that the matching results are not packed.

Returned result
Logs with new field values.

Function example

Example 1: Extract values that meet the expression from a field.

Test data

{
 "msg": "192.168.0.1 http://... 127.0.0.0"
}

Processing rule

# Extract the first IP address from the msg field.
e_regex("msg",r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}","ip")

Processing result

msg: 192.168.0.1 http://... 127.0.0.0 
ip: 192.168.0.1

Example 2: Extract multiple values that meet the regular expression from a field.

Test data

{
 "msg": "192.168.0.1 http://... 127.0.0.0"
}

Processing rule

# Extract the two IP addresses in the msg field and assign them to server_ip and client_ip, respectively.
e_regex("msg",r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",["server_ip","client_ip"])

Processing result

msg: 192.168.0.1 http://... 127.0.0.0 
server_ip: 192.168.0.1 
client_ip: 127.0.0.0
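
The behavior in Examples 1 and 2 can be approximated in plain Python with the standard re module. This is only a sketch for reasoning about the results, not the DSL implementation: the helper name extract_matches is hypothetical, and the overwrite mode is ignored.

```python
import re

def extract_matches(event, key, pattern, fields):
    """Assign successive matches of `pattern` in event[key] to the names in `fields`."""
    if key not in event:  # missing source field: no operation
        return dict(event)
    names = [fields] if isinstance(fields, str) else list(fields)
    matches = [m.group(0) for m in re.finditer(pattern, event[key])]
    result = dict(event)
    # zip stops at the shorter sequence, so surplus matches or names are ignored
    result.update(zip(names, matches))
    return result

IP_PATTERN = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
event = {"msg": "192.168.0.1 http://... 127.0.0.0"}

one = extract_matches(event, "msg", IP_PATTERN, "ip")
two = extract_matches(event, "msg", IP_PATTERN, ["server_ip", "client_ip"])
print(one["ip"])                           # 192.168.0.1
print(two["server_ip"], two["client_ip"])  # 192.168.0.1 127.0.0.0
```

With a single target name only the first match is kept, and with two names the first two matches are assigned in order.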

Example 3: Extract values that meet the expression through the capture group.

Test data

{
 "content": "start sys version: deficience, err: 2"
}

Processing rule

# Use a regular expression to capture the version and error values in content.
e_regex("content",r"start sys version: (\w+),\s*err: (\d+)",["version","error"])

Processing result

content: start sys version: deficience, err: 2
error: 2
version: deficience

Example 4: Extract field values through the named capture group.

Test data

{
 "content": "start sys version: deficience, err: 2"
}

Processing rule

e_regex("content",r"start sys version: (?P<version>\w+),\s*err: (?P<error>\d+)")

Processing result

content:  start sys version: deficience, err: 2
error:  2
version:  deficience
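
Examples 3 and 4 differ only in where the field names come from. The following Python sketch (standard re module, not the DSL itself) shows the two styles producing the same mapping.

```python
import re

content = "start sys version: deficience, err: 2"

# Positional groups: names come from a separate list (as in Example 3).
positional = re.search(r"start sys version: (\w+),\s*err: (\d+)", content)
by_position = dict(zip(["version", "error"], positional.groups()))

# Named groups: names come from the pattern itself (as in Example 4).
named = re.search(
    r"start sys version: (?P<version>\w+),\s*err: (?P<error>\d+)", content
)
by_name = named.groupdict()

print(by_position)             # {'version': 'deficience', 'error': '2'}
print(by_position == by_name)  # True
```

Named groups keep the pattern and the field names in one place, which is why fields_info can be omitted in that case.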

Example 5: Use regular expressions to capture the value in the dict field and dynamically name the field and assign value.
- Test data
```
{
 "dict": "verify:123"
}
```
- Processing rule
```
e_regex("dict",r"(\w+):(\d+)",{r"k_\1": r"v_\2"})
```
- Processing result
```
dict: verify:123
k_verify: v_123
```

Example 6: Extract values that match the expression from the field, package them, and assign them to the name field.

Test data

{
 "msg": "192.168.0.1 http://... 127.0.0.0"
}

Processing rule

e_regex("msg", r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "ip", pack_json="name")

Processing result

msg:192.168.0.1 http://... 127.0.0.0
name:{"ip": "192.168.0.1"}

Example 7: Use regular expressions to extract values from the dict field, dynamically name the field and its value, and pack and assign it to the name field.
- Test data
```
{
 "dict": "x:123, y:456, z:789"
}
```
- Processing rule
```
e_regex("dict", r"(\w+):(\d+)", {r"k_\1": r"v_\2"}, pack_json="name")
```
- Processing result
```
dict:x:123, y:456, z:789
name:{"k_x": "v_123", "k_y": "v_456", "k_z": "v_789"}
```
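
The dynamic naming used in Examples 5 and 7 relies on backreferences such as \1 and \2 in both the field-name template and the value template. A minimal Python sketch of the same idea, using Match.expand from the standard re module, is shown below. The helper name expand_pairs is hypothetical, and packing is approximated with json.dumps.

```python
import json
import re

def expand_pairs(text, pattern, template):
    """For every match, expand backreferences in each key/value template."""
    out = {}
    for m in re.finditer(pattern, text):
        for key_tpl, value_tpl in template.items():
            out[m.expand(key_tpl)] = m.expand(value_tpl)
    return out

template = {r"k_\1": r"v_\2"}
single = expand_pairs("verify:123", r"(\w+):(\d+)", template)
many = expand_pairs("x:123, y:456, z:789", r"(\w+):(\d+)", template)

print(single)            # {'k_verify': 'v_123'}
print(json.dumps(many))  # {"k_x": "v_123", "k_y": "v_456", "k_z": "v_789"}
```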

Example 8: Extract the values that match the expression using a capture group and assign them to the name field.

Test data

{
 "content": "start sys version: deficience, err: 2"
}

Processing rule

e_regex( "content", r"start sys version: (\w+),\s*err: (\d+)", ["version", "error"],pack_json="name")

Processing result

content:start sys version: deficience, err: 2
name:{"version": "deficience", "error": "2"}

More
This function can be used together with other functions.

e_json

This function performs JSON operations on JSON objects in specified fields, including JSON expansion, JMES extraction, and JMES extraction and then expansion.

Function format

e_json(key, expand=None, depth=100, prefix="__", suffix="__", fmt="simple", sep=".",
       expand_array=true, fmt_array="{parent}_{index}",
       include_node=r"[\u4e00-\u9fa5\u0800-\u4e00a-zA-Z][\w\-\.]*",
        exclude_node="", include_path="", exclude_path="",
     jmes="", output="", jmes_ignore_none=false, mode='fill-auto'
)

Parameter description

Parameter	Type	Mandatory	Description
key	String	Yes	Source field name. If the field does not exist, no operation is performed.
expand	Boolean	No	Whether to expand the field. If the jmes parameter is not set, the default value true is used, indicating that the field is expanded. If the jmes parameter is set, the default value false is used, indicating that the field is not expanded.
depth	Number	No	Field expansion depth. The value ranges from 1 to 2,000. The value 1 indicates that only the first layer is expanded. The default value is 100.
prefix	String	No	Prefix added to the field name during expansion.
suffix	String	No	Suffix added to the field name during expansion.
fmt	String	No	Formatting mode. Options: simple (default value): The node name is used as the field name. The display format is {prefix}{current}{suffix}. full: The complete path is used as the field name. The display format is {parent_list_str}{sep}{prefix}{current}{suffix}. parent: The immediate parent node and the current node are combined as the field name. The display format is {parent}{sep}{prefix}{current}{suffix}. root: The root node and the current node are combined as the field name. The display format is {parent_list[0]}{sep}{prefix}{current}{suffix}. In the full, parent, and root formats, the delimiter is specified by the sep parameter. The default value is a period (.).
sep	String	No	Delimiter for formatting parent and child nodes. This parameter takes effect only when fmt is set to full, parent, or root. The default value is a period (.).
expand_array	Boolean	No	Whether to expand an array. The default value is true, indicating that the array is expanded.
fmt_array	String	No	Format used to name expanded array elements, for example {parent_rlist[0]}_{index}. A custom format string can also use up to five placeholders: parent_list, current, sep, prefix, and suffix.
include_node	String/ Number	No	Node allowlist: a regular expression that node names must match to be included during filtering. By default, only nodes whose names contain only Chinese characters, digits, letters, underscores (_), periods (.), and hyphens (-) are automatically expanded.
exclude_node	String	No	Node blocklist: a regular expression for node names that are excluded during filtering.
include_path	String	No	Path allowlist: a regular expression for node paths that are included during filtering.
exclude_path	String	No	Path blocklist: a regular expression for node paths that are excluded during filtering.
jmes	String	No	Converts the field value to a JSON object and extracts a specific value using JMES.
output	String	No	Field name output when a specific value is extracted using JMES.
jmes_ignore_none	Boolean	No	Whether to ignore the value when JMES cannot extract the value. The default value is true, indicating that the value is ignored. Otherwise, an empty string is output.
mode	String	No	Field overwrite mode. The default value is fill-auto.

JSON expansion and filtering
- If the node allowlist is set, a node appears in the result only if its name matches the allowlist. Example of a node allowlist regular expression: e_json("json_data_field", ..., include_node=r'key\d+')
- If the node blocklist is set, a node whose name matches the blocklist does not appear in the result. Example of a node blocklist regular expression: e_json("json_data_field", ..., exclude_node=r'key\d+')
- Node paths: The include_path and exclude_path regular expressions are matched against the node path from its beginning. Path components are separated by periods (.).

JMES filtering
Use JMES to select and calculate.
- Select the element attribute list under a specific JSON path: e_json(..., jmes="cve.vendors[*].product",output="product")
- Concatenate element attributes under a specific JSON path with commas (,): e_json(..., jmes="join(',', cve.vendors[*].name)",output="vendors")
- Calculate the maximum attribute value of elements under a specific JSON path: e_json(..., jmes="max(words[*].score)",output="hot_word")
- Return an empty string if a specific path does not exist or is empty: e_json(..., jmes="max(words[*].score)",output="hot_word", jmes_ignore_none=false)

The following shows how to use parent_list and parent_rlist.

Test data:

{
 "data": { "k1": 100,"k2": {"k3": 200,"k4": {"k5": 300}}}
}

parent_list arranges the parent nodes from left to right.

e_json("data", fmt='{parent_list[0]}-{parent_list[1]}#{current}')

Obtained logs:

data:{ "k1": 100,"k2": {"k3": 200,"k4": {"k5": 300}}}
data-k2#k3:200
data-k2#k5:300

parent_rlist arranges the parent nodes from right to left.

e_json("data", fmt='{parent_rlist[0]}-{parent_rlist[1]}#{current}')

Obtained logs:

data:{ "k1": 100,"k2": {"k3": 200,"k4": {"k5": 300}}}
k2-data#k3:200
k4-k2#k5:300
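
The difference between parent_list and parent_rlist can be reproduced with an ordinary Python format string. The sketch below is an approximation, not the DSL implementation: the helper name expand_with_format is hypothetical, and it assumes that a node is skipped when the format refers to an ancestor index that does not exist (which is consistent with k1 being absent from the logs above).

```python
def expand_with_format(root_name, obj, fmt):
    """Flatten nested dicts, naming each leaf with a str.format template."""
    out = {}

    def walk(node, parents):
        for name, value in node.items():
            if isinstance(value, dict):
                walk(value, parents + [name])
                continue
            try:
                field = fmt.format(
                    parent_list=parents,         # ancestors, root first
                    parent_rlist=parents[::-1],  # ancestors, nearest first
                    current=name,
                )
            except IndexError:
                continue  # not enough ancestors for this template
            out[field] = value

    walk(obj, [root_name])
    return out

data = {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300}}}
left = expand_with_format("data", data, "{parent_list[0]}-{parent_list[1]}#{current}")
right = expand_with_format("data", data, "{parent_rlist[0]}-{parent_rlist[1]}#{current}")
print(left)   # {'data-k2#k3': 200, 'data-k2#k5': 300}
print(right)  # {'k2-data#k3': 200, 'k4-k2#k5': 300}
```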

Returned result
Logs with new field values.

Function example

Example 1: Expand fields.

Test data
```
{
 "data": {"k1": 100, "k2": 200}
}
```
Processing rule
```
e_json("data",depth=1)
```

Processing result

data: {"k1": 100, "k2": 200}
k1: 100
k2: 200

Example 2: Add prefixes and suffixes to field names.

Test data
```
{
 "data": {"k1": 100, "k2": 200}
}
```

Processing rule

e_json("data", prefix="data_", suffix="_end")

Processing result

data: {"k1": 100, "k2": 200}
data_k1_end: 100
data_k2_end: 200

Example 3: Expand fields in different formats.

Test data

{
 "data": {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } }
}

fmt=full format

e_json("data", fmt='full')

data: {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } } 
data.k1: 100 
data.k2.k3: 200 
data.k2.k4.k5: 300

fmt=parent format

e_json("data", fmt='parent')

data: {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } } 
data.k1: 100 
k2.k3: 200 
k4.k5: 300

fmt=root format

e_json("data", fmt='root')

data: {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } } 
data.k1: 100 
data.k3: 200 
data.k5: 300

Example 4: Extract JSON using the specified delimiter, field name prefix, and field name suffix

Test data

{
 "data": {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } }
}

Processing rule

e_json("data", fmt='parent', sep="@", prefix="__", suffix="__")

Processing result

data: {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300} } } 
data@__k1__: 100
k2@__k3__: 200
k4@__k5__: 300
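
The naming rules behind Examples 3 and 4 can be summarized in a few lines of Python. This is a sketch inferred from the example outputs above rather than the actual implementation: the function name flatten is hypothetical, arrays are not handled, and prefix and suffix default to empty strings here for readability.

```python
def flatten(root_name, obj, fmt="simple", sep=".", prefix="", suffix=""):
    """Flatten nested dicts and name each leaf according to `fmt`."""
    out = {}

    def walk(node, parents):
        for name, value in node.items():
            if isinstance(value, dict):
                walk(value, parents + [name])
                continue
            current = f"{prefix}{name}{suffix}"
            if fmt == "simple":
                field = current
            elif fmt == "full":      # complete path
                field = sep.join(parents + [current])
            elif fmt == "parent":    # nearest parent only
                field = f"{parents[-1]}{sep}{current}"
            elif fmt == "root":      # root node only
                field = f"{parents[0]}{sep}{current}"
            else:
                raise ValueError(f"unknown fmt: {fmt}")
            out[field] = value

    walk(obj, [root_name])
    return out

data = {"k1": 100, "k2": {"k3": 200, "k4": {"k5": 300}}}
print(flatten("data", data, fmt="full"))
# {'data.k1': 100, 'data.k2.k3': 200, 'data.k2.k4.k5': 300}
print(flatten("data", data, fmt="parent", sep="@", prefix="__", suffix="__"))
# {'data@__k1__': 100, 'k2@__k3__': 200, 'k4@__k5__': 300}
```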

Example 5: Specify the fmt_array parameter and extract JSON in array mode.

Test data

{
 "people": [{"name": "xm", "gender": "boy"}, {"name": "xz", "gender": "boy"}, {"name": "xt", "gender": "girl"}]
}

Processing rule

e_json("people", fmt='parent', fmt_array="{parent_rlist[0]}-{index}")

Processing result

people: [{"name": "xm", "gender": "boy"}, {"name": "xz", "gender": "boy"}, {"name": "xt", "gender": "girl"}]
people-0.name: xm 
people-0.gender: boy 
people-1.name: xz 
people-1.gender: boy 
people-2.name: xt 
people-2.gender: girl

Example 6: Use JMES to extract JSON objects.

Test data

{
 "data": { "people": [{"first": "James", "last": "d"},{"first": "Jacob", "last": "e"}],"foo": {"bar": "baz"}}
}

Processing rule

e_json("data", jmes='foo', output='jmes_output0')
e_json("data", jmes='foo.bar', output='jmes_output1')
e_json("data", jmes='people[0].last', output='jmes_output2')
e_json("data", jmes='people[*].first', output='jmes_output3')

Processing result

data: { "people": [{"first": "James", "last": "d"},{"first": "Jacob", "last": "e"}],"foo": {"bar": "baz"}}
jmes_output0: {"bar": "baz"}
jmes_output1: baz 
jmes_output2: d 
jmes_output3: ["James", "Jacob"]
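
For readers unfamiliar with JMES expressions, the four queries in Example 6 correspond to the following plain Python lookups. This sketch uses only the standard json module and does not evaluate JMES itself; it is meant as a reading aid for the expressions.

```python
import json

data = json.loads(
    '{"people": [{"first": "James", "last": "d"},'
    ' {"first": "Jacob", "last": "e"}], "foo": {"bar": "baz"}}'
)

outputs = {
    "jmes_output0": data["foo"],                          # foo
    "jmes_output1": data["foo"]["bar"],                   # foo.bar
    "jmes_output2": data["people"][0]["last"],            # people[0].last
    "jmes_output3": [p["first"] for p in data["people"]], # people[*].first
}

for name, value in outputs.items():
    print(name, json.dumps(value))
```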

More
This function can be used together with other functions.

e_csv, e_psv, and e_tsv

These functions extract multiple fields from a specified field using user-defined delimiters and predefined field names.

e_csv: The default delimiter is a comma (,).
e_psv: The default delimiter is a vertical bar (|).
e_tsv: The default delimiter is \t.

Function format

e_csv(Source field name, Target field list, sep=",", quote='"', restrict=true, mode="fill-auto")
e_psv(Source field name, Target field list, sep="|", quote='"', restrict=true, mode="fill-auto")
e_tsv(Source field name, Target field list, sep="\t", quote='"', restrict=true, mode="fill-auto")

Parameter description

Parameter	Type	Mandatory	Description
Source field name	Any	Yes	Source field name. If the field does not exist, no operation is performed.
Target field list	Any	Yes	Field name corresponding to each value after delimiter separation. The value can be a string list, for example, ["error", "message", "result"]. If the field name does not contain commas (,), you can use commas (,) as delimiters, for example, "error, message, result".
sep	String	No	Delimiter, which can only be a single character.
quote	String	No	Quote character used to wrap values. This parameter is required when the value contains delimiters.
restrict	Boolean	No	Whether to use the strict mode. The default value is false, indicating the non-strict mode. The mode matters when the number of delimited values differs from the number of target fields: In strict mode, no operation is performed. In non-strict mode, values are assigned in order to as many target fields as can be paired.
mode	String	No	Field overwrite mode. The default value is fill-auto.

Returned result
Logs with new field values.

Function example

The following example uses e_csv. The e_psv and e_tsv functions work in the same way.

Test data

{
 "content": "192.168.0.100,10/Jun/2019:11:32:16 +0800,example.aadoc.com,GET /zf/11874.html HTTP/1.1,200,0.077,6404,192.168.0.100:8001,200,0.060,https://image.developer.aadoc.com/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei,-,Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-AL00) AppleWebKit/537.36,-,-"
}

Processing rule

e_csv("content", "remote_addr, time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status, upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid")

Processing result

content:  192.168.0.100,10/Jun/2019:11:32:16 +0800,example.aadoc.com,GET /zf/11874.html HTTP/1.1,200,0.077,6404,192.168.0.100:8001,200,0.060,https://image.developer.aadoc.com/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei,-,Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-AL00) AppleWebKit/537.36,-,-
body_bytes_sent:  6404
guid:  -
host:  example.aadoc.com
http_referer:  https://image.developer.aadoc.com/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei
http_user_agent:  Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-AL00) AppleWebKit/537.36 
http_x_forwarded_for:  -
remote_addr:  192.168.0.100 
request:  GET /zf/11874.html HTTP/1.1 
request_time:  0.077
session_id:  -
status:  200
time_local:  10/Jun/2019:11:32:16 +0800 
topic:  syslog-forwarder 
upstream_addr:  192.168.0.100:8001
upstream_response_time:  0.060
upstream_status:  200
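
The interaction between quote and the strict and non-strict modes can be illustrated with the Python csv module. This is a sketch under stated assumptions, not the DSL implementation: the helper name split_fields is hypothetical, the overwrite mode is ignored, and whitespace around names in a comma-separated name string is assumed to be stripped.

```python
import csv

def split_fields(event, key, names, sep=",", quote='"', restrict=False):
    """Split event[key] on `sep` and pair the values with `names`."""
    if key not in event:  # missing source field: no operation
        return dict(event)
    if isinstance(names, str):
        names = [n.strip() for n in names.split(",")]
    values = next(csv.reader([event[key]], delimiter=sep, quotechar=quote))
    if restrict and len(values) != len(names):
        return dict(event)  # strict mode: counts differ, no operation
    result = dict(event)
    result.update(zip(names, values))  # non-strict: pair what can be paired
    return result

event = {"content": 'a,"b,c",d'}
full = split_fields(event, "content", "x, y, z")
loose = split_fields(event, "content", ["x", "y"])
strict = split_fields(event, "content", ["x", "y"], restrict=True)
print(full)    # {'content': 'a,"b,c",d', 'x': 'a', 'y': 'b,c', 'z': 'd'}
print(loose)   # {'content': 'a,"b,c",d', 'x': 'a', 'y': 'b,c'}
print(strict)  # {'content': 'a,"b,c",d'}
```

The quoted value "b,c" stays in one field even though it contains the delimiter, which is the case the quote parameter exists for.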

More
This function can be used together with other functions.

e_kv

This function extracts key-value pairs from one or more source fields, using a quote character to enclose values.

Function format

e_kv(source field or source field list, sep="=", quote='"', escape=false, prefix="", suffix="", mode="fill-auto")

Parameter description

Parameter	Type	Mandatory	Description
Source field or source field list	String or string list	Yes	Field name or a list of multiple field names.
sep	String	No	Regular expression that matches the delimiter between a key and its value. The default value is =. The delimiter is not limited to a single character. Note: Non-capturing groups can be used, but capturing groups cannot.
quote	String	No	Quote character used to enclose values. The default value is ". Note: Values of the extracted key-value pairs are normally enclosed in the quote character, for example, a="abc" and b="xyz". If a value is not enclosed in the quote character, only characters from the following set are extracted: Chinese characters, letters, digits, underscores (_), hyphens (-), periods (.), percent signs (%), and tildes (~). For example, from a=Chinese ab12_-.%~|abc b=123, the values a: Chinese ab12_-.%~ and b: 123 are extracted.
escape	Boolean	No	Whether to interpret backslash-escaped characters in quoted values. The default value is false, meaning escape sequences are not interpreted. For example, for key="abc\"xyz", the value abc\ is extracted by default. If escape is set to true, abc"xyz is extracted.
prefix	String	No	Prefix added to the extracted field name.
suffix	String	No	Suffix added to the extracted field name.
mode	String	No	Field overwrite mode. The default value is fill-auto.

Returned result
Logs with new field values.

Function example

Example 1: Use the default delimiter = to extract key-value pairs.
- Test data
```
{
 "http_refer": "https://video.developer.aadoc.com/s?q=asd&a=1&b=2"
}
```
  If the test data is request_uri: a1=1&a2=&a3=3, the value of a2 is empty. The e_kv() function cannot extract a2. You can use the e_regex() function to extract it, for example, e_regex("request_uri",r'(\w+)=([^=&]*)',{r"\1":r"\2"},mode="overwrite").
- Processing rule
```
e_kv("http_refer")
```
- Processing result
```
http_refer: https://video.developer.aadoc.com/s?q=asd&a=1&b=2
q: asd 
a: 1
b: 2
```
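
The reason an empty value such as a2= is not extracted can be seen in a simplified Python model of key-value matching. The pattern below is an assumption built from the parameter description (a quoted value, or an unquoted run of word characters, hyphens, periods, percent signs, and tildes); it is not the pattern the DSL actually uses, and the helper name extract_kv is hypothetical.

```python
import re

# Value is either "quoted" or a non-empty run of the allowed characters.
KV = re.compile(r'(\w+)=(?:"([^"]*)"|([\w\-.%~]+))')

def extract_kv(text):
    out = {}
    for m in KV.finditer(text):
        key, quoted, bare = m.groups()
        out[key] = quoted if quoted is not None else bare
    return out

print(extract_kv("https://video.developer.aadoc.com/s?q=asd&a=1&b=2"))
# {'q': 'asd', 'a': '1', 'b': '2'}

# An unquoted value must contain at least one character, so a2 is skipped.
print(extract_kv("a1=1&a2=&a3=3"))
# {'a1': '1', 'a3': '3'}

# The e_regex workaround from the note above allows an empty value.
print(dict(re.findall(r"(\w+)=([^=&]*)", "a1=1&a2=&a3=3")))
# {'a1': '1', 'a2': '', 'a3': '3'}
```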

Example 2: Add prefixes and suffixes to field names.

Test data

{
 "http_refer": "https://video.developer.aadoc.com/s?q=asd&a=1&b=2"
}

Processing rule

e_kv(
    "http_refer",
    sep="=",
    quote='"',
    escape=false,
    prefix="data_",
    suffix="_end",
    mode="fill-auto",
)

Processing result

http_refer: https://video.developer.aadoc.com/s?q=asd&a=1&b=2
data_q_end: asd 
data_a_end: 1
data_b_end: 2

Example 3: Extract key-value pairs from the content2 field and use the escape parameter to handle backslash-escaped characters in the values.
- Test data
```
{
 "content2": "k1:\"v1\\\"abc\", k2:\"v2\", k3: \"v3\""
}
```
- Processing rule
```
e_kv("content2", sep=":", escape=true)
```
- Processing result
```
content2:  k1:"v1\"abc", k2:"v2", k3: "v3"
k1: v1"abc 
k2: v2 
k3: v3
```
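
The effect of the escape parameter can be modeled in Python as the difference between two value patterns. This sketch is an approximation under stated assumptions: the helper name extract_quoted is hypothetical, sep is treated as a literal string rather than a regular expression, and optional whitespace after the separator is allowed because k3 is extracted in the result above.

```python
import re

def extract_quoted(text, sep=":", escape=False):
    """Extract key<sep>"value" pairs, optionally honoring backslash escapes."""
    # With escape, a backslash followed by any character stays inside the value.
    value = r'((?:\\.|[^"\\])*)' if escape else r'([^"]*)'
    pattern = re.compile(r"(\w+)" + re.escape(sep) + r'\s*"' + value + '"')
    out = {}
    for key, raw in pattern.findall(text):
        out[key] = re.sub(r"\\(.)", r"\1", raw) if escape else raw
    return out

text = 'k1:"v1\\"abc", k2:"v2", k3: "v3"'  # i.e. k1:"v1\"abc", k2:"v2", k3: "v3"

print(extract_quoted(text, escape=True))
# {'k1': 'v1"abc', 'k2': 'v2', 'k3': 'v3'}
print(extract_quoted(text, escape=False)["k1"])
# v1\
```

Without escape handling, the first quote character ends the value even when it is preceded by a backslash, which matches the abc\ behavior described for the escape parameter.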

More
This function can be used together with other functions.

e_kv_delimit

This function extracts key-value pairs from the source field using delimiters.

Function format

e_kv_delimit(Source field or source field list, pair_sep=r"\s", kv_sep="=", prefix="", suffix="", mode="fill-auto")

Parameter description

Parameter	Type	Mandatory	Description
Source field or source field list	String or string list	Yes	Field name or a list of multiple field names.
pair_sep	String	No	Regular expression character set used to separate one key-value pair from the next. The default value is \s. Other examples: \s\w and abc\s. Note: To separate pairs by a multi-character string, first use str_replace or regex_replace to convert that string into a single delimiter character, and then use the e_kv_delimit function to separate the fields.
kv_sep	String	No	Regular expression used to separate a key from its value. The default value is =. The delimiter is not limited to a single character. Non-capturing groups can be used, but capturing groups cannot.
prefix	String	No	Prefix added to the extracted field name.
suffix	String	No	Suffix added to the extracted field name.
mode	String	No	Field overwrite mode. The default value is fill-auto.

Returned result
Logs with new field values.

Function example

1. Example 1: Use the default delimiter = to extract key-value pairs.
  - Test data
```
{
 "data": "i=c1 k1=v1 k2=v2 k3=v3"
}
```
    If the test data is request_uri: a1=1&a2=&a3=3, the value of a2 is empty. The e_kv_delimit() function cannot extract a2. You can use the e_regex() function to extract the value, for example, e_regex("request_uri",r'(\w+)=([^=&]*)',{r"\1":r"\2"}, mode="overwrite").
  - Processing rule
```
e_kv_delimit("data")
```
  - Processing result
```
data: i=c1 k1=v1 k2=v2 k3=v3 
i: c1 
k2: v2 
k1: v1 
k3: v3
```
2. Example 2: Use delimiters &? to extract key-value pairs.
  - Test data
```
{
 "data": "k1=v1&k2=v2?k3=v3"
}
```
  - Processing rule
```
e_kv_delimit("data",pair_sep=r"&?")
```
  - Processing result
```
data: k1=v1&k2=v2?k3=v3
k2: v2 
k1: v1 
k3: v3
```
3. Example 3: Use regular expressions to extract key-value pairs.
  - Test data
```
{
 "data": "k1=v1 k2:v2 k3=v3"
}
```
  - Processing rule
```
e_kv_delimit("data", kv_sep=r"(?:=|:)")
```
  - Processing result
```
data: k1=v1 k2:v2 k3=v3 
k2: v2 
k1: v1 
k3: v3
```
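
The three examples above can be approximated in Python with two nested splits: one on the pair_sep character set, and one on the kv_sep expression. This is a sketch only. The helper name kv_delimit is hypothetical, the overwrite mode is ignored, and pairs with an empty key or value are assumed to be skipped, in line with the note about a2 above.

```python
import re

def kv_delimit(text, pair_sep=r"\s", kv_sep="="):
    """Split `text` into pairs on a character set, then split each pair once."""
    out = {}
    for pair in re.split(f"[{pair_sep}]+", text):
        parts = re.split(kv_sep, pair, maxsplit=1)
        if len(parts) == 2 and parts[0] and parts[1]:
            out[parts[0]] = parts[1]
    return out

print(kv_delimit("i=c1 k1=v1 k2=v2 k3=v3"))
# {'i': 'c1', 'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
print(kv_delimit("k1=v1&k2=v2?k3=v3", pair_sep=r"&?"))
# {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
print(kv_delimit("k1=v1 k2:v2 k3=v3", kv_sep=r"(?:=|:)"))
# {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
```

In this sketch, a capturing group in kv_sep would make re.split return the matched delimiter as an extra element, so the pair would no longer split into exactly two parts. Whether that is the reason for the restriction in the DSL is not stated in the source.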
