Extracting Dynamic Key-Value Pairs from Strings
This section describes how to use different solutions to extract key-value pairs from strings.
Comparison of Common Solutions
Extracting dynamic key-value pairs from a string involves four tasks: keyword extraction, value extraction, keyword processing, and value processing. The common solutions are the e_kv, e_kv_delimit, and e_regex functions. Each one suits different extraction scenarios.
| Solution | Keyword Extraction | Value Extraction | Keyword Processing | Value Processing |
|---|---|---|---|---|
| e_kv | Uses a specific regular expression. | Supports the default character set, specific delimiters, or delimiters with a comma (,) or double quotation mark ("). | Supports prefixes and suffixes. | Supports text escape. |
| e_kv_delimit | Uses a specific regular expression. | Uses delimiters. | Supports prefixes and suffixes. | Default: none |
| e_regex | Combines user-defined regular expressions and default character sets for filtering. | Fully customized | Custom | Custom |
Most key-value pairs can be extracted using the e_kv function with specific parameters, especially when brackets and backslashes need to be extracted and escaped. In other complex or advanced scenarios, the e_regex function can be used to extract key-value pairs. The e_kv_delimit function is easier to use in some specific scenarios.
Keyword Extraction
- Example 1
Take the log k1: q=asd&a=1&b=2&__1__=3 as an example. To extract keywords and values from logs in this format, use any of the following three solutions:
- e_kv function
  - Raw log
    { "k1":"q=asd&a=1&b=2&__1__=3" }
  - Processing rule
    e_kv("k1")
  - Processing result
    { "q": "asd", "a": 1, "b": 2, "k1": "q=asd&a=1&b=2&__1__=3", "__1__": 3 }
- e_kv_delimit function
  - Raw log
    { "k1":"q=asd&a=1&b=2&__1__=3" }
  - Processing rule
    # Use an ampersand (&) to separate key-value pairs and extract keywords.
    e_kv_delimit("k1", pair_sep=r"&")
  - Processing result
    { "q": "asd", "a": 1, "b": 2, "k1": "q=asd&a=1&b=2&__1__=3", "__1__": 3 }
- e_regex function
  - Raw log
    { "k1":"q=asd&a=1&b=2&__1__=3" }
  - Processing rule
    # Specify the character set to extract keywords and values.
    e_regex("k1",r"(\w+)=([a-zA-Z0-9]+)",{r"\1": r"\2"})
  - Processing result
    { "q": "asd", "a": 1, "b": 2, "k1": "q=asd&a=1&b=2&__1__=3", "__1__": 3 }
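To see what the regular expression in the e_regex rule matches, the pattern can be tried outside the processing pipeline. The following is a minimal sketch in plain Python using the standard re module. The helper name extract_pairs is hypothetical, and the sketch only imitates the matching step, not the e_regex function itself. Values stay strings here, whereas the processing result above shows numbers.

```python
import re

def extract_pairs(text):
    """Return {key: value} for every key=value match in text.

    Hypothetical helper: it reuses the pattern from the e_regex rule,
    but it is not the e_regex implementation.
    """
    pattern = re.compile(r"(\w+)=([a-zA-Z0-9]+)")
    # findall returns one (key, value) tuple per match.
    return {key: value for key, value in pattern.findall(text)}

fields = extract_pairs("q=asd&a=1&b=2&__1__=3")
print(fields)
```

Because \w also matches underscores, the key __1__ is captured like any other keyword.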
- Example 2
Take content:k1=v1&k2=v2?k3:v3 as an example. To extract keywords using a specific regular expression, do as follows:
- e_kv_delimit function
  - Raw log
    { "content":"k1=v1&k2=v2?k3:v3" }
  - Processing rule
    e_kv_delimit("content",pair_sep=r"&?",kv_sep="(?:=|:)")
  - Processing result
    { "k1": "v1", "k2": "v2", "k3": "v3", "content": "k1=v1&k2=v2?k3:v3" }
- e_regex function
  - Raw log
    { "content":"k1=v1&k2=v2?k3:v3" }
  - Processing rule
    e_regex("content",r"([a-zA-Z0-9]+)[=:]([a-zA-Z0-9]+)",{r"\1": r"\2"})
  - Processing result
    { "k1": "v1", "k2": "v2", "k3": "v3", "content": "k1=v1&k2=v2?k3:v3" }
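The delimiter-based approach in this example can be approximated in plain Python as a two-stage split. This is only a sketch under assumptions: the helper split_pairs is hypothetical, it treats pair_sep as a set of single separator characters and kv_sep as a regular expression, and it does not claim to reproduce every behavior of e_kv_delimit.

```python
import re

def split_pairs(text, pair_sep="&?", kv_sep=r"(?:=|:)"):
    """Split text into pairs, then split each pair into key and value.

    Assumption of this sketch: pair_sep is a set of single characters,
    and kv_sep is a regular expression.
    """
    # Build a character class such as [&?] from the separator characters.
    pairs = re.split("[" + re.escape(pair_sep) + "]", text)
    result = {}
    for pair in pairs:
        # Split on the first key-value separator only.
        parts = re.split(kv_sep, pair, maxsplit=1)
        if len(parts) == 2 and parts[0]:
            result[parts[0]] = parts[1]
    return result

print(split_pairs("k1=v1&k2=v2?k3:v3"))
```

Splitting on the first key-value separator only keeps any later = or : inside the value.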
Value Extraction
- If there are clear delimiters between dynamic key-value pairs and between keywords and values, for example, a=b or a="cxxx", you are advised to use the e_kv function. Example:
  - Raw log
    { "content":"k=\"helloworld\",the change world, k2=\"good\"" }
  - Processing rule
    In this case, the e_kv function extracts the key-value pairs and skips the text "the change world". The e_regex statement is an alternative that produces the same result.
    e_kv("content")
    # e_regex function
    e_regex("content",r"(\w+)=\"(\w+)",{r"\1": r"\2"})
  - Processing result
    The extracted log is as follows:
    { "k2": "good", "k": "helloworld", "content": "k=\"helloworld\",the change world, k2=\"good\"" }
- You are advised to use the e_kv function to extract values enclosed in double quotation marks ("), for example, from logs in the format content:k1="v1=1"&k2=v2?k3=v3.
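The difficulty with quoted values is that a value such as "v1=1" contains the key-value separator itself. The following plain-Python sketch illustrates one way to handle this: try a quoted value first and fall back to a bare value. The pattern and the helper extract_values are assumptions made for illustration. They are not the regular expression that e_kv uses internally.

```python
import re

# Quoted alternative first, so that k1="v1=1" yields v1=1 as one value.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|([a-zA-Z0-9]+))')

def extract_values(text):
    """Return {key: value}, accepting quoted or bare values (hypothetical helper)."""
    result = {}
    for match in PAIR.finditer(text):
        key, quoted, bare = match.groups()
        result[key] = quoted if quoted is not None else bare
    return result

print(extract_values('k1="v1=1"&k2=v2?k3=v3'))
print(extract_values('k="helloworld",the change world, k2="good"'))
```

Free text between pairs, such as "the change world", is skipped because it contains no key followed by an equal sign.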
Keyword Processing
- Both the e_kv and e_kv_delimit functions can add a prefix and a suffix to keywords using the prefix="" and suffix="" parameters.
  - Raw log
    { "content":"q=asd&a=1&b=2" }
  - Processing rule (The statements are alternatives with the same effect. Execute each one separately.)
    e_kv("content", sep="=", quote='"', prefix="start_", suffix="_end")
    e_kv_delimit("content", pair_sep=r"&", kv_sep="=", prefix="start_", suffix="_end")
    e_regex("content",r"(\w+)=([a-zA-Z0-9]+)",{r"start_\1_end": r"\2"})
  - Processing result
    Each extracted keyword carries the prefix and suffix:
    { "start_b_end": 2, "start_a_end": 1, "start_q_end": "asd", "content": "q=asd&a=1&b=2" }
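- The e_regex function processes keywords more flexibly, because the output keyword is a template over the captured groups, as in {r"start_\1_end": r"\2"} in the preceding processing rule.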
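The keyword template can be explored in plain Python, where Match.expand fills a template such as start_\1_end from the captured groups. This is a sketch of the idea only. The helper extract_with_key_template is hypothetical and does not reproduce how e_regex builds its output fields.

```python
import re

def extract_with_key_template(text, key_template=r"start_\1_end"):
    """Build each output key from a template over the captured groups.

    Hypothetical helper: group 1 is the keyword and group 2 is the value.
    """
    pattern = re.compile(r"(\w+)=([a-zA-Z0-9]+)")
    return {m.expand(key_template): m.group(2) for m in pattern.finditer(text)}

print(extract_with_key_template("q=asd&a=1&b=2"))
# A different template changes only the keys, not the values.
print(extract_with_key_template("q=asd&a=1&b=2", key_template=r"kv_\1"))
```

A fixed prefix and suffix is one special case of such a template, which is why the three statements in the processing rule give the same keywords.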
Value Processing
- Raw log
  """ The following backslashes (\) are ordinary characters, not escape characters. """
  { "content":"k1:\"v1\\\"abc\", k2:\"v2\", k3: \"v3\"" }
- Processing rule
  e_kv("content",sep=":", quote='"')
- Processing result
  The extracted log is as follows:
  { "k1": "v1\\", "k2": "v2", "k3": "v3", "content": "k1:\"v1\\\"abc\", k2:\"v2\", k3: \"v3\"" }
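The value of k1 ends at the backslash because the quotation mark that follows it is read as the closing quote. The plain-Python sketch below reproduces that behavior and contrasts it with an escape-aware variant. Both helpers are hypothetical, and the escape-aware pattern is an assumption about what text escape could mean. How e_kv exposes escape handling is not shown in this section.

```python
import re

# The field value as it appears in the log: k1:"v1\"abc", k2:"v2", k3: "v3"
RAW = r'k1:"v1\"abc", k2:"v2", k3: "v3"'

def extract_plain(text):
    """Backslashes are ordinary characters: a value ends at the first quote."""
    return dict(re.findall(r'(\w+):\s*"([^"]*)"', text))

def extract_escaped(text):
    """Assumed escape-aware variant: a backslash protects the next character."""
    pairs = re.findall(r'(\w+):\s*"((?:\\.|[^"\\])*)"', text)
    # Drop each escaping backslash and keep the character it protected.
    return {key: re.sub(r"\\(.)", r"\1", value) for key, value in pairs}

print(extract_plain(RAW))
print(extract_escaped(RAW))
```

The first call matches the processing result above for k1, k2, and k3. The second keeps the embedded quotation mark as part of the value of k1.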