Anonymizing Log Data Using DSL Processing Functions
Data anonymization effectively reduces exposure and leakage risks during processing, transmission, and use, protecting user rights and interests. This section describes common anonymization scenarios, methods, and examples applied during data processing in LTS.
Introduction
Sensitive data includes mobile numbers, bank card numbers, email addresses, IP addresses, access key IDs (AKs), ID card numbers, websites, order numbers, and other key strings. In LTS data processing, common anonymization methods are regular expression replacement (key function: regex_replace), Base64 encoding (key function: base64_encoding), MD5 encoding (key function: md5_encoding), and character mapping (key function: str_translate). For details, see Regular Expression Functions and Encoding and Decoding Functions.
Scenario 1: Anonymizing Mobile Numbers
For logs containing mobile numbers that should not be exposed, you can use regular expressions and the regex_replace function to anonymize them. Example:
- Raw log
{ "iphone":"13900001234" }
- Processing rule
e_set(
    "sec_iphone",
    regex_replace(v("iphone"), r"(\d{0,3})\d{4}(\d{4})", replace=r"\1****\2"),
)
- Processing result
{
    "sec_iphone": "139****1234",
    "iphone": 13900001234
}
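To try the pattern before deploying the rule, the same substitution can be reproduced locally. The following is a minimal Python sketch, assuming regex_replace behaves like Python's re.sub for this pattern; mask_mobile is a hypothetical helper name and is not part of the LTS DSL.

```python
import re

# Same pattern and replacement as the processing rule above.
MOBILE_PATTERN = re.compile(r"(\d{0,3})\d{4}(\d{4})")


def mask_mobile(number: str) -> str:
    """Keep up to three leading digits and the last four, mask the middle four."""
    return MOBILE_PATTERN.sub(r"\1****\2", number)


print(mask_mobile("13900001234"))  # 139****1234
```

Because the leading group is \d{0,3}, an 8-digit number is also matched, with nothing kept in front of the mask.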
Scenario 2: Anonymizing Bank Card Information
Use regular expressions and the regex_replace function to anonymize bank card or credit card information in logs.
- Raw log
{ "content":"bank number is 491648411333978312 and credit card number is 4916484113339780" }
- Processing rule
e_set(
    "bank_number",
    regex_replace(
        v("content"), r"([1-9]{1})(\d{14}|\d{13}|\d{11})(\d{4})", replace=r"****\3"
    ),
)
- Processing result
{
    "bank_number": "bank number is ****8312 and credit card number is ****9780",
    "content": "bank number is 491648411333978312 and credit card number is 4916484113339780"
}
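The pattern accepts 16-, 18-, and 19-digit numbers (one leading non-zero digit, then 11, 13, or 14 digits, then the last four) and keeps only the last four. A minimal local sketch in Python, assuming regex_replace follows Python re.sub semantics; mask_cards is a hypothetical name used only for illustration.

```python
import re

# Same pattern as the processing rule: group 3 holds the last four digits.
CARD_PATTERN = re.compile(r"([1-9]{1})(\d{14}|\d{13}|\d{11})(\d{4})")


def mask_cards(text: str) -> str:
    """Replace every matched card number with **** plus its last four digits."""
    return CARD_PATTERN.sub(r"****\3", text)


sample = (
    "bank number is 491648411333978312 "
    "and credit card number is 4916484113339780"
)
print(mask_cards(sample))
# bank number is ****8312 and credit card number is ****9780
```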
Scenario 3: Anonymizing Email Addresses
Use regular expressions and the regex_replace function to anonymize email addresses contained in logs.
- Raw log
{ "content":"email is username@example.com" }
- Processing rule
e_set(
    "email_encrypt",
    regex_replace(
        v("content"),
        r"[A-Za-z\d]+([-_.][A-Za-z\d]+)*(@([A-Za-z\d]+[-.])+[A-Za-z\d]{2,4})",
        replace=r"****\2",
    ),
)
- Processing result
{
    "content": "email is username@example.com",
    "email_encrypt": "email is ****@example.com"
}
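In this rule, capture group 2 holds the @ sign and the domain, so the replacement hides only the local part. The following Python sketch reproduces that behavior locally, assuming the DSL regex engine is compatible with Python's re module; mask_email is a hypothetical helper.

```python
import re

# Same pattern as the processing rule; group 2 is "@" plus the domain.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z\d]+([-_.][A-Za-z\d]+)*(@([A-Za-z\d]+[-.])+[A-Za-z\d]{2,4})"
)


def mask_email(text: str) -> str:
    """Hide the local part of each email address and keep the domain."""
    return EMAIL_PATTERN.sub(r"****\2", text)


print(mask_email("email is username@example.com"))
# email is ****@example.com
```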
Scenario 4: Anonymizing AKs
Use regular expressions and the regex_replace function to anonymize AKs in logs.
- Raw log
{ "content":"ak id is <testAccessKey ID> and ak key is <testAccessKey Secret>" }
- Processing rule
e_set(
    "akid_encrypt",
    regex_replace(
        v("content"),
        r"([a-zA-Z0-9]{4})(([a-zA-Z0-9]{26})|([a-zA-Z0-9]{12}))",
        replace=r"\1****",
    ),
)
- Processing result
{
    "akid_encrypt": "ak id is jdhc**** and ak key is Jkde****",
    "content": "ak id is <testAccessKey ID> and ak key is <testAccessKey Secret>"
}
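The pattern keeps the first four characters and masks the following 26 or 12 alphanumeric characters, so it targets 30- and 16-character keys. The sketch below checks this locally in Python, assuming re.sub-compatible semantics. The key values are made-up placeholders built to those lengths, not real credentials, and mask_ak is a hypothetical helper.

```python
import re

# Same pattern as the processing rule: 4 kept characters + 26 or 12 masked ones.
AK_PATTERN = re.compile(r"([a-zA-Z0-9]{4})(([a-zA-Z0-9]{26})|([a-zA-Z0-9]{12}))")


def mask_ak(text: str) -> str:
    """Keep the first four characters of each key and mask the rest."""
    return AK_PATTERN.sub(r"\1****", text)


# Placeholder keys (assumed lengths: 16 for the ID, 30 for the secret).
fake_id = "jdhc" + "0" * 12
fake_secret = "Jkde" + "x" * 26

print(mask_ak(f"ak id is {fake_id} and ak key is {fake_secret}"))
# ak id is jdhc**** and ak key is Jkde****
```

Any other run of 16 or more alphanumeric characters in the field would also be masked, so it may be worth checking the pattern against representative log lines first.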
Scenario 5: Anonymizing IP Addresses
Use the regex_replace function and regular expressions to capture and anonymize IP addresses.
- Raw log
{ "content":"ip is 192.0.2.10" }
- Processing rule
e_set(
    "ip_encrypt",
    regex_replace(
        v("content"),
        r"((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])",
        replace=r"****",
    ),
)
- Processing result
{
    "content": "ip is 192.0.2.10",
    "ip_encrypt": "ip is ****"
}
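Each octet alternative in the pattern covers the range 0 to 255, and the whole IPv4 address is replaced with a fixed mask. A minimal Python sketch for checking the pattern locally, assuming the DSL regex engine matches Python re semantics; mask_ip is a hypothetical name.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99.
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
# Same pattern as the processing rule: three "octet." groups, then a final octet.
IP_PATTERN = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)


def mask_ip(text: str) -> str:
    """Replace every IPv4 address in the text with ****."""
    return IP_PATTERN.sub("****", text)


print(mask_ip("ip is 192.0.2.10"))  # ip is ****
print(mask_ip("src 10.0.0.1 dst 172.16.254.3"))  # src **** dst ****
```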
Scenario 6: Anonymizing ID Card Information
Use the regex_replace function and regular expressions to capture and anonymize ID card numbers in logs.
- Raw log
{ "content": "Id card is 111222190002309999" }
- Processing rule
e_set(
    "id_encrypt",
    regex_replace(v("content"), r"\b\d{17}(\d|X)\b", replace=r"\1****"),
)
- Processing result
{
    "id_encrypt": "Id card is 9****",
    "content": "Id card is 111222190002309999"
}
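Note that the only capture group in this pattern is the final character (a digit or X), so the replacement keeps that character and drops the first 17 digits. The Python sketch below reproduces the result locally, assuming re.sub-compatible behavior; mask_id_card is a hypothetical helper.

```python
import re

# Same pattern as the processing rule: 17 digits plus a final digit or X.
ID_PATTERN = re.compile(r"\b\d{17}(\d|X)\b")


def mask_id_card(text: str) -> str:
    """Replace each 18-character ID number with its last character plus ****."""
    return ID_PATTERN.sub(r"\1****", text)


print(mask_id_card("Id card is 111222190002309999"))  # Id card is 9****
print(mask_id_card("Id card is 11122219000230999X"))  # Id card is X****
```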
Scenario 7: Anonymizing Websites
Use Base64 encoding and decoding functions to anonymize websites in logs and convert the anonymized data back to plaintext.
- Raw log
{ "content":"https://www.huaweicloud.com/" }
- Processing rule
e_set("base64_url", base64_encoding(v("content")))
- Processing result
{
    "base64_url": "aHR0cHM6Ly93d3cuaHVhd2VpY2xvdWQuY29tLw==",
    "content": "https://www.huaweicloud.com/"
}
To decode base64_url, use the following DSL syntax rule:
base64_decoding(v("base64_url"))
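Base64 is a reversible encoding, which is why the original value can be recovered. The round trip can be checked locally with Python's standard base64 module; this is a sketch that assumes base64_encoding and base64_decoding operate on the UTF-8 bytes of the field value, and encode_url and decode_url are hypothetical helper names.

```python
import base64


def encode_url(url: str) -> str:
    """Base64-encode the UTF-8 bytes of the URL."""
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def decode_url(encoded: str) -> str:
    """Reverse encode_url and return the original text."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_url("https://www.huaweicloud.com/")
print(encoded)  # aHR0cHM6Ly93d3cuaHVhd2VpY2xvdWQuY29tLw==
print(decode_url(encoded))  # https://www.huaweicloud.com/
```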
Scenario 8: Anonymizing Order Numbers
Use the MD5 encoding function to anonymize order numbers in logs. MD5 is a one-way hash, so the anonymized value cannot be decoded back to the original order number.
- Raw log
{ "orderId": "20210101123456" }
- Processing rule
e_set("md5_orderId", md5_encoding(v("orderId")))
- Processing result
{
    "orderId": 20210101123456,
    "md5_orderId": "9c0ab8e4d9f4eb6fbd5c508bbca05951"
}
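An MD5 digest always has the same length and is identical for identical inputs, so anonymized order numbers can still be grouped and compared. The following Python sketch shows the equivalent computation with hashlib, assuming md5_encoding returns the lowercase hexadecimal digest of the UTF-8 encoded value; md5_hex is a hypothetical helper.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of the value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


digest = md5_hex("20210101123456")
print(digest)
print(len(digest))  # 32
print(digest == md5_hex("20210101123456"))  # True: same input, same digest
```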
Scenario 9: Anonymizing Strings
To prevent key strings in logs from being exposed, you can use the str_translate function to define mapping rules and anonymize key characters or strings.
- Raw log
{ "content": "message level is info_" }
- Processing rule
e_set("data_translate", str_translate(v("content"), "aeiou", "12345"))
- Processing result
{
    "data_translate": "m2ss1g2 l2v2l 3s 3nf4_",
    "content": "message level is info_"
}
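The mapping is applied character by character: each character in the second argument is replaced with the character at the same position in the third. A minimal Python sketch of the same idea using str.maketrans and str.translate, offered as a local approximation rather than the DSL implementation; translate_chars is a hypothetical name.

```python
def translate_chars(text: str, source: str, target: str) -> str:
    """Replace each character of source with the character at the same index in target."""
    table = str.maketrans(source, target)  # source and target must have equal length
    return text.translate(table)


print(translate_chars("message level is info_", "aeiou", "12345"))
# m2ss1g2 l2v2l 3s 3nf4_
```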