Updated on 2024-10-23 GMT+08:00

Managing Masking Algorithms

A masking algorithm is required to create a masking policy. The system provides more than 20 built-in masking algorithms, some of which require you to configure parameters before use. If the built-in algorithms cannot meet your needs, you can create custom algorithms.

This section describes built-in masking algorithms and how to create masking algorithms.

Notes and Constraints

  • When creating a random or character replacement masking algorithm, if you select Sample library for Random Mode or Replacement Mode, the sample file for testing the algorithm cannot be larger than 10 KB. This restriction applies only to the algorithm test and does not apply to real static masking tasks.
  • When you create a hash-type masking algorithm, two SM3 options are available. The dws-SM3 cryptographic hash algorithm is dedicated to the DWS engine, returns a hexadecimal string of lowercase letters, and requires DWS cluster version 8.1.3 or later. The general-SM3 cryptographic hash algorithm is a general-purpose algorithm for the DLI or MRS engine and returns a hexadecimal string of uppercase letters.
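
The letter-case difference between the two SM3 variants matters if hashed values produced by different engines are later compared as strings. The following is a minimal sketch of a case-insensitive comparison. It uses SHA-256 from Python's standard library as a stand-in, because SM3 is not guaranteed to be available in hashlib, and it assumes that the two variants differ only in letter case, which this section does not confirm. The helper names are hypothetical.

```python
import hashlib


def hex_digest(value: str, uppercase: bool = False) -> str:
    """Return a hex digest of value in the requested letter case.

    SHA-256 is a stand-in for SM3 here (assumption for illustration only).
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()  # lowercase hex
    return digest.upper() if uppercase else digest


def same_digest(a: str, b: str) -> bool:
    """Compare two hex digests while ignoring letter case."""
    return a.lower() == b.lower()


lower_style = hex_digest("13800000000")                   # lowercase, like dws-SM3 output
upper_style = hex_digest("13800000000", uppercase=True)   # uppercase, like general-SM3 output
print(same_digest(lower_style, upper_style))
```

Normalizing both sides to one case before comparing or joining avoids mismatches caused only by the output format.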

Built-in Masking Algorithms

DataArts Security provides the following built-in masking algorithms. Before selecting an algorithm, you can use the algorithm configuration and testing functions to check whether the algorithm suits your needs.

Table 1 Built-in algorithms

Type

Name

Description

Configurable

Hash

HMAC-SHA256 hash

Use the HMAC-SHA256 algorithm for hash processing.

A salt value and a key can be configured.

NOTE:
  • Before using the algorithm, you must configure a key.

  • The salt value is the one you set, not a secure random number generated by the system. Be aware of the associated security risks.

SHA-256

Use the SHA-256 algorithm for hash processing.

A salt value can be configured.

NOTE:

The salt value is the one you set, not a secure random number generated by the system. Be aware of the associated security risks.

Cut

Value truncation

Rounds a number down at the x-th digit before the decimal point: that digit and all higher digits are retained, and the x-1 digits immediately before the decimal point and all digits after the decimal point are replaced with 0.

For example, if x is 3, 1234 is truncated to 1200, 999.999 is truncated to 900, and 10.7 is truncated to 0.

The number of digits before the decimal point can be configured.

Date truncation

Truncate a specified date.

The date format and masking range can be configured.

Mask

Masking of specified GaussDB(DWS) columns

Masks specified columns in GaussDB(DWS).

This algorithm can be used only when both the source and destination of a static masking task are GaussDB(DWS) and the execution engine is GaussDB(DWS).

Not supported

Masking with specified characters for GaussDB(DWS)

Replaces the characters from the start to end position with specified characters.

This algorithm can be used only when both the source and destination of a static masking task are GaussDB(DWS) and the execution engine is GaussDB(DWS).

The start position, end position, and mask flag can be configured.

Masking with specified digits for GaussDB(DWS)

Replaces the characters from the start to end position with specified digits.

This algorithm can be used only when both the source and destination of a static masking task are GaussDB(DWS) and the execution engine is GaussDB(DWS).

The start position, end position, and mask flag can be configured.

ID masking

Masks an ID card No.

Not supported

Bank card No. masking

Masks a bank card No.

Not supported

Email masking

Masks email information.

Not supported

Mobile equipment identity masking

Masks a device identifier, such as an IMEI, MEID, or ESN.

The type can be configured.

IPv6 masking

Masks an IPv6 address.

Not supported

IPv4 masking

Masks an IPv4 address.

Not supported

MAC address masking

Masks a MAC address.

Not supported

Phone No. masking

Masks a phone number.

Not supported

Date type masking

Masks a date in a specified format, such as ISO, EUR, or USA.

The date format and masking range can be configured.

Masking X to Y

Masks the characters from X to Y of a string.

X and Y can be configured.

Retaining X to Y

Retains the characters from X to Y of a string.

X and Y can be configured.

Masking first n and last m characters

Masks the first n and last m characters of a string.

n and m can be configured.

Retaining first n and last m characters

Retains the first n and last m characters of a string.

n and m can be configured.

Encryption

GaussDB(DWS) column encryption

Invokes the symmetric encryption function gs_encrypt_aes128(encryptstr,keystr) provided by GaussDB(DWS) to encrypt GaussDB(DWS) data columns. The function uses keystr as the key to encrypt the encryptstr character string and returns the encrypted character string.

Note the following:

  • This algorithm takes effect only when the destination of the masking task is GaussDB(DWS).
  • When the encrypted data is later decrypted using SQL, a result is returned only if every value is decrypted successfully. Otherwise, the decryption fails.

The key can be configured. The key length ranges from 1 byte to 16 bytes.

NOTE:

Before using the algorithm, you must configure a key.

Hive column encryption

Invokes the Hive column encryption function provided by MRS to encrypt and decrypt Hive data columns. Cryptographic algorithms AES and SMS4 are supported.

Note the following:

  • This algorithm takes effect only when the destination of the masking task is Hive.
  • Column encryption is supported only for tables stored on HDFS in the TextFile or SequenceFile format.
  • Hive column encryption does not support views or the Hive over HBase scenario.

The encryption type can be configured.
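
To make the behavior of some built-in algorithms in Table 1 concrete, the following is a minimal Python sketch of value truncation, Masking X to Y, and Retaining first n and last m characters. It is an illustration only, not the DataArts Security implementation. The function names, the 1-based inclusive positions, the "*" mask character, and the restriction to non-negative numbers are assumptions that are not stated in this section.

```python
from decimal import Decimal


def truncate_value(value: str, x: int) -> int:
    """Zero the x-1 digits before the decimal point and drop the fraction."""
    if x < 1:
        raise ValueError("x must be at least 1")
    number = Decimal(value)
    if number < 0:
        raise ValueError("this sketch handles non-negative values only")
    step = 10 ** (x - 1)
    return int(number) // step * step


def mask_range(text: str, x: int, y: int, mask_char: str = "*") -> str:
    """Mask characters X to Y (assumed 1-based and inclusive)."""
    start, end = max(x, 1) - 1, min(y, len(text))
    if start >= end:
        return text
    return text[:start] + mask_char * (end - start) + text[end:]


def retain_ends(text: str, n: int, m: int, mask_char: str = "*") -> str:
    """Retain the first n and last m characters and mask the rest."""
    if n + m >= len(text):
        return text
    masked = mask_char * (len(text) - n - m)
    return text[:n] + masked + text[len(text) - m:]


print(truncate_value("1234", 3))          # 1200
print(truncate_value("999.999", 3))       # 900
print(truncate_value("10.7", 3))          # 0
print(mask_range("13812345678", 4, 7))    # 138****5678
print(retain_ends("13812345678", 3, 4))   # 138****5678
```

The three truncation results match the examples given for value truncation in Table 1. Use the Test function on the Masking Algorithms page to confirm how the actual algorithms treat positions and mask characters.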

Creating a Masking Algorithm

If the built-in algorithms do not meet your needs, you can create custom masking algorithms, such as mask, truncation, hash, encryption, nulling, random masking, character replacement, key-value masking, value range conversion, and fuzzy masking.

  1. On the DataArts Studio console, locate a workspace and click DataArts Security.
  2. In the left navigation pane, choose Masking Algorithms.
  3. Click Create.

    Figure 1 Creating a masking algorithm

  4. Set the parameters listed in Table 2 and click OK.

    Figure 2 Configuring algorithm parameters

    The following table lists the masking algorithm parameters.
    Table 2 Parameters for the masking algorithm

    Parameter

    Description

    *Algorithm

    Algorithm name, which can contain a maximum of 64 characters

    Description

    Brief description of the algorithm. It can contain a maximum of 255 characters.

    *Masking Algorithm

    The following options are available:

    • Mask: This algorithm supports characters, numeric values, and date values. It replaces data at specified positions with fixed values.
    • Truncate: This algorithm supports date and numeric values. It truncates a date to the month, day, hour, minute, or second, and rounds a numeric value.
    • Hash: This algorithm supports all types of data. The selected algorithm is used to calculate the hash value.

      In addition to the built-in hash algorithms, two more are available: the dws-SM3 and general-SM3 cryptographic hash algorithms. The dws-SM3 cryptographic hash algorithm is dedicated to the DWS engine. The result is a hexadecimal string of lowercase letters. The DWS cluster version must be 8.1.3 or later. The general-SM3 cryptographic hash algorithm is a general-purpose algorithm for the DLI or MRS engine. The result is a hexadecimal string of uppercase letters.

    • ENCRYPT: This algorithm supports all types of data. The selected encryption algorithm is used to encrypt data from a specified source.
    • SET_NULL: This algorithm supports all types of data. It sets the value to null.
    • RANDOM: Replaces date or numeric values with values within a specified range or in a sample library. For how to create a sample library, see Managing Sample Libraries. If you select Sample library for Random Mode, the OBS sample file can only be used for static DLI data masking tasks and the HDFS sample file can only be used for static MRS data masking tasks. For details about the mapping between static masking scenarios and engines, see Reference: Static Data Masking Scenarios.

      If you enable Keep Association with Source Data, the same source value produces the same masked result in different databases when the same rule is used. Enabling this parameter makes the masked data easier to crack. If you need to enable it, you are advised to configure a random salt value to defend against dictionary attacks.

    • CHARACTER_REPLACEMENT: This algorithm replaces numeric values and characters at specified positions with fixed values or values in sample files in the sample library. Random digits or lowercase letters can be used to replace the characters at custom positions. If you select Replace the last digit of an ID card number, Bits can only be 1, and there must be 17 or more bits to be masked before the selected bit.

      For how to create a sample library, see Managing Sample Libraries. If you select Sample library for Replacement Mode, the OBS sample file can only be used for static DLI data masking tasks and the HDFS sample file can only be used for static MRS data masking tasks. For details about the mapping between static masking scenarios and engines, see Reference: Static Data Masking Scenarios.

      If you enable Keep Association with Source Data, the same source value produces the same masked result in different databases when the same rule is used. Enabling this parameter makes the masked data easier to crack. If you need to enable it, you are advised to configure a random salt value to defend against dictionary attacks.

    • KEY_VALUE: This algorithm replaces numeric keys and values with values that are calculated using custom expressions. The source data supports the following operations: addition (+), subtraction (-), multiplication (*), division (/), parentheses (()), and modulo (%). For example, expression ((X*4+3)%100)/2-1 can replace 3 with 6.5.
    • INTERVAL_TRANSFORMATION: This algorithm converts digits in a specified range into specified values.
    • FUZZY: This algorithm replaces a numeric value with a random value within a fuzzy percentage or absolute value range. For example, in percentage blurring mode, if the percentage ranges from –10% to 20%, value 10 will be replaced with a random value from 9 to 12.

      If you enable Keep Association with Source Data, the same source value produces the same masked result in different databases when the same rule is used. Enabling this parameter makes the masked data easier to crack. If you need to enable it, you are advised to configure a random salt value to defend against dictionary attacks.

    Test

    Enter the data to be tested and click Test. You can view the masking result in the Test Result area.

    NOTE:

    When creating a random or character replacement masking algorithm, if you select Sample library for Random Mode or Replacement Mode, the sample file for testing the algorithm cannot be larger than 10 KB.

    Test Result
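
The KEY_VALUE and FUZZY examples in Table 2 can be reproduced with a short Python sketch. It evaluates a custom expression in X using only the listed operators, and draws a fuzzy value within a percentage range. It is an illustration under assumptions, not the product's implementation: the function names are hypothetical, the uniform distribution is assumed, and deriving the random seed from an HMAC of the source value and a salt is only one possible way to obtain the repeatable results described for Keep Association with Source Data.

```python
import ast
import hashlib
import hmac
import operator
import random

# Operators listed for KEY_VALUE expressions: + - * / %
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}


def eval_expression(expr: str, x: float) -> float:
    """Evaluate an arithmetic expression in X without using eval()."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name) and node.id == "X":
            return x
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        raise ValueError("unsupported expression element")
    return walk(ast.parse(expr, mode="eval").body)


def fuzz_percent(value: float, low_pct: float, high_pct: float,
                 rng: random.Random) -> float:
    """Return a random value between value*(1+low%) and value*(1+high%)."""
    low = value * (1 + low_pct / 100)
    high = value * (1 + high_pct / 100)
    return rng.uniform(min(low, high), max(low, high))


def associated_rng(value: str, salt: bytes) -> random.Random:
    """Seed a generator from HMAC(salt, value) so equal inputs repeat (assumed design)."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).digest()
    return random.Random(int.from_bytes(digest, "big"))


print(eval_expression("((X*4+3)%100)/2-1", 3))   # 6.5
salt = b"example-salt"  # placeholder; use a random secret in practice
print(fuzz_percent(10, -10, 20, associated_rng("10", salt)))  # a value from 9 to 12
```

With a salt that is random and kept secret, an attacker cannot precompute the masked result for every candidate source value, which is the dictionary attack that the salt recommendation guards against.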

Related Operations

  • Editing an algorithm: On the Masking Algorithms page, locate an algorithm and click Edit in the Operation column.

    The parameters that can be edited vary depending on the algorithm type.

  • Testing an algorithm: On the Masking Algorithms page, locate an algorithm and click Test in the Operation column.

    Before using an algorithm, you are advised to test it to ensure that it meets your needs.

    Whether the test function is available varies depending on the algorithm type.

  • Deleting algorithms: On the Masking Algorithms page, locate an algorithm and click Delete in the Operation column. To delete multiple algorithms, select them and click Delete above the list.
    Built-in algorithms cannot be deleted. Custom algorithms that are used by masking policies or specified column masking cannot be deleted. To delete such algorithms, cancel the reference first.

    The deletion operation cannot be undone. Exercise caution when performing this operation.