Updated on 2024-04-03 GMT+08:00

Managing Masking Algorithms

This section describes the built-in masking algorithms and how to create custom ones.

A masking policy requires a masking algorithm. The system provides more than 20 built-in masking algorithms. To use these algorithms, configure their parameters where applicable. If the built-in algorithms do not meet your needs, you can create your own.

Built-in Masking Algorithms

The following table lists the types of masking algorithms.

Table 1 Algorithm types

Type

Description

Scenario

Example

Original

Masked

Hash

Converts data by using a hash function, optionally with a salt value or a key.

Used to anonymize structured and unstructured data.

HMAC-SHA256 hash

460031234567890

A34329AE133C48C

Cut

Discards the last few digits of an attribute value to make the data less precise.

Used to anonymize structured and unstructured data.

For example, it can be used to anonymize identifiers and quasi-identifiers.

Cut the last four digits.

18012345678

1801234

Mask

Replaces some characters in an attribute value with a special character, such as an asterisk (*).

Used to anonymize structured and unstructured data, such as identifiers and quasi-identifiers.

Mask the last four digits.

18012345678

1801234****

Encryption

Invoke the built-in encryption algorithms of GaussDB(DWS) and Hive to encrypt data.

There are strict restrictions on the data to be encrypted.

AES

98

2bd806c97f0e00af1a1fc3328fa763a9269723c8db8fac4f93af71db186d6e
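As a rough illustration of the Hash, Cut, and Mask types in Table 1, the following Python sketch reproduces the Cut and Mask examples from the table. It is not the DataArts Security implementation: the function names and the key `b"example-key"` are hypothetical, and the hash output will not match the value shown in the table, because the key used there is not given in this document.

```python
import hashlib
import hmac


def hash_value(value: str, key: bytes) -> str:
    """Hash type: HMAC-SHA256 of the value, returned as uppercase hex."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest().upper()


def cut_last(value: str, count: int) -> str:
    """Cut type: discard the last `count` characters."""
    return value[:-count] if count > 0 else value


def mask_last(value: str, count: int, mask_char: str = "*") -> str:
    """Mask type: replace the last `count` characters with `mask_char`."""
    count = min(max(count, 0), len(value))
    return value[: len(value) - count] + mask_char * count


print(cut_last("18012345678", 4))    # Cut example from Table 1
print(mask_last("18012345678", 4))   # Mask example from Table 1
print(hash_value("460031234567890", b"example-key"))  # hypothetical key
```

Cut shortens the value, whereas Mask keeps its length, which matters when downstream systems validate field lengths.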

DataArts Security provides the following built-in masking algorithms. Before selecting an algorithm, you can use the algorithm configuration and testing functions to check whether the algorithm suits your needs.

Table 2 Built-in algorithms

Type

Name

Description

Configurable

Hash

HMAC-SHA256 hash

Use the HMAC-SHA256 algorithm for hash processing.

A salt value and a key can be configured.

NOTE:
  • Before using the algorithm, you must configure a key.

  • You must set the salt value yourself. The system does not supply a secure random number for it, so be aware of the associated risks.

SHA-256

Use the SHA-256 algorithm for hash processing.

A salt value can be configured.

NOTE:

You must set the salt value yourself. The system does not supply a secure random number for it, so be aware of the associated risks.

Cut

Value truncation

Counting leftward from the decimal point, keeps the x-th digit and all digits to its left, and replaces the x-1 digits nearest the decimal point and all digits after the decimal point with 0.

For example, if x is 3, 1234 is truncated to 1200, 999.999 is truncated to 900, and 10.7 is truncated to 0.

The number of digits before the decimal point can be configured.

Date truncation

Truncate a specified date.

The date format and masking range can be configured.

Mask

Masking of specified GaussDB(DWS) columns

Masks specified columns in GaussDB(DWS).

This algorithm can be used only when both the source and destination of a static masking task are GaussDB(DWS) and the execution engine is GaussDB(DWS).

Not supported

Masking with specified characters for GaussDB(DWS)

Replaces the characters from the start position to the end position with specified characters.

This algorithm can be used only when both the source and destination of a static masking task are GaussDB(DWS) and the execution engine is GaussDB(DWS).

The start position, end position, and mask flag can be configured.

Masking with specified digits for GaussDB(DWS)

Replaces the characters from the start position to the end position with specified digits.

This algorithm can be used only when both the source and destination of a static masking task are GaussDB(DWS) and the execution engine is GaussDB(DWS).

The start position, end position, and mask flag can be configured.

ID masking

Masks an ID card number.

Not supported

Bank card No. masking

Masks a bank card number.

Not supported

Email masking

Masks email information.

Not supported

Mobile equipment identity masking

Masks a device identifier, such as an IMEI, MEID, or ESN.

The type can be configured.

IPv6 masking

Masks an IPv6 address.

Not supported

IPv4 masking

Masks an IPv4 address.

Not supported

MAC address masking

Masks a MAC address.

Not supported

Phone No. masking

Masks a phone number.

Not supported

Date type masking

Masks a date in a specified format, such as ISO, EUR, or USA.

The date format and masking range can be configured.

Masking X to Y

Masks the characters from X to Y of a string.

X and Y can be configured.

Retaining X to Y

Retains the characters from X to Y of a string.

X and Y can be configured.

Masking first n and last m characters

Masks the first n and last m characters of a string.

n and m can be configured.

Retaining first n and last m characters

Retains the first n and last m characters of a string.

n and m can be configured.

Encryption

GaussDB(DWS) column encryption

Invokes the symmetric encryption function gs_encrypt_aes128(encryptstr, keystr) provided by GaussDB(DWS) to encrypt GaussDB(DWS) data columns. The function uses keystr as the key to encrypt the string encryptstr and returns the encrypted string.

Note the following:

  • This algorithm takes effect only when the destination of the masking task is GaussDB(DWS).
  • If you run SQL to decrypt the data after encryption, the result is returned correctly only if all data is decrypted successfully. Otherwise, the decryption fails.

The key can be configured. The key length ranges from 1 byte to 16 bytes.

NOTE:

Before using the algorithm, you must configure a key.

Hive column encryption

Invokes the Hive column encryption function provided by MRS to encrypt and decrypt Hive data columns. Cryptographic algorithms AES and SMS4 are supported.

Note the following:

  • This algorithm takes effect only when the destination of the masking task is Hive.
  • Column encryption is supported only for tables stored on HDFS in the TextFile or SequenceFile format.
  • Hive column encryption does not support views or the Hive over HBase scenario.

The encryption type can be configured.
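The value truncation rule and the position-based masking algorithms in Table 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function names are hypothetical, positions X and Y are assumed to be 1-based and inclusive, the mask character is assumed to be an asterisk, and negative numbers are assumed to be truncated toward zero.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_value(value: str, x: int) -> int:
    """Value truncation: zero the lowest x-1 integer digits and the fraction."""
    if x < 1:
        raise ValueError("x must be at least 1")
    step = Decimal(10) ** (x - 1)
    quotient = (Decimal(value) / step).to_integral_value(rounding=ROUND_DOWN)
    return int(quotient * step)


def mask_x_to_y(text: str, x: int, y: int, mask_char: str = "*") -> str:
    """Masking X to Y: mask characters x..y (assumed 1-based, inclusive)."""
    start = max(x, 1) - 1
    end = min(y, len(text))
    if start >= end:
        return text
    return text[:start] + mask_char * (end - start) + text[end:]


def retain_first_last(text: str, n: int, m: int, mask_char: str = "*") -> str:
    """Retaining first n and last m characters: mask everything in between."""
    if n < 0 or m < 0:
        raise ValueError("n and m must not be negative")
    if n + m >= len(text):
        return text
    middle = len(text) - n - m
    return text[:n] + mask_char * middle + text[len(text) - m:]


# The three value truncation examples from Table 2, with x = 3.
for sample in ("1234", "999.999", "10.7"):
    print(sample, "->", truncate_value(sample, 3))

print(mask_x_to_y("18012345678", 4, 7))
print(retain_first_last("18012345678", 3, 4))
```

Masking X to Y and retaining the first n and last m characters can describe the same result from opposite directions, so pick whichever is easier to express for your data.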

Creating a Masking Algorithm

  1. On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Security.

    Figure 1 DataArts Security

  2. In the left navigation pane, choose Masking Algorithms.
  3. Click Create.

    Figure 2 Creating a masking algorithm

  4. Set the parameters listed in Table 3 and click OK.

    Figure 3 Configuring algorithm parameters

    The following table lists the masking algorithm parameters.
    Table 3 Parameters for the masking algorithm

    Parameter

    Description

    *Algorithm

    Name of the algorithm to be created. It can contain a maximum of 64 characters and can consist of only letters, digits, and underscores.

    Description

    Brief description of the algorithm. It can contain a maximum of 255 characters.

    *Algorithm Template

    Built-in algorithm template used to customize the algorithm. For details about the available algorithm types and algorithms, see Built-in Masking Algorithms.
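The length and character constraints in Table 3 can be checked before you fill in the form, for example when algorithm names are generated by a script. The following Python sketch is only an illustration: the function name is hypothetical, it does not call any DataArts Security API, and it assumes that "letters" means ASCII letters.

```python
import re

# Assumption: "letters" in Table 3 means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")
MAX_DESCRIPTION_LENGTH = 255


def validate_algorithm_form(name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means the values look valid."""
    errors = []
    if not NAME_PATTERN.fullmatch(name):
        errors.append(
            "Algorithm name must be 1 to 64 characters long and contain "
            "only letters, digits, and underscores."
        )
    if len(description) > MAX_DESCRIPTION_LENGTH:
        errors.append("Description must not exceed 255 characters.")
    return errors


print(validate_algorithm_form("phone_mask_last4", "Masks the last four digits."))
print(validate_algorithm_form("phone-mask"))
```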

Related Operations

  • Editing an algorithm: On the Masking Algorithms page, locate an algorithm and click Edit in the Operation column.

    The parameters that can be edited vary depending on the algorithm type.

  • Testing an algorithm: On the Masking Algorithms page, locate an algorithm and click Test in the Operation column.

    Before using an algorithm, you are advised to test it to ensure that it meets your needs.

    Whether the test function is available varies depending on the algorithm type.

  • Deleting algorithms: On the Masking Algorithms page, locate an algorithm and click Delete in the Operation column. To delete multiple algorithms, select them and click Delete above the list.
    Built-in algorithms cannot be deleted. Custom algorithms that are referenced by masking policies or specified column masking cannot be deleted either. To delete such an algorithm, remove the reference first.

    The deletion operation cannot be undone. Exercise caution when performing this operation.