Updated on 2025-01-23 GMT+08:00

Managing Static Masking Tasks

This section describes how to create a static masking task. For the source and destination types that support static masking, see Reference: Static Data Masking Scenarios.

Static data masking prevents leakage of private data and helps enterprises meet regulatory compliance and data security requirements. Sensitive data is masked, truncated, or hashed using the built-in masking algorithms, and the processed data is written to the target data table. For security purposes, it is the target data table, not the source table, that should be used to serve external requests.

Prerequisites

  • Static masking tasks rely on masking policies. If a task will use a masking policy, the policy must have been created before you create the task.
  • For static masking tasks using the DLI engine, the following OBS permissions have been granted to the dlg_agency. For details, see Authorizing dlg_agency.
    obs:bucket:HeadBucket
    obs:bucket:CreateBucket
    obs:object:PutObject
    obs:object:DeleteObject
    obs:bucket:ListBucket
    obs:object:GetObject
    obs:bucket:GetEncryptionConfiguration
    obs:bucket:PutEncryptionConfiguration
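
As a rough sketch, the permissions above could be collected into a custom policy document and checked for completeness before a DLI-engine task is run. The JSON layout (a Version field and a Statement list with Effect and Action) and the version string "1.1" are assumptions about the policy format, and the helper names build_obs_policy and missing_actions are hypothetical. Follow Authorizing dlg_agency for the authoritative procedure.

```python
import json

# OBS actions listed in the prerequisites above.
REQUIRED_OBS_ACTIONS = (
    "obs:bucket:HeadBucket",
    "obs:bucket:CreateBucket",
    "obs:object:PutObject",
    "obs:object:DeleteObject",
    "obs:bucket:ListBucket",
    "obs:object:GetObject",
    "obs:bucket:GetEncryptionConfiguration",
    "obs:bucket:PutEncryptionConfiguration",
)


def build_obs_policy(actions=REQUIRED_OBS_ACTIONS):
    """Return a policy document as a dict (assumed layout, not confirmed by this page)."""
    return {
        "Version": "1.1",  # assumed version string
        "Statement": [{"Effect": "Allow", "Action": list(actions)}],
    }


def missing_actions(policy):
    """List the required OBS actions that the policy does not allow."""
    granted = {
        action
        for statement in policy.get("Statement", [])
        if statement.get("Effect") == "Allow"
        for action in statement.get("Action", [])
    }
    return [action for action in REQUIRED_OBS_ACTIONS if action not in granted]


policy = build_obs_policy()
print(json.dumps(policy, indent=2))
print("missing:", missing_actions(policy))
```

The check only compares action names. It does not evaluate resources, conditions, or Deny statements.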

Constraints

  • You need to select a proper static masking algorithm based on the field type of the data to be masked. Otherwise, data in the database may be abnormal. For example, if the numeric random algorithm is used to mask date fields, the data type of the fields will be forcibly converted into numeric (Hive and DLI data masking), or a write failure occurs (DWS data masking). If the hash algorithm is used to mask numeric fields, the fields will be forcibly changed to hash value strings (Hive and DLI data masking), or a write failure occurs (DWS data masking).
  • When you run a static masking task for which a sample file needs to be parsed, it is recommended that the sample file be no larger than 10 MB. Otherwise, the static masking task may fail. In addition, OBS sample files can only be used for static DLI data masking tasks and HDFS sample files can only be used for static MRS data masking tasks. For details about the mapping between static masking scenarios and engines, see Reference: Static Data Masking Scenarios.
  • For a static masking task using the DLI engine, the running parameters need to be stored in an OBS bucket. After the task is complete or fails, the task running parameter file is deleted.
    • For a same-source static masking task using the DLI engine, the running parameters are stored in the workspace log bucket named dlf-log-{Project id} by default.
    • For a cross-source static masking task using the DLI engine, the running parameters are stored in the encrypted user bucket named dls-dli-{projectId} that is automatically created.
    Therefore, before performing static masking using the DLI engine, you must grant the following OBS permissions to the dlg_agency. For details, see Authorizing dlg_agency.
    obs:bucket:HeadBucket
    obs:bucket:CreateBucket
    obs:object:PutObject
    obs:object:DeleteObject
    obs:bucket:ListBucket
    obs:object:GetObject
    obs:bucket:GetEncryptionConfiguration
    obs:bucket:PutEncryptionConfiguration
  • For a static masking task using the DLI engine, if the source or destination is GaussDB(DWS), enable network communications between the DLI Spark common queue and GaussDB(DWS). Otherwise, the static masking task will fail. For details, see Configuring the Connection Between a DLI Queue and a Data Source in a Private Network or Configuring the Connection Between a DLI Queue and a Data Source in the Internet.
  • If the source or destination of a static masking task is DLI, data tables in the DLI default database cannot be masked.
  • Kerberos authentication must be enabled for the MRS cluster where MRS Hive is located, and the Spark component must be installed for the MRS cluster.
  • For a static masking task using the MRS engine, if the source or destination is GaussDB(DWS), configure an agency for the MRS cluster by referring to Reference: Authorizing and Binding an Agency and ensure that the outbound rule of the MRS cluster's security group meets the following requirements. Otherwise, the static masking task will fail.
    • Protocol: TCP
    • Port: 80
    • Destination: 169.254.0.0/16
  • For a static masking task using the MRS engine, if either the source or destination is GaussDB(DWS), the following data types are supported. If there is data of other unsupported types, the static masking task will fail.
    • tinyint
    • smallint
    • int
    • bigint
    • decimal
    • double
    • float
    • boolean
    • string
    • timestamp
  • A same-source static masking task using the GaussDB(DWS) engine does not support cross-database masking. That is, the source and destination data tables must be in the same database.
  • If Dataset Scope is set to Incremental for a static masking task, Timestamp or Date needs to be selected for Time Field.
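
The first constraint in this list is easier to see with a small illustration. The sketch below does not use the service's built-in algorithms, which are not shown on this page. It uses Python's standard hashlib and random modules as stand-ins (the function names are hypothetical) to show why a hash turns a numeric field into a string and why a numeric random algorithm turns a date into a number.

```python
import hashlib
import random
from datetime import date


def hash_mask(value):
    """Stand-in hash masking: any input becomes a 64-character hex string."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def numeric_random_mask(value, rng):
    """Stand-in numeric random masking: the input is discarded and an int is returned."""
    return rng.randint(0, 99_999_999)


rng = random.Random(0)  # seeded only to make the sketch repeatable

masked_amount = hash_mask(1250)  # numeric in, string out
masked_birthday = numeric_random_mask(date(1990, 5, 17), rng)  # date in, int out

print(type(masked_amount).__name__, type(masked_birthday).__name__)
```

A target column that still expects a number or a date cannot hold these values unchanged, which matches the forced conversion (Hive and DLI) or write failure (DWS) described above.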

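The MRS engine constraints above for GaussDB(DWS) can be checked before a task is created. The following is a minimal sketch, assuming that column types are available as strings and that security group rules are available as simple dictionaries. Both input shapes and both function names are hypothetical. It also assumes that a parameterized type such as decimal(10,2) counts as its base type, and that a rule with a broader destination range still satisfies the requirement.

```python
import ipaddress

# Data types listed above as supported by the MRS engine with GaussDB(DWS).
SUPPORTED_TYPES = {
    "tinyint", "smallint", "int", "bigint", "decimal",
    "double", "float", "boolean", "string", "timestamp",
}

REQUIRED_DESTINATION = ipaddress.ip_network("169.254.0.0/16")


def unsupported_columns(columns):
    """Return the columns whose declared type is not in the supported list.

    columns maps a column name to its declared type, for example "decimal(10,2)".
    """
    bad = {}
    for name, declared in columns.items():
        base = declared.split("(", 1)[0].strip().lower()
        if base not in SUPPORTED_TYPES:
            bad[name] = declared
    return bad


def outbound_rule_ok(rules):
    """Return True if any rule allows TCP port 80 to all of 169.254.0.0/16."""
    for rule in rules:
        destination = ipaddress.ip_network(rule["destination"])
        if (
            rule["protocol"].upper() == "TCP"
            and rule["port"] == 80
            and REQUIRED_DESTINATION.subnet_of(destination)
        ):
            return True
    return False


print(unsupported_columns({"id": "bigint", "amount": "decimal(10,2)", "born": "date"}))
print(outbound_rule_ok([{"protocol": "TCP", "port": 80, "destination": "169.254.0.0/16"}]))
```

The rule check treats the port as a single integer. Rules that use port ranges would need extra handling.
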
Creating a Static Masking Task

  1. On the DataArts Studio console, locate a workspace and click DataArts Security.
  2. In the left navigation pane, choose Static Masking. In the right pane, click Create.

    Figure 1 Creating a static masking task

  3. In the displayed dialog box, set Task Name and Description and click Next.

    Figure 2 Configuring basic information

  4. Configure the source and destination parameters. For parameter details, see Table 1.

    Figure 3 Configuring the masking task

    The following table lists the parameters of the masking task. Each entry gives the parameter name followed by its description.

    Table 1 Parameters of the masking task

    Source Settings

    • *Data Source Type
      DLI, DWS, and MRS Hive are supported.
    • *Data Connection
      Select a data connection that has been created in Management Center. If no data connection is available, create one by referring to Creating a DataArts Studio Data Connection.
    • *SQL Queue
      This parameter is mandatory if Data Source Type is set to DLI.
    • *Database
      Click Configure to select the database whose data is to be masked.
      Data tables in the DLI default database cannot be masked.
    • *Source Table
      Click Configure to select the table whose data is to be masked.
    • *Specify Column
      Whether to specify the columns to mask. If this function is enabled, you can configure masking algorithms for specified columns in the source table. You can configure different masking algorithms for multiple columns.
      NOTE:
      Once saved, this option cannot be changed.
    • *Column
      This parameter is mandatory when Specify Column is enabled.
      If you want to mask a column, you must select the column and select a masking algorithm. If you only select the masking algorithm, no column will be masked.
      NOTE:
        • You need to select a proper static masking algorithm based on the field type of the data to be masked. Otherwise, data in the database may be abnormal. For example, if the numeric random algorithm is used to mask date fields, the data type of the fields will be forcibly converted into numeric (Hive and DLI data masking), or a write failure occurs (DWS data masking). If the hash algorithm is used to mask numeric fields, the fields will be forcibly changed to hash value strings (Hive and DLI data masking), or a write failure occurs (DWS data masking).
        • Before using the following masking algorithms, you must configure keys:
          • HMAC-SHA256 hash algorithm
          • DWS column encryption algorithm
      For more restrictions on different masking algorithms, see Managing Masking Algorithms.
    • *Dataset Scope
      If Dataset Scope is set to Incremental, you can set Time Field to Timestamp or Date.
      Generally, the masking task is scheduled once if this parameter is set to All and is scheduled periodically if this parameter is set to Incremental.
    • *Time Field
      If Dataset Scope is set to Incremental, you can set this parameter to Timestamp or Date.

    Masking Policy Settings

    • *Masking Policy
      This parameter is configurable only when no column is specified.
      Select a created masking policy from the drop-down list.
      NOTE:
        • You need to select a proper static masking algorithm based on the field type of the data to be masked. Otherwise, data in the database may be abnormal. For example, if the numeric random algorithm is used to mask date fields, the data type of the fields will be forcibly converted into numeric (Hive and DLI data masking), or a write failure occurs (DWS data masking). If the hash algorithm is used to mask numeric fields, the fields will be forcibly changed to hash value strings (Hive and DLI data masking), or a write failure occurs (DWS data masking).
        • Before using the following masking algorithms, you must configure keys:
          • HMAC-SHA256 hash algorithm
          • DWS column encryption algorithm
      For more restrictions on different masking algorithms, see Managing Masking Algorithms.

    Target End Settings

    • *Data Source Type
      Select the storage type for the masked data. Table 3 lists the supported masking scenarios.
    • *Data Connection
      Select a data connection that has been created in Management Center. If no data connection is available, create one by referring to Creating a DataArts Studio Data Connection.
    • *SQL Queue
      This parameter is mandatory if Data Source Type is set to DLI.
    • *Database
      Click Configure to select the database for storing the masked data.
      Data tables in the DLI default database cannot be masked.
    • *Target Table
      Enter a unique table name. If no table with the entered name exists, the table is created automatically.
      Click Test to check whether the target table can be used. You cannot proceed to the next step until the table passes the test.

    Execution Engine

    • *Execution Engine
      Select the engine that runs the masking task. Table 3 lists the supported engines and precautions in different masking scenarios.

    Masking Queue

    • *Mask Queue
      Select a queue in the DLI or MRS engine.

  5. Click Next and configure scheduling.

    • If Dataset Scope is set to All, Repeat can only be set to Once.
    • If Dataset Scope is set to Incremental, Repeat can be set to Once or On Schedule.

    If you set Repeat to On Schedule, set the parameters listed in Table 2.

    Table 2 Parameters for periodic scheduling

    • *Date
      Period during which the task takes effect.
    • *Cycle
      Frequency at which the task is executed. The options are:
        • minutes: Select the scheduling start time and end time, and set the interval in minutes.
        • hours: Select the scheduling start time and end time, and set the interval in hours.
        • Day: Set the scheduling time for every day.
        • Week: Select a day in a week and set the specific time to start scheduling.
        • Month: Select a day in a month and set the specific time to start scheduling.
      For example, you can set Cycle to Week, Time to 15:52, and Time Range to Tuesday. In this case, the task is executed at 15:52 every Tuesday within the configured date range.
    • Start now
      If you select Start now, the task is scheduled immediately.

    Figure 4 Setting parameters for periodic scheduling

  6. After all settings are complete, click OK.
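
The scheduling rules in step 5 can be sketched in a few lines. This is only an illustration of the rules as described on this page, not the service's scheduler. The function names are hypothetical, and the weekly calculation ignores the configured Date range for brevity.

```python
from datetime import datetime, time, timedelta


def allowed_repeat_options(dataset_scope):
    """Return the Repeat values that step 5 allows for a Dataset Scope."""
    if dataset_scope == "All":
        return ["Once"]
    if dataset_scope == "Incremental":
        return ["Once", "On Schedule"]
    raise ValueError(f"unknown Dataset Scope: {dataset_scope!r}")


def next_weekly_run(after, weekday, run_at):
    """Return the first run strictly after `after` for a weekly cycle.

    weekday follows datetime.weekday(): Monday is 0 and Sunday is 6.
    """
    days_ahead = (weekday - after.weekday()) % 7
    candidate = datetime.combine(after.date() + timedelta(days=days_ahead), run_at)
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate


TUESDAY = 1
# Cycle = Week, Time = 15:52, Time Range = Tuesday, as in the example in Table 2.
print(allowed_repeat_options("Incremental"))
print(next_weekly_run(datetime(2025, 1, 23, 9, 0), TUESDAY, time(15, 52)))
```

Starting from Thursday, 23 January 2025, the next run in this sketch falls on Tuesday, 28 January 2025 at 15:52.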

Related Operations

  • Editing a task: On the Static Masking page, locate a task and click Edit in the Operation column.

    A task in the Scheduling state cannot be edited.

  • Deleting tasks: On the Static Masking page, locate a task, click More in the Operation column, and select Delete. To delete multiple tasks at a time, select the tasks and click Delete above the task list.

    A task in the Scheduling state cannot be deleted.

    The deletion operation cannot be undone. Exercise caution when performing this operation.

  • Running or scheduling a task: On the Static Masking page, locate a task and click Run in the Operation column or click More in the Operation column and select Start.

    You can determine whether a task is scheduled once or repeatedly based on the scheduling period.

  • Viewing running instance logs: On the Static Masking page, locate a task and expand it to display its instances. Then click View Log.

    If a task fails, use the logs to locate the cause, rectify the fault, and run the task again. If the fault persists, contact technical support.

Reference: Authorizing and Binding an Agency

  1. Log in to the IAM console.
  2. Choose Agencies. In the agency list, locate the preset MRS_ECS_DEFAULT_AGENCY agency and click Authorize.

    If the preset MRS_ECS_DEFAULT_AGENCY agency is not found, you can buy an MRS cluster and select the MRS_ECS_DEFAULT_AGENCY agency in advanced settings. When the MRS cluster creation starts, the MRS_ECS_DEFAULT_AGENCY agency is automatically generated.

    Figure 5 Authorizing an agency

  3. In the search box, enter KMS and select the KMS Administrator policy.

    The minimum permission required by MRS_ECS_DEFAULT_AGENCY is kms:cmk:decrypt. Instead of granting the KMS Administrator policy directly, you can create a custom policy on the IAM console that contains only the KMS permission kms:cmk:decrypt and grant that policy to MRS_ECS_DEFAULT_AGENCY.

    Figure 6 Selecting permissions

  4. After selecting the permission, click Next to set the authorization scope. In this example, retain the default settings and click OK to complete the authorization.
  5. On the MRS management console, choose Clusters > Active Clusters. Click the name of the target cluster to go to the cluster details page.
  6. On the Dashboard page, locate the O&M Management area and check that the cluster has been bound to the MRS_ECS_DEFAULT_AGENCY agency. If the cluster is not bound to the MRS_ECS_DEFAULT_AGENCY agency, you need to manually select the MRS_ECS_DEFAULT_AGENCY agency.

    Figure 7 Binding an agency
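
For the least-privilege option mentioned in step 3, a custom policy would contain only the kms:cmk:decrypt action. The sketch below builds such a document and checks that it grants nothing else. The JSON layout and the version string "1.1" are assumptions about the custom policy format, and the function names are hypothetical. Verify the format on the IAM console before using it.

```python
import json

MINIMUM_KMS_ACTION = "kms:cmk:decrypt"  # minimum permission named in step 3


def build_kms_decrypt_policy():
    """Return a policy document that allows only the minimum KMS action (assumed layout)."""
    return {
        "Version": "1.1",  # assumed version string
        "Statement": [{"Effect": "Allow", "Action": [MINIMUM_KMS_ACTION]}],
    }


def is_minimal(policy):
    """Return True if the policy allows exactly the minimum KMS action."""
    allowed = [
        action
        for statement in policy.get("Statement", [])
        if statement.get("Effect") == "Allow"
        for action in statement.get("Action", [])
    ]
    return allowed == [MINIMUM_KMS_ACTION]


kms_policy = build_kms_decrypt_policy()
print(json.dumps(kms_policy, indent=2))
print("minimal:", is_minimal(kms_policy))
```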

Reference: Static Data Masking Scenarios

Table 3 lists the static masking scenarios supported by privacy protection. For each source, the entries give the supported targets, the computing engines, and the related description.

Table 3 Static masking scenarios

Source: Data Lake Insight (DLI)

  • Target: Data Lake Insight (DLI)
    Computing engine: DLI Spark common queue
    Description: None
  • Target: GaussDB(DWS)
    Computing engine: DLI Spark common queue

Source: GaussDB(DWS)

  • Target: DWS
    Computing engine:
      • GaussDB(DWS) cluster
      • MRS cluster
      • DLI Spark common queue
    Description:
    GaussDB(DWS) engine:
      • A same-source static masking task using the GaussDB(DWS) engine does not support cross-database masking. That is, the source and destination data tables must be in the same database.
    MRS engine:
      • Kerberos authentication must be enabled for the MRS cluster where MRS Hive is located, and the Spark component must be installed for the MRS cluster.
      • For a static masking task using the MRS engine, if the source or destination is GaussDB(DWS), configure an agency for the MRS cluster by referring to Reference: Authorizing and Binding an Agency and ensure that the outbound rule of the MRS cluster's security group meets the following requirements. Otherwise, the static masking task will fail.
        • Protocol: TCP
        • Port: 80
        • Destination: 169.254.0.0/16
    DLI engine:
  • Target: MRS Hive
    Computing engine: MRS cluster where MRS Hive is located
    Description:
      • Kerberos authentication must be enabled for the MRS cluster where MRS Hive is located, and the Spark component must be installed for the MRS cluster.
      • For a static masking task using the MRS engine, if the source or destination is GaussDB(DWS), configure an agency for the MRS cluster by referring to Reference: Authorizing and Binding an Agency and ensure that the outbound rule of the MRS cluster's security group meets the following requirements. Otherwise, the static masking task will fail.
        • Protocol: TCP
        • Port: 80
        • Destination: 169.254.0.0/16
      • For a static masking task using the MRS engine, if either the source or destination is GaussDB(DWS), the following data types are supported. If there is data of other unsupported types, the static masking task will fail.
        • tinyint
        • smallint
        • int
        • bigint
        • decimal
        • double
        • float
        • boolean
        • string
        • timestamp
  • Target: Data Lake Insight (DLI)
    Computing engine: DLI Spark common queue

Source: MRS Hive

  • Target: MRS Hive
    Computing engine: MRS cluster where the source MRS Hive is located
    Description:
      • Kerberos authentication must be enabled for the MRS cluster where MRS Hive is located, and the Spark component must be installed for the MRS cluster.
  • Target: GaussDB(DWS)
    Computing engine: MRS cluster where MRS Hive is located
    Description:
      • Kerberos authentication must be enabled for the MRS cluster where MRS Hive is located, and the Spark component must be installed for the MRS cluster.
      • For a static masking task using the MRS engine, if the source or destination is GaussDB(DWS), configure an agency for the MRS cluster by referring to Reference: Authorizing and Binding an Agency and ensure that the outbound rule of the MRS cluster's security group meets the following requirements. Otherwise, the static masking task will fail.
        • Protocol: TCP
        • Port: 80
        • Destination: 169.254.0.0/16
      • For a static masking task using the MRS engine, if either the source or destination is GaussDB(DWS), the following data types are supported. If there is data of other unsupported types, the static masking task will fail.
        • tinyint
        • smallint
        • int
        • bigint
        • decimal
        • double
        • float
        • boolean
        • string
        • timestamp
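
The scenarios in Table 3 can also be read as a lookup from a source and target pair to the computing engines that can run the task. The sketch below encodes one reading of the table. The dictionary layout and the function name engines_for are hypothetical, and DWS and GaussDB(DWS) are treated as the same data source type.

```python
# (source, target) -> computing engines, transcribed from Table 3.
SCENARIOS = {
    ("DLI", "DLI"): ["DLI Spark common queue"],
    ("DLI", "GaussDB(DWS)"): ["DLI Spark common queue"],
    ("GaussDB(DWS)", "GaussDB(DWS)"): [
        "GaussDB(DWS) cluster",
        "MRS cluster",
        "DLI Spark common queue",
    ],
    ("GaussDB(DWS)", "MRS Hive"): ["MRS cluster where MRS Hive is located"],
    ("GaussDB(DWS)", "DLI"): ["DLI Spark common queue"],
    ("MRS Hive", "MRS Hive"): ["MRS cluster where the source MRS Hive is located"],
    ("MRS Hive", "GaussDB(DWS)"): ["MRS cluster where MRS Hive is located"],
}


def engines_for(source, target):
    """Return the engines listed for a source and target, or [] if the pair is not listed."""
    return SCENARIOS.get((source, target), [])


print(engines_for("GaussDB(DWS)", "GaussDB(DWS)"))
print(engines_for("MRS Hive", "DLI"))  # not listed in Table 3, so the result is empty
```

A pair that returns an empty list does not appear in Table 3, so a task with that source and target cannot be assumed to be supported.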