Managing Static Masking Tasks
This section describes how to create a static masking task. For the source and destination types that support static masking, see Reference: Static Data Masking Scenarios.
Static data masking helps prevent leakage of private data and supports regulatory compliance and data security for enterprises. Sensitive data is masked, truncated, or hashed by the built-in masking algorithms, and the processed data is written to a target data table. For security purposes, the target data table, rather than the source table, is the one used to serve external requests.
Prerequisites
- Static masking tasks rely on masking policies. The prerequisites are as follows:
- A built-in or custom masking algorithm has been created. For details, see Managing Masking Algorithms.
- A masking policy has been created. For details, see Creating a Data Masking Policy.
- A sensitive data discovery task has been created for the data tables to be masked. For details, see Creating a Sensitive Data Discovery Task.
- The sensitive data status has been changed to valid on the Sensitive Data Distribution page. For details, see Viewing Sensitive Data Distribution.
- For static masking tasks using the DLI engine, the following OBS permissions have been granted to the dlg_agency. For details, see Authorizing dlg_agency.
obs:bucket:HeadBucket obs:bucket:CreateBucket obs:object:PutObject obs:object:DeleteObject obs:bucket:ListBucket obs:object:GetObject obs:bucket:GetEncryptionConfiguration obs:bucket:PutEncryptionConfiguration
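The permission list above can be collected into a single policy document before it is granted to dlg_agency. The following Python sketch only assembles and prints such a document. The JSON layout (`Version`, `Statement`, `Effect`, `Action`) is an assumption based on the common IAM custom policy shape, and the names `DLG_AGENCY_OBS_ACTIONS` and `build_policy` are illustrative; follow Authorizing dlg_agency for the authoritative procedure.

```python
import json

# OBS actions that dlg_agency needs for DLI-engine static masking,
# copied from the prerequisite list above.
DLG_AGENCY_OBS_ACTIONS = [
    "obs:bucket:HeadBucket",
    "obs:bucket:CreateBucket",
    "obs:object:PutObject",
    "obs:object:DeleteObject",
    "obs:bucket:ListBucket",
    "obs:object:GetObject",
    "obs:bucket:GetEncryptionConfiguration",
    "obs:bucket:PutEncryptionConfiguration",
]


def build_policy(actions):
    """Return a policy document that allows the given actions.

    Assumption: the layout mirrors the usual IAM custom policy JSON;
    verify it against the IAM console before use.
    """
    return {
        "Version": "1.1",
        "Statement": [{"Effect": "Allow", "Action": sorted(set(actions))}],
    }


policy = build_policy(DLG_AGENCY_OBS_ACTIONS)
print(json.dumps(policy, indent=2))
```

Keeping the actions in one list makes it easy to compare the granted permissions with the eight required ones.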
Constraints
- You need to select a proper static masking algorithm based on the field type of the data to be masked. Otherwise, data in the database may be abnormal. For example, if the numeric random algorithm is used to mask date fields, the data type of the fields will be forcibly converted into numeric (Hive and DLI data masking), or a write failure occurs (DWS data masking). If the hash algorithm is used to mask numeric fields, the fields will be forcibly changed to hash value strings (Hive and DLI data masking), or a write failure occurs (DWS data masking).
- When you run a static masking task for which a sample file needs to be parsed, it is recommended that the sample file be no larger than 10 MB. Otherwise, the static masking task may fail. In addition, OBS sample files can only be used for static DLI data masking tasks and HDFS sample files can only be used for static MRS data masking tasks. For details about the mapping between static masking scenarios and engines, see Reference: Static Data Masking Scenarios.
- For a static masking task using the DLI engine, the running parameters need to be stored in an OBS bucket. After the task is complete or fails, the task running parameter file is deleted.
- For a same-source static masking task using the DLI engine, the running parameters are stored in the workspace log bucket named dlf-log-{Project id} by default.
- For a cross-source static masking task using the DLI engine, the running parameters are stored in the encrypted user bucket named dls-dli-{projectId} that is automatically created.
Therefore, before performing static masking using the DLI engine, you must grant the following OBS permissions to the dlg_agency. For details, see Authorizing dlg_agency.
obs:bucket:HeadBucket obs:bucket:CreateBucket obs:object:PutObject obs:object:DeleteObject obs:bucket:ListBucket obs:object:GetObject obs:bucket:GetEncryptionConfiguration obs:bucket:PutEncryptionConfiguration
- For a static masking task using the DLI engine, if the source or destination is GaussDB(DWS), enable network communications between the DLI Spark common queue and GaussDB(DWS). Otherwise, the static masking task will fail. For details, see Configuring the Connection Between a DLI Queue and a Data Source in a Private Network or Configuring the Connection Between a DLI Queue and a Data Source in the Internet.
- If the source or destination of a static masking task is DLI, data tables in the DLI default database cannot be masked.
- Kerberos authentication must be enabled for the MRS cluster where MRS Hive is located, and the Spark component must be installed for the MRS cluster.
- For a static masking task using the MRS engine, if the source or destination is GaussDB(DWS), configure an agency for the MRS cluster by referring to Reference: Authorizing and Binding an Agency and ensure that the outbound rule of the MRS cluster's security group meets the following requirements. Otherwise, the static masking task will fail.
- Protocol: TCP
- Port: 80
- Destination: 169.254.0.0/16
- For a static masking task using the MRS engine, if either the source or destination is GaussDB(DWS), only the following data types are supported. If the data contains any other type, the static masking task will fail.
- tinyint
- smallint
- int
- bigint
- decimal
- double
- float
- boolean
- string
- timestamp
- A same-source static masking task using the GaussDB(DWS) engine does not support cross-database masking. That is, the source and destination data tables must be in the same database.
- If Dataset Scope is set to Incremental for a static masking task, Timestamp or Date needs to be selected for Time Field.
Create a Static Masking Task
- On the DataArts Studio console, locate a workspace and click DataArts Security.
- In the left navigation pane, choose Static Masking. In the right pane, click Create.
Figure 1 Creating a static masking task
- In the displayed dialog box, set Task Name and Description and click Next.
Figure 2 Configuring basic information
- Configure the source and destination parameters. For parameter details, see Table 1.
Figure 3 Configuring the masking task
The following table lists the parameters of the masking task.
Table 1 Parameters of the masking task
Parameter
Description
Source Settings
*Data Source Type
DLI, DWS and MRS Hive are supported.
*Data Connection
Select a data connection that has been created in Management Center. If no data connection is available, create one by referring to Creating a DataArts Studio Data Connection.
*SQL Queue
This parameter is mandatory if Data Source Type is set to DLI.
*Database
Click Configure to select the database whose data is to be masked.
Data tables in the DLI default database cannot be masked.
*Source Table
Click Configure to select the table whose data is to be masked.
*Specify Column
Whether to specify the columns to mask. If this function is enabled, you can configure masking algorithms for specified columns in the source table. You can configure different masking algorithms for multiple columns.
NOTE: Once saved, this option cannot be changed.
*Column
This parameter is mandatory when Specify Column is enabled.
If you want to mask a column, you must select the column and select a masking algorithm. If you only select the masking algorithm, no column will be masked.
NOTE: You need to select a proper static masking algorithm based on the field type of the data to be masked. Otherwise, data in the database may be abnormal. For example, if the numeric random algorithm is used to mask date fields, the data type of the fields will be forcibly converted into numeric (Hive and DLI data masking), or a write failure occurs (DWS data masking). If the hash algorithm is used to mask numeric fields, the fields will be forcibly changed to hash value strings (Hive and DLI data masking), or a write failure occurs (DWS data masking).
For more restrictions on different masking algorithms, see Managing Masking Algorithms.
*Dataset Scope
If this parameter is set to Incremental, you must also set Time Field to Timestamp or Date.
Generally, the masking task is scheduled once if this parameter is set to All and is scheduled periodically if this parameter is set to Incremental.
*Time Field
This parameter is required if Dataset Scope is set to Incremental. Select Timestamp or Date.
Masking Policy Settings
*Masking Policy
This parameter is configurable only when no column is specified.
Select a created masking policy from the drop-down list.
NOTE: You need to select a proper static masking algorithm based on the field type of the data to be masked. Otherwise, data in the database may be abnormal. For example, if the numeric random algorithm is used to mask date fields, the data type of the fields will be forcibly converted into numeric (Hive and DLI data masking), or a write failure occurs (DWS data masking). If the hash algorithm is used to mask numeric fields, the fields will be forcibly changed to hash value strings (Hive and DLI data masking), or a write failure occurs (DWS data masking).
For more restrictions on different masking algorithms, see Managing Masking Algorithms.
Target End Settings
*Data Source Type
Select the storage type for the masked data. Table 3 lists the supported masking scenarios.
*Data Connection
Select a data connection that has been created in Management Center. If no data connection is available, create one by referring to Creating a DataArts Studio Data Connection.
*SQL Queue
This parameter is mandatory if Data Source Type is set to DLI.
*Database
Click Configure to select the database for storing the masked data.
Data tables in the DLI default database cannot be masked.
*Target Table
Enter a unique table name. The table is automatically created when the table name entered does not exist.
Click Test to check whether the target table can be used. You can proceed to the next step only after the test succeeds.
Execution Engine
*Execution Engine
Select the engine that runs the masking task. Table 3 lists the supported engines and precautions in different masking scenarios.
Masking Queue
*Mask Queue
Select a queue in the DLI or MRS engine.
- If the execution engine is DLI, select a DLI Spark common queue.
For a static masking task using the DLI engine, if the source or destination is GaussDB(DWS), enable network communications between the DLI Spark common queue and GaussDB(DWS). Otherwise, the static masking task will fail. For details, see Configuring the Connection Between a DLI Queue and a Data Source in a Private Network or Configuring the Connection Between a DLI Queue and a Data Source in the Internet.
- If the execution engine is MRS, enter the MRS tenant queue. To view the available queues, go to the MRS console, click the cluster name in the cluster list to open the cluster details page, and then click the Tenants tab followed by the Queue Configuration tab.
- Click Next and configure scheduling.
- If Dataset Scope is set to All, Repeat can only be set to Once.
- If Dataset Scope is set to Incremental, Repeat can be set to Once or On Schedule.
If you set Repeat to On Schedule, set the parameters listed in Table 2.
Table 2 Parameters for periodic scheduling
Parameter
Description
*Date
Period during which the task takes effect.
*Cycle
The frequency at which a task is executed. The options are:
- minutes: Select the scheduling start time and end time, and set the interval in minutes.
- hours: Select the scheduling start time and end time, and set the interval in hours.
- Day: Set the time at which the task is scheduled every day.
- Week: Select a day in a week and set the specific time to start scheduling.
- Month: Select a day in a month and set the specific time to start scheduling.
For example, you can set Cycle to Week, Time to 15:52, and Time Range to Tuesday. In this case, the task is executed at 15:52 every Tuesday within the configured date range.
Start now
If you select Start now, the task is scheduled immediately.
Figure 4 Setting parameters for periodic scheduling
- After all settings are complete, click OK.
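The weekly example in Table 2 (Cycle set to Week, 15:52, Tuesday) can be checked by listing the run times it implies. The following Python sketch enumerates them for a date range. The function `weekly_runs` and the January 2024 range are illustrative assumptions, and the sketch does not claim to reproduce how the scheduler treats boundary cases such as the first or last day of the range.

```python
from datetime import date, datetime, time, timedelta


def weekly_runs(start, end, weekday, at):
    """Yield run times on the given weekday (Monday=0) from start to end, inclusive."""
    # Move forward from the start date to the first matching weekday.
    days_ahead = (weekday - start.weekday()) % 7
    current = start + timedelta(days=days_ahead)
    while current <= end:
        yield datetime.combine(current, at)
        current += timedelta(weeks=1)


# Hypothetical effective period: 1 to 31 January 2024; Tuesday is weekday 1.
runs = list(weekly_runs(date(2024, 1, 1), date(2024, 1, 31), weekday=1, at=time(15, 52)))
for run in runs:
    print(run.isoformat(sep=" "))
```

For this range the sketch lists five runs, one on each Tuesday of the month at 15:52.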
Related Operations
- Editing a task: On the Static Masking page, locate a task and click Edit in the Operation column.
A task in the Scheduling state cannot be edited.
- Deleting tasks: On the Static Masking page, locate a task, click More in the Operation column, and select Delete. To delete multiple tasks at a time, select the tasks and click Delete above the task list.
A task in the Scheduling state cannot be deleted.
The deletion operation cannot be undone. Exercise caution when performing this operation.
- Running or scheduling a task: On the Static Masking page, locate a task and click Run in the Operation column or click More in the Operation column and select Start.
Whether a task runs once or repeatedly depends on its scheduling configuration.
- Viewing running instance logs: On the Static Masking page, locate a task and expand it to show its running instances. Then click View Log.
If a task fails, use the logs to locate the cause, rectify the fault, and run the task again. If the fault persists, contact technical support.
Reference: Authorizing and Binding an Agency
- Log in to the IAM console.
- Choose Agencies. In the agency list, locate the preset MRS_ECS_DEFAULT_AGENCY agency and click Authorize.
If the preset MRS_ECS_DEFAULT_AGENCY agency is not found, you can buy an MRS cluster and select the MRS_ECS_DEFAULT_AGENCY agency in advanced settings. When the MRS cluster creation starts, the MRS_ECS_DEFAULT_AGENCY agency is automatically generated.
Figure 5 Authorizing an agency
- In the search box, enter KMS and select the KMS Administrator policy.
The minimum permission required by the MRS_ECS_DEFAULT_AGENCY is kms:cmk:decrypt. Instead of granting the KMS Administrator policy, you can create a custom policy on the IAM console that contains only the kms:cmk:decrypt permission and grant that policy to the MRS_ECS_DEFAULT_AGENCY.
Figure 6 Selecting permissions
- After selecting the permission, click Next to set the authorization scope. In this example, retain the default settings and click OK to complete the authorization.
- On the MRS management console, choose Clusters > Active Clusters. Click the name of the target cluster to go to the cluster details page.
- On the Dashboard page, locate the O&M Management area and check that the cluster has been bound to the MRS_ECS_DEFAULT_AGENCY agency. If the cluster is not bound to the MRS_ECS_DEFAULT_AGENCY agency, you need to manually select the MRS_ECS_DEFAULT_AGENCY agency.
Figure 7 Binding an agency
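As an alternative to the KMS Administrator policy, the minimum permission mentioned in the authorization step can be expressed as a custom policy. The following Python sketch prints one possible policy document. The JSON layout is an assumption based on the common IAM custom policy shape and should be verified on the IAM console; only the action name kms:cmk:decrypt comes from this section.

```python
import json

# Minimum permission required by MRS_ECS_DEFAULT_AGENCY, as stated above.
MIN_KMS_ACTION = "kms:cmk:decrypt"

custom_policy = {
    "Version": "1.1",  # assumption: version string used by IAM custom policies
    "Statement": [{"Effect": "Allow", "Action": [MIN_KMS_ACTION]}],
}

policy_json = json.dumps(custom_policy, indent=2)
print(policy_json)
```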
Reference: Static Data Masking Scenarios
Data Source (Source) | Data Source (Target) | Computing Engine | Description |
---|---|---|---|
Data Lake Insight (DLI) | Data Lake Insight (DLI) | DLI Spark common queue | None |
Data Lake Insight (DLI) | GaussDB(DWS) | DLI Spark common queue | |
GaussDB(DWS) | DWS | | GaussDB(DWS) engine: MRS engine: DLI engine: |
GaussDB(DWS) | MRS Hive | MRS cluster where MRS Hive is located | |
GaussDB(DWS) | Data Lake Insight (DLI) | DLI Spark common queue | |
MRS Hive | MRS Hive | MRS cluster where the source MRS Hive is located | |
MRS Hive | GaussDB(DWS) | MRS cluster where MRS Hive is located | |
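The scenario table above can also be read as a lookup from a source and target pair to the engines that can run the task. The following Python sketch transcribes it into a dictionary. The names `SCENARIOS` and `engines_for` are illustrative, and the engine list for the GaussDB(DWS) to GaussDB(DWS) row is an assumption inferred from the three engines named in that row's description, because the Computing Engine cell for that row is empty here.

```python
# (source, target) -> engines that can run the masking task,
# transcribed from the scenario table above.
SCENARIOS = {
    ("DLI", "DLI"): ["DLI"],
    ("DLI", "GaussDB(DWS)"): ["DLI"],
    # Assumption: inferred from the engines named in the row's description.
    ("GaussDB(DWS)", "GaussDB(DWS)"): ["GaussDB(DWS)", "MRS", "DLI"],
    ("GaussDB(DWS)", "MRS Hive"): ["MRS"],
    ("GaussDB(DWS)", "DLI"): ["DLI"],
    ("MRS Hive", "MRS Hive"): ["MRS"],
    ("MRS Hive", "GaussDB(DWS)"): ["MRS"],
}


def engines_for(source, target):
    """Return the engines listed for a scenario, or an empty list if it is not listed."""
    return SCENARIOS.get((source, target), [])


print(engines_for("DLI", "GaussDB(DWS)"))
print(engines_for("MRS Hive", "DLI"))
```

A pair that does not appear in the table, such as MRS Hive to DLI, returns an empty list, which signals that the scenario is not listed as supported.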