Defining Identification Rules

To effectively identify sensitive data fields in a database, you can create identification rules. Currently, built-in rules and simple regular expressions are supported.

Data security levels, data classifications, and identification rules are configured at the DataArts Studio instance level and are shared across workspaces. This allows data to be managed against unified standards in the Data Map component.

After an identification rule is created, it is pending confirmation by default and does not take effect for static masking tasks. To make the rule take effect, run a sensitive data discovery task, choose Sensitive Data Distribution in the left navigation pane, click the Manual Recovery tab, and confirm that the identification rule of the task is valid. The rule then takes effect for static masking tasks.

Prerequisites

(Mandatory) A data security level has been created. For details, see Creating Data Security Levels.
(Optional) A data classification has been created. For details, see Creating Data Classifications.

Constraints

Only the DAYU Administrator, Tenant Administrator, or data security administrator can create, modify, or delete data security levels, classifications, and identification rules. Common users do not have permission to perform these operations.

If an identification rule is of the content identification type (a built-in rule or a custom rule that uses content identification), a field is treated as sensitive, and assigned the rule's security level and classification, only when the proportion of records in the field that match the rule to the total number of records in the table exceeds a specified threshold (80% by default).
An identification rule that is referenced can be deleted only after the reference is removed.
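
The threshold check for content identification rules can be sketched as follows. This is only an illustration of the arithmetic, not the service's implementation: the function name is_sensitive_field, the sample phone-number pattern, the use of Python's re module, and the treatment of empty values as non-matching are all assumptions. "Exceeds" is read literally as strictly greater than the threshold.

```python
import re

DEFAULT_THRESHOLD = 0.8  # the 80% default threshold described above


def is_sensitive_field(values, pattern, threshold=DEFAULT_THRESHOLD):
    """Return True when the share of matching records exceeds the threshold.

    Assumption: None values count toward the total but never match.
    """
    total = len(values)
    if total == 0:
        return False
    regex = re.compile(pattern)
    matched = sum(1 for v in values if v is not None and regex.search(str(v)))
    return matched / total > threshold


# Hypothetical column: 4 of 5 records match, which is exactly 80%.
phone_column = ["13800000000", "13911112222", "n/a", "13700001111", "13600002222"]
print(is_sensitive_field(phone_column, r"^1\d{10}$"))

# Hypothetical column: 9 of 10 records match, which is 90%.
mostly_phones = ["13800000000"] * 9 + ["unknown"]
print(is_sensitive_field(mostly_phones, r"^1\d{10}$"))
```

Under these assumptions, the first column is not flagged because 80% does not exceed the 80% threshold, while the second column is flagged.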

Creating a Data Identification Rule

On the DataArts Studio console, locate a workspace and click DataArts Security.
In the left navigation pane, choose Data Identification Rules.
On the displayed page, click Create.

Figure 1 Creating a data identification rule

Set the parameters based on Table 1 and click OK.

Figure 2 Setting parameters for the rule

**Table 1** Parameters
Parameter	Description
*Type	The category to which a rule belongs. You can either create a rule based on built-in templates or customize one.
*Security Level	Classify the configured data into different levels. If the existing security levels do not meet the requirements, go to the Data Confidentiality page to create security levels. For details, see Creating Data Security Levels.
Data Classification	Classify the configured data into different types. If the existing classifications do not meet the requirements, go to the Data Classification page to create classifications. For details, see Creating Data Classifications.
Description	A description of the rule to be created.
Built-in
*Template	This parameter is displayed when Type is set to Built-in. The system provides more than 80 sensitive data identification rules, which can be used to identify and mask sensitive personal information (such as bank cards and credit cards), basic personal information (such as mobile numbers and email addresses), network identification information (such as IPv4 and IPv6 addresses), and other sensitive information. You can view the preset sensitive data identification rules on the Preset Rule Templates page. After selecting a preset rule, you can enter test data to check whether the preset rule can identify the test data.
*Name	If Type is set to Built-in, the rule name is automatically generated based on the template.
Custom
*Name	If Type is set to Custom, you must enter a rule name. You are advised to reflect the meaning of the rule in its name and avoid meaningless names so that the rule can be quickly located and selected. NOTE: The name must be unique.
*Rule Recognition	This parameter is displayed when Type is set to Custom. The options are None and Regular. If you select None, sensitive data identification tasks associated with the rule do not take effect for it: data assets cannot be classified automatically, and you need to add categories manually.
*Regular	This parameter is displayed when Rule Recognition is set to Regular. If you select Content recognition, enter a custom regular expression that will be used to identify data content. Example: ^male$|^female$. If you select Column name recognition, enter a custom regular expression that will be used to identify column names by exact or fuzzy match. Multiple column names can be identified at the same time. Example: age|years. If you select Remarks recognition, enter a custom regular expression that will be used to identify remarks by fuzzy match. Example: .*comment.*
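
Before saving a custom rule, it can help to try the expression against a few sample values. The sketch below does this for the three example expressions from Table 1, written here as ^male$|^female$, age|years, and .*comment.*. It uses Python's re module with search semantics, which is an assumption: the regular expression engine and matching mode used by DataArts Studio are not specified here, and the helper name matches and the sample inputs are hypothetical.

```python
import re

# Example expressions from Table 1 (content, column name, remarks).
CONTENT_RULE = re.compile(r"^male$|^female$")
COLUMN_NAME_RULE = re.compile(r"age|years")
REMARKS_RULE = re.compile(r".*comment.*")


def matches(rule, text):
    """Return True if the rule matches anywhere in the text (assumed semantics)."""
    return rule.search(text) is not None


# Content recognition: anchors make the match exact.
print(matches(CONTENT_RULE, "female"))    # True
print(matches(CONTENT_RULE, "females"))   # False

# Column name recognition: no anchors, so the match is fuzzy.
print(matches(COLUMN_NAME_RULE, "user_age"))    # True
print(matches(COLUMN_NAME_RULE, "page_count"))  # True, because "page" contains "age"
print(matches(COLUMN_NAME_RULE, "name"))        # False

# Remarks recognition: matches any remark containing "comment".
print(matches(REMARKS_RULE, "user comment text"))  # True
```

The page_count case shows why an unanchored column-name expression can over-match. If only exact column names should be identified, anchoring each alternative (for example ^age$|^years$) is one way to tighten the rule under the same assumptions.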

Related Operations

Editing an identification rule: On the Data Identification Rules page, locate an identification rule and click Edit in the Operation column to change the security level, classification, and description of the identification rule. For a custom rule, you can also change the rule recognition and regular expression.
Editing the identification rule status: The identification rule is enabled by default. If the identification rule is disabled, it cannot be added to an identification rule group.
To change the status of the identification rule, click the enable or disable icon for the rule.
Deleting identification rules: On the Data Identification Rules page, locate an identification rule and click Delete in the Operation column. To delete identification rules in a batch, select them and click Delete above the list.
- An identification rule that is referenced can be deleted only after the reference is removed.
- The deletion operation cannot be undone. Exercise caution when performing this operation.
Testing preset rule templates: On the Preset Rule Templates tab page, you can view all preset rule templates and test the recognition result of the templates by entering custom sample data.