Creating Identification Rules

To effectively identify sensitive data fields in a database, you can create identification rules. Currently, only built-in rules and simple regular expressions are supported.

If you need more powerful identification rules, you can create combination rules. The sub-rules of a combination rule can be combined through AND, OR, and NOT. A sub-rule supports the following algorithms: Groovy script, regular expression, equal, length judgment, and built-in rule. The matched object can be column content, column name, column comments, table name, table comments, or database name.
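The sub-rule algorithms and matched objects listed above can be pictured with a small Python sketch. It is not the DataArts Studio implementation. The `SubRule` class, the algorithm keys, and the `match_sub_rule` function are hypothetical. The regular expression algorithms are assumed to hit when the pattern is found anywhere in the object, so use `^` and `$` to force a full match. Groovy script and built-in rules are omitted because they run inside the service.

```python
import re
from dataclasses import dataclass


# Hypothetical model of one sub-rule; names are illustrative, not a product API.
@dataclass
class SubRule:
    code: str        # rule code used in the condition expression, e.g. "A"
    algorithm: str   # "regex", "regex_ci", "equal", "len_eq", "len_gt", "len_lt"
    target: str      # matched object key, e.g. "column_name", "column_comment"
    expression: str  # pattern, literal value, or length threshold


def match_sub_rule(rule: SubRule, obj: dict) -> bool:
    """Return True if the selected matched object satisfies the sub-rule."""
    value = obj.get(rule.target, "")  # a missing object is treated as empty text
    if rule.algorithm == "regex":
        return re.search(rule.expression, value) is not None
    if rule.algorithm == "regex_ci":  # regular expression (case insensitive)
        return re.search(rule.expression, value, re.IGNORECASE) is not None
    if rule.algorithm == "equal":
        return value == rule.expression
    if rule.algorithm == "len_eq":
        return len(value) == int(rule.expression)
    if rule.algorithm == "len_gt":
        return len(value) > int(rule.expression)
    if rule.algorithm == "len_lt":
        return len(value) < int(rule.expression)
    raise ValueError(f"unsupported algorithm: {rule.algorithm}")


# Metadata of one column, keyed by matched object (illustrative values).
column = {"column_name": "user_age", "column_comment": "Age of the user", "table_name": "customers"}
print(match_sub_rule(SubRule("A", "regex_ci", "column_comment", r"\bage\b"), column))
```

Each sub-rule produces one true/false result. The condition expression then combines these results through AND, OR, and NOT.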

In the new version mode, combination rules can be configured only in the enterprise version. In the old version mode, this function is available in the basic version and in the advanced versions.

Data security levels, data classifications, and identification rules are DataArts Studio instance-level configurations and are shared across workspaces. This allows data to be managed against unified standards in the Data Map component.

After an identification rule is created, it is in the To be confirmed state by default and cannot take effect for static masking tasks. To make the rule take effect, run a sensitive data discovery task, choose Sensitive Data Distribution in the left navigation pane, click the Manual Recovery tab, and ensure that the identification rule of the task is valid. The rule then takes effect for static masking tasks.

Prerequisites

(Mandatory) A data security level has been created. For details, see Creating Data Security Levels.
(Optional) A data classification has been created. For details, see Creating Data Classifications.

Constraints

Only the DAYU Administrator, Tenant Administrator, or data security administrator can create, modify, or delete data security levels, classifications, and identification rules. Other users do not have permission to perform these operations.

If a sensitive data identification rule is of the content identification type (a built-in rule, or a custom rule of the content identification type), a field is treated as sensitive, and assigned the rule's security level and classification, only when the ratio of records whose value in that field matches the rule to the total number of records in the table exceeds a specified threshold (80% by default).
Data identification rules that are referenced can be deleted only if the reference is canceled.
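The threshold check for content identification rules can be illustrated with a short Python sketch. The `is_sensitive_field` helper and the 11-digit mobile number pattern are hypothetical. The sketch assumes that "exceeds" means strictly greater than and that NULL values count as non-matching records; the service may treat these cases differently.

```python
import re
from typing import Optional


def is_sensitive_field(values: list[Optional[str]], pattern: str, threshold: float = 0.8) -> bool:
    """Return True if the share of records matching `pattern` exceeds `threshold`.

    Hypothetical helper: None (NULL) values count as non-matching records.
    """
    if not values:
        return False
    matched = sum(1 for v in values if v is not None and re.search(pattern, v))
    return matched / len(values) > threshold


MOBILE = r"^1\d{10}$"  # hypothetical 11-digit mobile number pattern

# 9 of 10 records match (90%), which exceeds the default 80%.
print(is_sensitive_field(["13800000000"] * 9 + ["n/a"], MOBILE))
# 8 of 10 records match (exactly 80%), which does not exceed the threshold.
print(is_sensitive_field(["13800000000"] * 8 + ["n/a", None], MOBILE))
```

The second call shows why a field in which exactly 80% of records match is not flagged under the "strictly greater" assumption.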

Creating a Data Identification Rule

On the DataArts Studio console, locate a workspace and click DataArts Security.
In the left navigation pane, choose Data Identification Rules.
On the displayed page, click Create.

Figure 1 Creating a data identification rule

Set the parameters based on Table 1 and click OK.

Figure 2 Setting parameters for the rule

**Table 1** Parameters
Parameter	Description
*Type	The category to which a rule belongs. You can either create a rule based on built-in templates or customize one.
*Security Level	Classify the configured data into different levels. If the existing security levels do not meet the requirements, go to the Data Confidentiality page to create security levels. For details, see Creating Data Security Levels.
Data Classification	Classify the configured data into different types. If the existing classifications do not meet the requirements, go to the Data Classification page to create classifications. For details, see Creating Data Classifications.
Description	A description of the rule to be created.
Built-in
*Template	This parameter is displayed when Type is set to Built-in. The system provides more than 80 sensitive data identification rules, which can be used to identify and mask sensitive personal information (such as bank cards and credit cards), basic personal information (such as mobile numbers and email addresses), network identification information (such as IPv4 and IPv6 addresses), and other sensitive information. You can view the preset sensitive data identification rules on the Preset Rule Templates page. After selecting a preset rule, you can enter test data to check whether the preset rule can identify the test data.
*Name	If Type is set to Built-in, the rule name is automatically generated based on the template.
Custom
*Name	If Type is set to Custom, enter a rule name (mandatory). You are advised to include the meaning of the rule in its name and avoid meaningless descriptions so that the rule can be quickly located and selected. NOTE: The name must be unique.
*Rule Recognition	This parameter is displayed when Type is set to Custom. The options are None and Regular. If you select None, the sensitive data identification tasks associated with the rule do not take effect, data assets cannot be classified automatically, and you need to add classifications manually.
*Regular	This parameter is displayed when Rule Recognition is set to Regular. If you select Content recognition, enter a custom regular expression used to identify data content, for example, ^male$|^female$. If you select Column name recognition, enter a custom regular expression used to match column names exactly or fuzzily; multiple column names can be matched at the same time, for example, age|years. If you select Remarks recognition, enter a custom regular expression used to fuzzily match remarks, for example, .*comment.*.
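The example expressions in Table 1 can be tried out locally with Python's `re` module before you enter them in the console. This sketch assumes that a rule hits when the pattern is found anywhere in the text, which is the behavior of `re.search`. The service's regex dialect and matching mode may differ, so treat the results as a rough check only. The `matches` helper and the constant names are hypothetical.

```python
import re

# Example expressions from Table 1.
CONTENT_PATTERN = r"^male$|^female$"  # content recognition: exact values only
COLUMN_NAME_PATTERN = r"age|years"    # column name recognition: fuzzy match
REMARKS_PATTERN = r".*comment.*"      # remarks recognition: fuzzy match


def matches(pattern: str, text: str) -> bool:
    """Assumption: the rule hits when the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None


for value in ["male", "female", "females", "Female"]:
    print(value, matches(CONTENT_PATTERN, value))

for name in ["user_age", "service_years", "user_name", "page_count"]:
    print(name, matches(COLUMN_NAME_PATTERN, name))
```

The anchors `^` and `$` restrict content recognition to exact values, so `females` does not match, and the match is case sensitive, so `Female` does not match either. The unanchored column name pattern is fuzzy: it also hits `page_count` because that name contains `age`.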

Creating a Combination Rule

On the DataArts Studio console, locate a workspace and click DataArts Security.
In the left navigation pane, choose Data Identification Rules.
On the Recognition Rules page, click Create Combination Rule.

Figure 3 Creating a combination rule

Configure the parameters based on Table 2 and click Submit.

Figure 4 Configuring a combination rule

**Table 2** Parameters of the combination rule
Parameter	Description
*Name	Name of the rule (mandatory). You are advised to include the meaning of the rule in its name and avoid meaningless descriptions so that the rule can be quickly located and selected. NOTE: The name must be unique.
*Security Level	Classify the configured data into different levels. If the existing security levels do not meet the requirements, go to the Data Confidentiality page to create security levels. For details, see Creating Data Security Levels.
Data Classification	Classify the configured data into different types. If the existing classifications do not meet the requirements, go to the Data Classification page to create classifications. For details, see Creating Data Classifications.
Rule Content	A sub-rule in the combination rule. Rule Code: identifies the sub-rule in the condition expression. Rule Recognition: algorithm used by the sub-rule. The options are Regular expression, Groovy script, Regular expression (case insensitive), Equal to, Length equal to, Length greater than, Length less than, and Built-in. Example regular expression: ^Male$|^Female$. Matched Object: data object checked by the sub-rule. The value can be Column content, Column name, Column comments, Table name, Table comments, or Database name. Expression/Rule Template: enter the expression for the selected algorithm, which is used to match the object. Operation: add or delete a sub-rule.
*Condition Expression	Defines how the sub-rules are combined through AND, OR, and NOT. Custom: enter a logical expression that combines the sub-rules. Sub-rules are referenced by their rule codes, A to Z. The supported operators are && (AND), || (OR), ! (NOT), and parentheses ( ). Example: A&&B. Hit the rule when all conditions are met: a logical expression that requires all sub-rules to be matched is generated automatically. Hit the rule when any condition is met: a logical expression that requires at least one sub-rule to be matched is generated automatically.
Rule Test	You can enter test data to check whether the rule meets the expectation.
Description	A description of the rule to be created.
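The following Python sketch shows how a custom condition expression such as `(A||B)&&!C` could be evaluated, given the true/false result of each sub-rule. It is a small recursive-descent parser over the operators listed in Table 2. It is not the service's evaluator. The `tokenize` and `evaluate` functions are hypothetical. The precedence (`!` before `&&` before `||`) is an assumption borrowed from common programming languages, so add parentheses in real rules if the order matters.

```python
import re

# Tokens: single-letter rule codes A-Z, &&, ||, !, and parentheses.
TOKEN_RE = re.compile(r"\s*(&&|\|\||!|\(|\)|[A-Z])")


def tokenize(expression: str) -> list[str]:
    """Split a condition expression such as "(A||B)&&!C" into tokens."""
    text = expression.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression: str, hits: dict[str, bool]) -> bool:
    """Evaluate the expression; `hits` maps each rule code to its sub-rule result."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return token

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side so its tokens are consumed
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        token = take()
        if token == "!":
            return not parse_not()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token not in hits:
            raise ValueError(f"unknown rule code or misplaced operator: {token!r}")
        return hits[token]

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


hits = {"A": True, "B": False, "C": True}  # results of sub-rules A, B, and C
print(evaluate("A&&B", hits), evaluate("A||B", hits), evaluate("(A||B)&&!C", hits))
```

Under these assumptions, Hit the rule when all conditions are met behaves like `A&&B&&C`, and Hit the rule when any condition is met behaves like `A||B||C`. The expression that the console actually generates may be formatted differently.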

Related Operations

Editing an identification rule: On the Data Identification Rules page, locate an identification rule and click Edit in the Operation column to change the security level, classification, and description of the identification rule. For a custom rule, you can also change the rule recognition and regular expression.
Editing the identification rule status: An identification rule is enabled by default. A disabled identification rule cannot be added to an identification rule group.
To change the status of an identification rule, click the corresponding icon to enable or disable the rule.
Deleting identification rules: On the Data Identification Rules page, locate an identification rule and click Delete in the Operation column. To delete identification rules in a batch, select them and click Delete above the list.
- Data identification rules that are referenced can be deleted only if the reference is canceled.
- The deletion operation cannot be undone. Exercise caution when performing this operation.
Testing preset rule templates: On the Preset Rule Templates tab page, you can view all preset rule templates and test the recognition result of the templates by entering custom sample data.
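Testing a rule with sample data amounts to running the rule over each sample and comparing the result with what you expect. The Python sketch below illustrates this idea. `looks_like_ipv4` is a stand-in built on the standard `ipaddress` module, not the platform's preset IPv4 template, and `run_rule_test` is a hypothetical helper.

```python
import ipaddress
from typing import Callable


def looks_like_ipv4(text: str) -> bool:
    """Stand-in for a preset IPv4 template; NOT the platform's actual rule."""
    try:
        ipaddress.IPv4Address(text.strip())
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False


def run_rule_test(rule: Callable[[str], bool], samples: dict[str, bool]) -> list[str]:
    """Return the samples whose result differs from the expected result."""
    return [sample for sample, expected in samples.items() if rule(sample) != expected]


samples = {"192.168.1.10": True, "256.1.1.1": False, "10.0.0": False, "fe80::1": False}
print(run_rule_test(looks_like_ipv4, samples))  # an empty list means all expectations were met
```

Including near misses, such as out-of-range octets or IPv6 addresses, in the sample data is a quick way to see whether a template is stricter or looser than you expect.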