Updated on 2024-04-03 GMT+08:00

Creating Identification Rules

To effectively identify sensitive data fields in a database, you can create identification rules.

Data security levels, data classifications, and identification rules are configured at the DataArts Studio instance level and are shared across the workspaces of the instance. This allows data to be managed against unified standards in the Data Map component.

By default, a newly created identification rule is in the to-be-confirmed state and does not take effect for a static masking task. To make the identification rule take effect, do the following:

After running a sensitive data discovery task, choose Sensitive Data Distribution in the left navigation pane, click the Manual Recovery tab, and confirm that the identification rule of the task is valid. The rule can then take effect for dynamic masking tasks.

Prerequisites

Constraints

  • Only the DAYU Administrator, Tenant Administrator, or data security administrator can create, modify, or delete data security levels, classifications, and identification rules. Other common users do not have permission to perform these operations.
  • For content identification rules (built-in rules and custom rules of the content identification type), a field is treated as sensitive and assigned a security level and classification only when the proportion of its records that match the rule, relative to the total number of records in the data table, exceeds a specified threshold (80% by default).
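
The threshold check for content identification rules can be sketched as follows. This is only an illustration, not the DataArts Studio implementation: the function names `match_ratio` and `is_sensitive_field`, the sample column, and the use of Python's `re.search` are all assumptions. Only the 80% default and the "exceeds" comparison come from the constraint above.

```python
import re

def match_ratio(values, pattern):
    """Return the fraction of records whose value matches the pattern."""
    if not values:
        return 0.0
    regex = re.compile(pattern)
    # Records with no value count toward the total but never match.
    matched = sum(1 for v in values if v is not None and regex.search(str(v)))
    return matched / len(values)

def is_sensitive_field(values, pattern, threshold=0.8):
    """Hypothetical check: the ratio must exceed (not just reach) the threshold."""
    return match_ratio(values, pattern) > threshold

# Hypothetical sample column: 9 of 10 records match the rule.
column = ["male", "female", "male", "female", "male",
          "female", "male", "female", "male", "unknown"]
print(is_sensitive_field(column, r"^male$|^female$"))  # ratio 0.9 exceeds 0.8
```

Under this reading, a column in which exactly 80% of the records match would not be flagged, because the proportion must exceed the threshold.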

Creating a Data Identification Rule

  1. On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Security.

    Figure 1 DataArts Security

  2. In the left navigation pane, choose Data Identification Rules.
  3. On the displayed page, click Create.

    Figure 2 Creating a data identification rule

  4. Set the parameters based on Table 1 and click OK.

    Figure 3 Setting parameters for the rule
    Table 1 Parameters

    Parameter

    Description

    *Type

    The category to which a rule belongs. You can either create a rule based on built-in templates or customize one.

    *Security Level

    Classify the configured data into different levels. If the existing security levels do not meet the requirements, go to the Data Confidentiality page to create security levels. For details, see Creating Data Security Levels.

    Data Classification

    Classify the configured data into different types. If the existing classifications do not meet the requirements, go to the Data Classification page to create classifications. For details, see Creating Data Classifications.

    Description

    A description of the rule to be created.

    Built-in

    *Template

    This parameter is displayed when Type is set to Built-in.

    The system provides more than 70 preset sensitive data identification and masking rules that cover the following information: sensitive personal information (such as phone numbers, debit cards, and credit cards), sensitive enterprise information (such as financial asset information and delivery information), sensitive key information (such as DSA keys and RSA keys), sensitive device information (such as IPv4 and IPv6 addresses), sensitive location information (such as provinces, cities, GPS locations, and addresses), and general sensitive information (such as dates).

    *Name

    If Type is set to Built-in, the rule name is automatically generated based on the template.

    Custom

    *Name

    If Type is set to Custom, you must enter a rule name. Choose a name that reflects what the rule identifies so that the rule can be quickly located and selected.
    NOTE:

    The name must be unique.

    *Rule Recognition

    This parameter is displayed when Type is set to Custom. The options are None and Regular.

    If you select None, the rule does not take effect in the associated sensitive data identification task. Data assets are not classified automatically, and you must add categories manually.

    *Regular

    This parameter is displayed when Rule Recognition is set to Regular.

    • If you select Content recognition, enter a custom regular expression. The expression will be used to identify data content. Example: ^male$|^female$
    • If you select Column name recognition, enter a custom regular expression. The expression will be used to identify column names by exact or fuzzy match. Multiple column names can be identified at the same time. Example: sex|gender
    • If you select Remarks recognition, enter a custom regular expression. The expression will be used to fuzzily identify remarks. Example: .*comment.*.
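
Before you enter an expression in the Regular field, it can help to try it against sample values locally. The sketch below uses Python's `re` module with the three example expressions, writing the content example as `^male$|^female$`. The regular expression dialect and the matching semantics used by the service are not stated here, so `re.search`, the `check` helper, the anchored variant `^(sex|gender)$`, and all sample strings are assumptions made for illustration.

```python
import re

# Example expressions from the parameter table. Search semantics are assumed.
CONTENT_RULE = r"^male$|^female$"
COLUMN_RULE_FUZZY = r"sex|gender"
COLUMN_RULE_EXACT = r"^(sex|gender)$"  # hypothetical anchored variant
REMARKS_RULE = r".*comment.*"

def check(pattern, samples):
    """Map each sample string to True or False depending on whether it matches."""
    regex = re.compile(pattern)
    return {s: bool(regex.search(s)) for s in samples}

# Hypothetical sample values, column names, and remarks.
print(check(CONTENT_RULE, ["male", "female", "females", " male"]))
print(check(COLUMN_RULE_FUZZY, ["gender", "user_sex", "age"]))
print(check(COLUMN_RULE_EXACT, ["gender", "user_sex", "age"]))
print(check(REMARKS_RULE, ["user comment field", "no remarks"]))
```

With search semantics, the unanchored expression `sex|gender` also matches a column named `user_sex`, whereas the anchored variant matches only the exact names. This is one way to read the difference between fuzzy and exact column name identification.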

Related Operations

  • Editing an identification rule: On the Data Identification Rules page, locate an identification rule and click Edit in the Operation column to change the security level, classification, and description of the identification rule. For a custom rule, you can also change the rule recognition and regular expression.
  • Editing the identification rule status: The identification rule is enabled by default. If the identification rule is disabled, it cannot be added to an identification rule group.

    To change the status of the identification rule, click the corresponding icon to enable or disable the rule.

  • Deleting identification rules: On the Data Identification Rules page, locate an identification rule and click Delete in the Operation column. To delete identification rules in a batch, select them and click Delete above the list.
    NOTE:

    Identification rules that are referenced by identification rule groups or masking policies cannot be deleted. To delete such a rule, first remove the references to it.

    Deletion cannot be undone. Exercise caution when performing this operation.

  • Testing preset rule templates: On the Preset Rule Templates tab page, you can view all preset rule templates and test the recognition result of the templates by entering custom sample data.