Updated on 2024-04-03 GMT+08:00

Discovering Sensitive Data

After creating a sensitive data identification rule group, you can create a sensitive data discovery task to discover sensitive data and synchronize it to Data Map.

After running a sensitive data discovery task, choose Sensitive Data Distribution in the left navigation pane, click the Manual Recovery tab, and make sure the task's identification rule is valid so that the rule can take effect for dynamic masking tasks.

Prerequisites

  • Sensitive data identification rule groups have been created. For details, see Creating Identification Rule Groups.
  • A data connection to the data source to be scanned (GaussDB(DWS), DLI, or MRS Hive) has been created in Management Center. For details, see Creating a Data Connection.
  • To discover sensitive data in DLI, a general-purpose DLI queue must be available.
  • To enable automatic synchronization of identified sensitive data to the Data Map component, the sensitive data discovery task must be created, run, or scheduled by DAYU Administrator, Tenant Administrator, or data security administrator.
  • To enable the synchronization of sensitive data classifications to the Data Map component, ensure that the following prerequisites are met:
    • You have collected the metadata of the data table in DataArts Catalog. For details, see Metadata Collection Task.
    • Real-time metadata synchronization has been enabled for the data connections in Management Center. For details, see Creating a Data Connection.
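The prerequisites above can be read as a pre-flight checklist. The following Python sketch is purely illustrative: TaskPlan, check_prerequisites, and classification_sync_ready are hypothetical names, not part of any DataArts Studio API, and the checks cover only the conditions stated on this page (a supported data source with a connection, a general-purpose queue for DLI, and the two metadata prerequisites for synchronizing classifications to Data Map).

```python
from __future__ import annotations

from dataclasses import dataclass

# Data sources that support sensitive data discovery (see Constraints).
SUPPORTED_SOURCES = {"DWS", "DLI", "MRS Hive"}


@dataclass
class TaskPlan:
    """Hypothetical description of a planned discovery task."""
    data_source: str
    connection_created: bool
    dli_queue_type: str | None = None      # only relevant for DLI
    metadata_collected: bool = False       # a metadata collection task has run
    realtime_metadata_sync: bool = False   # enabled on the data connection


def check_prerequisites(plan: TaskPlan) -> list[str]:
    """Return the unmet prerequisites (empty if none of these checks fail)."""
    problems = []
    if plan.data_source not in SUPPORTED_SOURCES:
        problems.append(f"unsupported data source: {plan.data_source}")
    if not plan.connection_created:
        problems.append("create a data connection in Management Center")
    if plan.data_source == "DLI" and plan.dli_queue_type != "general-purpose":
        problems.append("DLI discovery requires a general-purpose DLI queue")
    return problems


def classification_sync_ready(plan: TaskPlan) -> bool:
    """Both metadata prerequisites for syncing classifications to Data Map are met."""
    return plan.metadata_collected and plan.realtime_metadata_sync


print(check_prerequisites(TaskPlan("DLI", connection_created=True)))
# ['DLI discovery requires a general-purpose DLI queue']
```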

Constraints

  • Sensitive data discovery is only available for standard warehouses of GaussDB(DWS), Data Lake Insight (DLI), and MRS Hive.
  • If the identification rule is of the content identification type (a built-in rule or a custom content identification rule), a field is considered sensitive, and is assigned a security level and classification, only when the proportion of records in the data table whose values in that field match the rule exceeds a specified threshold (80% by default).
  • If a field matches multiple identification rules in a rule group, the field is assigned the highest security level among those rules, and it can have multiple classifications.
  • After a sensitive data discovery task is executed, the security levels and classifications are generated for the discovered sensitive fields. By default, security levels of data tables are not generated. Security levels of data tables are generated only if you select Update the security level. The security level of a data table is the highest security level of sensitive fields.
  • Currently, sensitive data can be synchronized only to Data Map. Sensitive data cannot be synchronized to DataArts Catalog, and sensitive data security levels and classifications cannot be added or edited in DataArts Catalog.
  • Only the DAYU Administrator, Tenant Administrator, or data security administrator has the permission to enable automatic synchronization of sensitive data to Data Map or manually synchronize sensitive data to Data Map.
    • Automatic synchronization: If Manually synchronize the recognition result is not selected during the creation of a sensitive data discovery task, sensitive data is automatically synchronized to Data Map.
    • Manual synchronization: If you select Manually synchronize the recognition result when creating a sensitive data discovery task, you need to choose Sensitive Data Distribution and click the Manual Recovery tab to synchronize sensitive data to Data Map.

    When creating a sensitive data discovery task, common users (anyone other than the DAYU Administrator, Tenant Administrator, or data security administrator) must select Manually synchronize the recognition result; otherwise, the task cannot be created. In addition, if a common user runs or schedules a task for which this option is not selected, the task cannot be executed.
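The threshold and security-level rules above can be modeled in a few lines. This Python sketch is an illustration under stated assumptions, not the DataArts Studio implementation: security levels are assumed to be integers where a larger value means more sensitive, "exceeds" is read as strictly greater than, and Rule, classify_field, and table_security_level are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical identification rule (not a DataArts Studio type)."""
    name: str
    security_level: int       # assumption: larger value = more sensitive
    classification: str
    threshold: float = 0.8    # default 80% for content identification rules


def is_sensitive(matching: int, total: int, threshold: float = 0.8) -> bool:
    """True only if the share of matching records strictly exceeds the threshold."""
    if total == 0:
        return False
    return matching / total > threshold


def classify_field(match_counts: dict[str, int], total: int, rules: list[Rule]):
    """Return (security_level, classifications) for one field, or None if not sensitive."""
    hits = [r for r in rules
            if is_sensitive(match_counts.get(r.name, 0), total, r.threshold)]
    if not hits:
        return None
    # The highest security level among matched rules wins; all classifications are kept.
    return max(r.security_level for r in hits), sorted({r.classification for r in hits})


def table_security_level(field_levels: list[int], update_security_level: bool):
    """Highest field level, generated only if 'Update the security level' is selected."""
    if not update_security_level or not field_levels:
        return None
    return max(field_levels)


rules = [Rule("phone", 3, "Contact"), Rule("id_card", 4, "Identity")]
print(classify_field({"phone": 900, "id_card": 850}, 1000, rules))  # (4, ['Contact', 'Identity'])
print(classify_field({"phone": 800}, 1000, rules))                  # None
print(table_security_level([3, 4], update_security_level=False))   # None
```

With the default 80% threshold, a field in which exactly 800 of 1,000 records match is not considered sensitive, because the proportion equals the threshold rather than exceeding it.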

Creating a Sensitive Data Discovery Task

  1. On the DataArts Studio console, locate an instance and click Access. On the displayed page, locate a workspace and click DataArts Security.

    Figure 1 DataArts Security

  2. Choose Sensitive Data Discovery in the left navigation pane.

    Figure 2 Sensitive Data Discovery page

  3. Click Create. In the Create Sensitive Task slide-out panel, set parameters based on Table 1.

    Figure 3 Setting parameters for the sensitive data discovery task

    The following table lists the parameters for a sensitive data discovery task.
    Table 1 Parameters

    Parameter

    Description

    Basic Settings

    *Task

    Name of the task. To facilitate task management, you are advised to include the data table to be identified and the rule group to be used in the task name.

    Task Description

    A description of the task to be created.

    *Data Source

    Select a created data source from the drop-down list.

    *Data Connection

    Select a data connection from the drop-down list.

    If no data connection is available, create one by referring to Creating a Data Connection.

    *Database / *Data Table

    Databases and data tables where you want to discover sensitive data.

    • Click Configure next to the Database box to select databases.
    • Click Configure next to the Data Table box to select data tables.
    • Click Clear to clear the selected databases and data tables.

    *Computing Queue

    This parameter is mandatory if Data Source is set to DLI. Select a general-purpose DLI queue for executing DLI jobs.

    Rule Settings

    *Recognize Rule Group

    Select a rule group from the drop-down list. If no rule group has been created, create one by referring to Creating Identification Rule Groups.

    When you select a rule group, details about its identification rules are displayed. You can configure thresholds for preset (built-in) rules and for custom rules that use content matching. A field is considered sensitive when the proportion of records in the data table whose values in that field match the rule exceeds the threshold (80% by default). If the same rule appears in different rule groups, it must have the same threshold in each group.

    Update the security level

    After the sensitive data discovery task is executed, the security levels and classifications are generated for the identified sensitive fields. By default, this option is not selected, indicating that the security levels of data tables are not generated.

    If this option is selected, the security levels of data tables are generated. The security level of a data table is the highest security level of the sensitive fields.

    Manually synchronize the recognition result

    Only the DAYU Administrator, Tenant Administrator, or data security administrator can enable automatic synchronization of sensitive data to Data Map or synchronize it manually.
    • Not selected (automatic synchronization): identified sensitive data is automatically synchronized to Data Map.
    • Selected (manual synchronization): to synchronize sensitive data to Data Map, choose Sensitive Data Distribution and click the Manual Recovery tab.

    Common users (anyone other than these administrators) must select this option to create the task. If a common user runs or schedules a task for which this option is not selected, the task cannot be executed.

    Schedule Properties

    Once

    The sensitive data discovery task runs only once.

    On Schedule

    The sensitive data discovery task runs based on the configured scheduling period.

    • Date

      Period during which the task takes effect.

    • Cycle

      The frequency at which a task is executed. The options are:

      • minutes: Select the scheduling start time and end time, and set the interval in minutes.
      • hours: Select the scheduling start time and end time, and set the interval in hours.
      • Day: Set the time at which the task runs every day.
      • Week: Select a day in a week and set the specific time to start scheduling.
      • Month: Select a day in a month and set the specific time to start scheduling.

      For example, you can set Cycle to Week, Time to 15:52, and Time Range to Tuesday. In this case, the task is executed at 15:52 every Tuesday within the configured date range.

    • Start now: If you select this option, the task is scheduled immediately.

  4. Click OK. The sensitive data discovery task is created.

    If a task runs successfully but no execution result is displayed and the run log contains no matched information, no sensitive data was discovered.
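To see which run times a schedule produces, the following Python sketch expands the Week example from Table 1 (Tuesday at 15:52) and the minutes/hours cycles into concrete timestamps. It is a hypothetical helper, not the scheduler that DataArts Studio uses; it assumes the date range is inclusive at both ends and ignores time zones.

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta


def weekly_runs(start: date, end: date, weekday: int, at: time) -> list[datetime]:
    """Run times for a Week cycle; weekday uses Python's convention (Monday=0, Tuesday=1, ...)."""
    # Jump to the first matching weekday on or after the start date.
    day = start + timedelta(days=(weekday - start.weekday()) % 7)
    runs = []
    while day <= end:  # assumption: the end date is inclusive
        runs.append(datetime.combine(day, at))
        day += timedelta(days=7)
    return runs


def interval_runs(start: datetime, end: datetime, every: timedelta) -> list[datetime]:
    """Run times for a minutes or hours cycle between the scheduling start and end times."""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    t = start
    while t <= end:
        runs.append(t)
        t += every
    return runs


# Cycle = Week, Time = 15:52, Tuesday, date range April 2024
for run in weekly_runs(date(2024, 4, 1), date(2024, 4, 30), 1, time(15, 52)):
    print(run)  # 2024-04-02 15:52:00, then every Tuesday through 2024-04-30 15:52:00
```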

Related Operations

  • Running or scheduling a task: On the Sensitive Data Discovery page, locate a task and click Run in the Operation column or click More in the Operation column and select Start.

    Whether a task runs once or periodically depends on its Schedule Properties setting (Once or On Schedule).

    If a common user (anyone other than the DAYU Administrator, Tenant Administrator, or data security administrator) runs or schedules a task for which Manually synchronize the recognition result is not selected, the task fails. Only these administrators can run or schedule such tasks.

  • Editing a task: On the Sensitive Data Discovery page, locate a task and click Edit in the Operation column.

    A task in the Running state cannot be edited.

  • Deleting tasks: On the Sensitive Data Discovery page, locate a task, click More in the Operation column, and select Delete. To delete multiple tasks at a time, select the tasks and click Delete above the task list.

    A task in the Running state cannot be deleted.

    • Deleting a sensitive data discovery task also deletes its discovery results, and the deletion cannot be undone. Exercise caution when performing this operation.
  • Viewing running instance logs: On the Sensitive Data Discovery page, locate a task and click the expand icon next to it to display its instances. Click Operation and select View Log.

    If a task fails, locate the cause in the logs, rectify the fault, and run the task again. If the fault persists, contact technical support.
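The permission and state restrictions for these operations can be expressed as two checks. In this Python sketch, the role strings, the State enum, and the function names are hypothetical; the logic mirrors only the rules stated on this page (common users may run or schedule only tasks that use manual synchronization, and Running tasks cannot be edited or deleted).

```python
from enum import Enum

# Roles that may run or schedule tasks with automatic synchronization.
PRIVILEGED_ROLES = {"DAYU Administrator", "Tenant Administrator", "data security administrator"}


class State(Enum):
    RUNNING = "Running"
    OTHER = "Other"  # hypothetical stand-in for every non-running state


def can_run_or_schedule(role: str, manual_sync: bool) -> bool:
    """Common users may only run or schedule tasks that use manual synchronization."""
    return role in PRIVILEGED_ROLES or manual_sync


def can_edit_or_delete(state: State) -> bool:
    """A task in the Running state can be neither edited nor deleted."""
    return state is not State.RUNNING


print(can_run_or_schedule("common user", manual_sync=False))  # False: the task fails
print(can_edit_or_delete(State.RUNNING))                      # False
```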