Updated on 2023-10-31 GMT+08:00

Data Redaction

GaussDB(DWS) provides the column-level dynamic data masking (DDM) function. For sensitive data (such as the ID card number, mobile number, and bank card number), the DDM function is used to redact the original data to protect data security and user privacy.

  • Creating a data masking policy for a table

    GaussDB(DWS) uses the CREATE REDACTION POLICY syntax to create a data masking policy on a table. (MASK_NONE: Do not perform masking. MASK_FULL: Mask data into a fixed value. MASK_PARTIAL: Perform partial masking based on the character type, numeric type, or time type.)

  • Modifying the data masking policy of a table

    The ALTER REDACTION POLICY syntax is used to modify the expression for enabling a masking policy, rename a masking policy, and add, modify, or delete masked columns.

  • Deleting the masking policy of a table

    The DROP REDACTION POLICY syntax is used to delete the masking function information of a masking policy on all columns of a table.

  • Viewing the masking policy and masked columns

    Redaction policy information is stored in the system catalog PG_REDACTION_POLICY, and redacted column information is stored in the system catalog PG_REDACTION_COLUMN. You can view information about the redaction policy and redacted columns in the system views REDACTION_POLICIES and REDACTION_COLUMNS.

  • Generally, you can run the SELECT statement to view the data redaction result. If a statement has the following features, sensitive data may be deliberately obtained. In this case, an error will be reported during statement execution.
    • The GROUP BY clause references the Target Entry containing redaction columns as the target column.
    • DISTINCT works on the output redaction columns.
    • The statement contains CTE.
    • Operations on sets are involved.
    • The target columns of a subquery are not redaction columns of the base table, but the expressions or function calls for redaction columns of the base table.
  • You can use COPY TO or GDS to export the redacted data. Due to the irreversibility of the data redaction, secondary redaction of the data is meaningless.
  • Do not set target columns of UPDATE, MERGE INTO, and DELETE statements to redaction columns.
  • The UPSERT statement allows you to insert update data through EXCLUDED. If data in the base table is updated by referencing redaction columns, the data may be modified by mistake. As a result, an error will be reported during the execution.

Examples

The following uses the employee table emp, table owner alice, and roles matu and july as an example to illustrate the data masking process. The emp table contains private data such as the employee name, mobile number, email address, bank card number, and salary.

  1. After connecting to the database as the administrator, create roles alice, matu, and july.

  2. Grant schema permissions on the current database to alice, matu, and july.

  3. Switch to role alice, create the emp table, and insert three pieces of employee information.

  4. alice grants the read permission on the emp table to matu and july.

  5. Create the masking policy mask_emp: Only user alice can view all employee information. User matu and july cannot view employee bank card numbers and salary data. The card_no column is of the numeric type and all of its data is masked into 0 by the MASK_FULL function. The card_string column is of the character type and part of its data is masked by the MASK_PARTIAL function based on the specified input and output formats. The salary column is of the numeric type and the MASK_PARTIAL function is used to mask all digits before the penultimate digit using the number 9.

  6. Switch to matu and july and view the employee table emp.

  7. If you want matu to have the permission to view all employee information, but do not want july to have. In this case, you only need to modify the effective scope of the policy.

  8. Switch to users matu and july and view the emp table again, respectively.

  9. The information in the phone_no, email, and birthday columns is private data. Update redaction policy mask_emp and add three redaction columns.

  10. Switch to july and view data in the emp table.

  11. Query redaction_policies and redaction_columns to view details about the current redaction policy mask_emp.

  12. Add the salary_info column. To replace the salary information in text format with *.*, you can create a user-defined redaction function. In this step, you can use the PL/pgSQL to define the redaction function mask_regexp_salary. To create a redaction column, you simply need to customize the function name and parameter list. For details, see User-Defined Functions.

  13. If there is no need to set a redaction policy for the emp table, delete redaction policy mask_emp.