Updated on 2024-12-13 GMT+08:00

Configuring Hive Dynamic Data Masking

Scenarios

With Hive dynamic masking enabled, data in a masked column can still be used in computations but remains masked when the calculation results are output. Masking policies in the cluster are transferred automatically along data lineage, for example from a source table to a table created from it, which keeps data usable while protecting privacy.
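The lineage-based transfer can be pictured with a toy model. The bash sketch below is hypothetical and is not a Hive or Ranger interface: MASK_POLICY and transfer_mask are illustrative names, policies are assumed to be tracked per column as db.table.column, and bash 4 or later is required for associative arrays. It copies a source column's policy to a derived column and applies the conflict rule listed under Constraints.

```bash
#!/usr/bin/env bash
# Toy model (hypothetical, not a Hive or Ranger API) of lineage-based transfer
# of column masking policies. Requires bash 4+ for associative arrays.

# Masking type per column, keyed by db.table.column (assumed key format).
declare -A MASK_POLICY=()

# transfer_mask SRC_COL DST_COL
# Copies the policy of SRC_COL to DST_COL, as when DST_COL is derived from
# SRC_COL (for example by CREATE TABLE ... AS SELECT). If DST_COL already has
# a different policy, it is overridden as Custom:"***".
transfer_mask() {
  local src="${MASK_POLICY[$1]:-}"
  local dst="${MASK_POLICY[$2]:-}"
  if [[ -z "$src" ]]; then
    return 0                          # unmasked source: nothing to transfer
  fi
  if [[ -n "$dst" && "$dst" != "$src" ]]; then
    MASK_POLICY[$2]='Custom:"***"'    # conflict with the target's own policy
  else
    MASK_POLICY[$2]="$src"            # plain transfer along lineage
  fi
}
```

For example, after MASK_POLICY[default.hivetest.b]=Custom, calling transfer_mask default.hivetest.b default.hivetest02.b leaves Custom on default.hivetest02.b, which is roughly what the verification procedure below checks on a real cluster.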

Constraints

  • Data masking is not available for Hudi tables.
  • Masking for direct HDFS read/write operations is not supported.
  • Masking for complex data types like arrays, maps, and structs is not supported.
  • Custom masking policies support only the string type, and values are masked as ***.
  • If transferring a masking policy conflicts with an existing policy on the target table, the target table's policy is overridden as Custom:"***".
  • Simple queries, which are not submitted as YARN jobs, follow the masking policy configured in Ranger; with a Custom masking policy, data is masked as ***. Simple queries include select * from Table name; and select * from Table name limit xxx;.
  • For complex queries, which are submitted as YARN jobs, string fields are masked according to the masking policy configured in Ranger, and fields of other types are masked using the Nullify policy.
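The query-type and data-type rules above can be summarized as a lookup. The bash function below is a hypothetical sketch that only restates these constraints; effective_masking and its arguments are illustrative names, not part of Hive or Ranger, and it does not model Hudi tables or direct HDFS access.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a Hive or Ranger API): restates the constraints
# to predict which masking a masked column shows in query output.
#   $1 query kind: "simple" (no YARN job) or "complex" (submitted as a YARN job)
#   $2 Hive column type, for example string, int, or array<int>
#   $3 masking type configured in Ranger, for example Custom or Nullify
effective_masking() {
  local query_kind="$1" column_type="$2" ranger_policy="$3"

  # Complex data types (arrays, maps, structs) cannot be masked at all.
  case "$column_type" in
    array*|map*|struct*) echo "unsupported"; return 0 ;;
  esac

  if [[ "$query_kind" == "simple" || "$column_type" == "string" ]]; then
    # Simple queries, and string fields in complex queries, follow Ranger.
    echo "$ranger_policy"
  else
    # Non-string fields in complex (YARN) queries fall back to Nullify.
    echo "Nullify"
  fi
}
```

For example, effective_masking complex int Hash prints Nullify, while effective_masking simple string Custom prints Custom, which appears as *** in the output.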

Configuring Hive Dynamic Data Masking

  1. Log in to FusionInsight Manager, choose Cluster > Services > Hive, and click Configurations. Search for hive.dynamic.masked.enabled and set it to true for the HiveServer instances.
  2. Click Save. Click the Instances tab, select all HiveServer instances, click More > Restart Instance, enter the user password, and click OK to restart all HiveServer instances.
  3. Log in to the node where the Hive client is installed as the client installation user and run the following commands:

    cd Client installation directory

    source bigdata_env

    source Hive/component_env

    kinit Component service user (skip this step if Kerberos authentication is disabled for the cluster, that is, the cluster is in normal mode)

  4. Log in to the Hive client and create a Hive table.

    beeline

    create table hivetest(a int, b string);

    insert into hivetest values (1,"test01"), (2,"test02");

  5. Configure a masking policy for field b in the hivetest table by referring to Hive Data Masking and check the masking result.

    select * from hivetest;

    If the following information is displayed, data masking is successful.

    Figure 1 Successful data masking

  6. Verify the transferability of the masking policy.

    create table hivetest02 as select * from hivetest;

    Wait about 1 minute for the Ranger policy to be synchronized to Hive, and then check whether the masking policy has been transferred.

    select * from hivetest02;

    If the following information is displayed, the masking policy is successfully transferred.

    Figure 2 Masking policy transferred

  7. If the data in the new table created in step 6 is masked, dynamic masking has taken effect. Log in to the Ranger management page as user rangeradmin, click Hive in the HADOOP SQL area on the home page, and click the Masking tab to view the masking policy generated automatically for the table. For example, the masking policy of table hivetest02 is named policy_synced_from_table_default_hivetest02_b.

    Figure 3 Masking policy of the table
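Steps 3 to 7 can be collected into a small file of shell functions. This is a sketch under assumptions: CLIENT_DIR and HIVE_USER are hypothetical variables you set yourself, beeline is assumed to accept several ;-separated statements in -e, and the policy name pattern in synced_policy_name is inferred from the single example in step 7. Configuring the Ranger policy (step 5) and waiting for synchronization remain manual.

```bash
# Sketch of steps 3 to 7 as shell functions; source this file, do not execute it.
# Assumptions (hypothetical): CLIENT_DIR is the Hive client installation
# directory, HIVE_USER is the component service user (leave empty in normal
# mode), and beeline accepts several ;-separated statements in -e.

# Step 3: load the client environment and authenticate if Kerberos is enabled.
prepare_client() {
  if [[ -z "${CLIENT_DIR:-}" ]]; then
    echo "Set CLIENT_DIR to the client installation directory" >&2
    return 1
  fi
  cd "$CLIENT_DIR" || return 1
  source bigdata_env
  source Hive/component_env
  if [[ -n "${HIVE_USER:-}" ]]; then
    kinit "$HIVE_USER"
  fi
}

# Step 4: SQL that creates and populates the test table.
create_sql() {
  echo 'create table hivetest(a int, b string);'
  echo 'insert into hivetest values (1,"test01"), (2,"test02");'
}

# Step 6: SQL that derives a new table, so the policy is transferred by lineage.
ctas_sql() {
  echo 'create table hivetest02 as select * from hivetest;'
}

# Step 7: expected name of the auto-generated Ranger policy. The pattern is
# inferred from the single example policy_synced_from_table_default_hivetest02_b.
synced_policy_name() {
  local db="$1" table="$2" column="$3"
  echo "policy_synced_from_table_${db}_${table}_${column}"
}

# Runs one phase through beeline: create, ctas, or check [table].
run_phase() {
  case "$1" in
    create) beeline -e "$(create_sql)" ;;
    ctas)   beeline -e "$(ctas_sql)" ;;
    check)  beeline -e "select * from ${2:-hivetest};" ;;
    *)      echo "usage: run_phase create|ctas|check [table]" >&2; return 1 ;;
  esac
}
```

A typical sequence would be: source the file, run prepare_client and run_phase create, configure the masking policy for column b in Ranger, run run_phase check, then run_phase ctas, wait about 1 minute, and run run_phase check hivetest02.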