Precautions for Transparent Encryption
HBase
After transparent encryption is configured, when the BulkLoad tool is used to generate and import HFiles, the HFile path must point to a subdirectory of /HBase root directory/extdata (for example, /hbase/extdata/bulkloadTmp/hfile). In addition, before the BulkLoad command is executed, the HBase user must be added to the Hadoop user group of the corresponding cluster and must have the read permission on the encryption key of the HBase root directory. For details about permission control, see Permission Control.
If the cluster is not the first one installed on FusionInsight Manager, the user group is c<cluster ID>_hadoop, for example, c2_hadoop.
Example:
hbase com.xxx.hadoop.hbase.tools.bulkload.ImportData -Dimport.skip.bad.lines=true -Dimport.separator=',' -Dimport.bad.lines.output=/hbase/extdata/bulkloadTmp/badline -Dimport.hfile.output=/hbase/extdata/bulkloadTmp/hfile configuration.xml ImportTable /datadirImport
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /hbase/extdata/bulkloadTmp/hfile ImportTable
The -Dimport.hfile.output parameter must be set to a subdirectory of /HBase root directory/extdata.
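The prerequisites above (the HFile path lying inside the encryption zone under /HBase root directory/extdata, and the key being readable by the HBase user) can be checked from an HDFS client before running BulkLoad. A minimal sketch; the paths are illustrative and assume /hbase is the HBase root directory:

```shell
# List all encryption zones; /hbase (or /hbase/extdata) should appear here
hdfs crypto -listZones

# Show which key encrypts the planned HFile output directory
hdfs crypto -getFileEncryptionInfo -path /hbase/extdata/bulkloadTmp/hfile

# List the keys visible to the current user through KMS; the key of the
# HBase root directory must be readable for BulkLoad to succeed
hadoop key list
```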
Hive
- If the Hive parameter hive.server2.enable.doAs is set to true, grant the KMS access permission to the user.
- If transparent encryption is used, data replication should be avoided when moving data. Some suggestions are provided as follows:
- When the load command is executed across encryption zones, the data is copied rather than moved. When the data volume is large, performance deteriorates. Therefore, it is recommended that the source path and the target path be in the same encryption zone.
- You are advised to configure the directory corresponding to hive.exec.scratchdir to be in the same encryption zone as the data warehouse directory to avoid data replication.
- You are advised to retain the value of hive.exec.stagingdir to avoid data replication.
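As an illustration of the scratch-directory suggestion above, assume the data warehouse directory /user/hive/warehouse sits inside an encryption zone rooted at /user/hive (the zone root and directory names here are assumptions, not fixed defaults):

```shell
# Create a scratch directory under the same encryption zone root as the
# warehouse, so intermediate results stay in-zone and are moved, not copied
hdfs dfs -mkdir -p /user/hive/scratchdir

# Point hive.exec.scratchdir at it for the session
beeline --hiveconf hive.exec.scratchdir=/user/hive/scratchdir
```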
- Restrictions on Hive transparent encryption:
- The location of an encrypted table or partition cannot be changed to a path in a different encryption zone.
- You cannot change the location of an encrypted table or partition to that of a non-encrypted table or partition, or change the location of a non-encrypted table or partition to that of an encrypted table or partition.
- When the recycle bin function is enabled, data in the non-encrypted zone cannot be moved to the encrypted recycle bin.
- Backup and restoration across encryption zones are not supported.
- Dynamic partitions cannot be written across encryption zones.
- Application scenarios of Hive transparent encryption:
- Data can be written to tables or partitions across encryption zones (insert/load).
- Data from non-encrypted tables or partitions can be written to tables or partitions in encryption zones, and vice versa (insert/load).
- Join query can be performed across encryption zones and for encrypted and non-encrypted tables.
- When the recycle bin function is enabled, tables in an encryption zone can be moved to the recycle bin in a non-encryption zone.
- Tables in an encryption zone can be backed up to non-encrypted directories. Tables in the non-encryption zones can be restored to directories in encryption zones.
For example, suppose the encryption keys of the target table test and the data source file /tmp/zone1/zone/test.txt are key1 and key2, respectively. Run the following command to load (overwrite) the data source into the test table:
load data local inpath '/tmp/zone1/zone/test.txt' overwrite into table test;
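Writing across encryption zones with insert behaves the same way; a hedged sketch, assuming a source table src whose data resides in a different encryption zone than test (the table name src is illustrative):

```shell
# The data is decrypted with the source zone's key on read and re-encrypted
# with key1, the key of table test, on write; large volumes are copied,
# so performance degrades accordingly
beeline -e "insert overwrite table test select * from src;"
```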
Spark
- If transparent encryption is used, data replication should be avoided when moving data. Some suggestions are provided as follows:
When the load command is executed across encryption zones, the data is copied rather than moved. When the data volume is large, performance deteriorates. Therefore, it is recommended that the source path and the target path be in the same encryption zone.
- When accessing encrypted data, add the native library to the local configuration.
For example, when the HiBench tool is used, add the following configuration to the conf/spark.conf configuration file of the tool:
spark.driver.extraLibraryPath = /opt/client/Spark/spark/native
spark.yarn.cluster.driver.extraLibraryPath = ${BIGDATA_HOME}/FusionInsight_HD_8.1.0.1/install/FusionInsight-Hadoop-3.1.1/hadoop/lib/native
spark.executor.extraLibraryPath = ${BIGDATA_HOME}/FusionInsight_HD_8.1.0.1/install/FusionInsight-Hadoop-3.1.1/hadoop/lib/native
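The same library paths can also be passed per job through spark-submit instead of the configuration file; a sketch assuming the same client installation layout (the application class and JAR names are placeholders):

```shell
spark-submit \
  --conf spark.driver.extraLibraryPath=/opt/client/Spark/spark/native \
  --conf spark.executor.extraLibraryPath=${BIGDATA_HOME}/FusionInsight_HD_8.1.0.1/install/FusionInsight-Hadoop-3.1.1/hadoop/lib/native \
  --class com.example.MyApp myapp.jar
```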