Updated on 2024-10-08 GMT+08:00

Configuring the Hive Column Encryption

Scenario

Hive supports encryption of one or multiple columns in a table. When creating a Hive table, you can specify the column to be encrypted and encryption algorithm. When data is inserted into the table using the insert statement, the related columns are encrypted. Column encryption can be performed in HDFS tables of only the TextFile and SequenceFile file formats. The Hive column encryption does not support views and the Hive over HBase scenario.

Hive supports two column encryption algorithms, which can be specified during table creation:

  • AES (the encryption class is org.apache.hadoop.hive.serde2.AESRewriter)
  • SMS4 (the encryption class is org.apache.hadoop.hive.serde2.SMS4Rewriter)

After importing data from a common Hive table to a Hive column encryption table, delete the original data from the common Hive table as long as doing this does not affect other services. Retaining an unencrypted table poses security risks.

Procedure

  1. Specify the column to be encrypted and encryption algorithm when creating a table.

    create table<[db_name.]table_name> (<col_name1> <data_type> ,<col_name2> <data_type>,<col_name3> <data_type>,<col_name4> <data_type>) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='<col_name2>,<col_name3>', 'column.encode.classname'='org.apache.hadoop.hive.serde2.AESRewriter')STORED AS TEXTFILE;

    Alternatively, use the following statement:

    create table <[db_name.]table_name> (<col_name1> <data_type> ,<col_name2> <data_type>,<col_name3> <data_type>,<col_name4> <data_type>) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.indices'='1,2', 'column.encode.classname'='org.apache.hadoop.hive.serde2.SMS4Rewriter') STORED AS TEXTFILE;

    • The numbers used to specify encryption columns start from 0. 0 indicates column 1, 1 indicates column 2, and so on.
    • When creating a table with encrypted columns, ensure that the directory where the table resides is empty.

  2. Insert data into the table using the insert statement.

    Assume that the test table exists and contains data.

    insert into table <table_name> select <col_list> from test;