Updated on 2025-08-22 GMT+08:00

Configuring Hive Column Encryption

Scenario

Hive can encrypt one or more columns in a table. When creating a Hive table, you specify the columns to be encrypted and the encryption algorithm. Data written to those columns with an INSERT statement is then encrypted.

Hive supports two column encryption algorithms. The algorithm must be specified during table creation.

  • AES: The encryption class is org.apache.hadoop.hive.serde2.AESRewriter.
  • SM4 (also called SMS4): The encryption class is org.apache.hadoop.hive.serde2.SMS4Rewriter.

After you import data from a common Hive table into a table with encrypted columns, delete the original data from the common Hive table, provided that doing so does not affect other services. Retaining the unencrypted table poses a security risk.

Notes and Constraints

Column encryption is supported only for Hive tables that are stored in HDFS in TextFile or SequenceFile format. It is not supported for views or Hive over HBase tables.

Prerequisites

  • The cluster client has been installed. For details about how to install the client, see Installing a Client.
  • A Hive service user has been created and granted the permission to create Hive tables. For example, the user has been added to the hive (primary group) and hadoop user groups. For details about how to create a Hive user, see Creating a Hive User and Binding the User to a Role.

Creating a Hive Table with Encrypted Columns

  1. Log in to the node where the client is installed as the client installation user.
  2. Go to the client installation directory, configure environment variables, and authenticate the user.

    Go to the client installation directory.

    cd <client installation directory>

    Load the environment variables.

    source bigdata_env

    Authenticate the user. If Kerberos authentication is not enabled for the cluster, skip this step.

    kinit <Hive service user>

  3. Log in to the Hive client.

    beeline

  4. Create a table, specifying the columns to be encrypted and the encryption algorithm.

    create table <[db_name.]table_name> (<col_name1> <data_type>, <col_name2> <data_type>, <col_name3> <data_type>, <col_name4> <data_type>) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='<col_name2>,<col_name3>', 'column.encode.classname'='org.apache.hadoop.hive.serde2.AESRewriter') STORED AS TEXTFILE;

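    Alternatively, specify the encrypted columns by position using column.encode.indices. The following statement does so and uses the SM4 encryption class: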

    create table <[db_name.]table_name> (<col_name1> <data_type>, <col_name2> <data_type>, <col_name3> <data_type>, <col_name4> <data_type>) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.indices'='1,2', 'column.encode.classname'='org.apache.hadoop.hive.serde2.SMS4Rewriter') STORED AS TEXTFILE;

    Example:

    create table test1 (id string, name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='id,name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.AESRewriter') STORED AS TEXTFILE;

    In the preceding statements:

    • column.encode.classname indicates the encryption algorithm. The value can be org.apache.hadoop.hive.serde2.SMS4Rewriter or org.apache.hadoop.hive.serde2.AESRewriter.
    • column.encode.columns specifies the columns to be encrypted by name. column.encode.indices specifies them by position.
    • The positions used in column.encode.indices start from 0. 0 indicates column 1, 1 indicates column 2, and so on.
    • When creating a table with encrypted columns, ensure that the directory where the table resides is empty.
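
    The zero-based numbering used by column.encode.indices can be checked with a short shell sketch before you write the statement. The encode_indices function below is a hypothetical helper, not part of Hive or the cluster client. It only converts column names to positions and assumes comma-separated column lists without spaces.

    ```shell
    # Hypothetical helper (not provided by Hive or the cluster client).
    # Prints the zero-based positions of the columns to encrypt, in the
    # comma-separated form used by the 'column.encode.indices' property.
    #   $1: all table columns in definition order, comma-separated
    #   $2: columns to encrypt, comma-separated
    encode_indices() {
      all_cols=$1
      enc_cols=$2
      result=""
      old_ifs=$IFS
      IFS=','                      # split both lists on commas
      for name in $enc_cols; do
        i=0
        found=""
        for col in $all_cols; do
          if [ "$col" = "$name" ]; then
            found=$i               # position of the column, counted from 0
            break
          fi
          i=$((i + 1))
        done
        if [ -z "$found" ]; then
          IFS=$old_ifs
          echo "unknown column: $name" >&2
          return 1
        fi
        result="${result:+$result,}$found"
      done
      IFS=$old_ifs
      printf '%s\n' "$result"
    }

    # Columns 2 and 3 of a four-column table map to positions 1 and 2.
    encode_indices "col_name1,col_name2,col_name3,col_name4" "col_name2,col_name3"
    ```

    For the four-column template above, the call prints 1,2, which matches the value of column.encode.indices in the second statement.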

  5. Insert data into the table using the insert statement.

    Assume that the test table already exists and contains data. For example, the table data is as follows:

    Figure 1 Data in the test table

    Import data.

    insert into table <table_name> select <col_list> from test;

    Example:

    insert into table test1 select id,name from test;

  6. View the detailed metadata of the encrypted table.

    describe formatted test1;

    In the command output, you can view the encrypted columns and the encryption class. For example, in Figure 2, the encrypted columns are id and name, and the encryption class is org.apache.hadoop.hive.serde2.AESRewriter.

    Figure 2 Viewing the encrypted table

  7. Exit the Hive client.

    !q

  8. View the HDFS file of the encrypted Hive table.

    hdfs dfs -cat <HDFS file path>

    Example:

    hdfs dfs -cat /user/hive/warehouse/test1/000000_0

    If the command output resembles Figure 3, with the column data displayed as ciphertext instead of the original values, Hive column encryption has taken effect.

    Figure 3 Hive column data encryption
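
The check in step 8 can also be scripted rather than done by eye. The following is a minimal sketch, assuming you know a few plaintext values that were inserted into the encrypted columns. The no_plaintext function is a hypothetical helper, not part of the cluster client, and the sample content in the demonstration line is made up rather than real Hive output.

```shell
# Hypothetical helper (not provided by Hive or the cluster client).
# Reads file content on stdin and succeeds only if none of the plaintext
# values passed as arguments appears in that content.
no_plaintext() {
  tmp=$(mktemp) || return 2
  cat > "$tmp"                       # buffer stdin so it can be searched repeatedly
  status=0
  for value in "$@"; do
    # -F treats the value as a fixed string, not a regular expression
    if grep -F -q -- "$value" "$tmp"; then
      echo "plaintext value found: $value" >&2
      status=1
    fi
  done
  rm -f "$tmp"
  return $status
}

# Local demonstration with made-up content; no cluster is needed.
printf 'q8Zt,Lm2x\n' | no_plaintext 'alice' && echo "no plaintext found"

# On the cluster, the helper would be fed from the command in step 8:
#   hdfs dfs -cat <HDFS file path> | no_plaintext '<known value 1>' '<known value 2>'
```

A nonzero exit status means at least one of the known values is still readable in the file, which suggests the column was not encrypted. A zero status only shows that the listed values are absent.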