Help Center/ MapReduce Service/ User Guide (Ankara Region)/ Troubleshooting/ Using Hive/ NULL Is Displayed When Data Is Inserted After the Partition Column Is Added
Updated on 2024-11-29 GMT+08:00

NULL Is Displayed When Data Is Inserted After the Partition Column Is Added

Symptom

  1. Run the following command to create a table:
    create table test_table(
    col1 string,
    col2 string
    )
    PARTITIONED BY(p1 string)
    STORED AS orc tblproperties('orc.compress'='SNAPPY');
  2. Modify the table structure, add partitions, and insert data.
    alter table test_table add partition(p1='a');
    insert into test_table partition(p1='a') select col1,col2 from temp_table;
  3. Modify the table structure, add columns, and insert data.
    alter table test_table add columns(col3 string);
    insert into test_table partition(p1='a') select col1,col2,col3 from temp_table;
  4. Query data in the test_table table. In the returned result, the values in the col3 column are all NULL.
    select * from test_table where p1='a' 
  5. Add a table partition and insert data.
    alter table test_table add partition(p1='b');
    insert into test_table partition(p1='b') select col1,col2,col3 from temp_table; 
  6. Query data in the test_table table. In the returned result, the value of col3 is not all NULL.
    select * from test_table where p1='b'

Cause Analysis

RESTRICT is the default option for altering a table. In the RESTRICT mode, only the metadata is changed, while the table's partition structure created before the altering operation remains unchanged. However, new partitions created after the altering operation are changed. Therefore, when values of the old partitions are queried, they are all NULL.

Procedure

Add the cascade keyword when adding columns, for example:
alter table test_table add columns(col3 string) cascade;