Updated on 2024-10-25 GMT+08:00

Optimizing Hive ORC Data Storage

Scenario

ORC is an efficient columnar storage format. It offers a higher compression ratio and better read efficiency than other file formats.

You are advised to use ORC as the default Hive table storage format.
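
One possible way to make ORC the default for a session is sketched below. It assumes that your cluster permits session-level changes to the Hive property hive.default.fileformat; the table name demo_default_orc is hypothetical. Tables created without an explicit STORED AS clause in that session then use ORC.

```sql
-- Assumption: session-level SET of this property is allowed on your cluster.
SET hive.default.fileformat=ORC;

-- Hypothetical table; no STORED AS clause, so the session default applies.
CREATE TABLE demo_default_orc (id INT, name STRING);

-- Inspect the resulting storage format (look at the InputFormat/OutputFormat rows).
DESCRIBE FORMATTED demo_default_orc;
```

An explicit STORED AS ORC clause in each CREATE TABLE statement achieves the same result without relying on session settings.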

Prerequisites

You have logged in to the Hive client. For details, see Using the Hive Client.

Procedure

  • Recommended: SNAPPY compression, which suits scenarios that need a balance between compression ratio and read efficiency.

    CREATE TABLE xx (col_name data_type) STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");

  • Available: ZLIB compression, which suits scenarios that require a high compression ratio.

    CREATE TABLE xx (col_name data_type) STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");

In the preceding statements, xx indicates the Hive table name, and col_name and data_type indicate the name and data type of a column.
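
The following is a minimal sketch of the procedure with the placeholders filled in. The table names sales_orc and sales_text and all column names are hypothetical and used only for illustration; sales_text is assumed to be an existing table in another format.

```sql
-- Create an ORC table with SNAPPY compression (hypothetical names).
CREATE TABLE sales_orc (
  order_id BIGINT,
  customer STRING,
  amount   DECIMAL(10,2)
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Confirm that the compression property was recorded on the table.
SHOW TBLPROPERTIES sales_orc;

-- Copy an existing table (assumed to exist) into a new ORC table
-- that uses ZLIB compression, using CREATE TABLE ... AS SELECT.
CREATE TABLE sales_orc_zlib
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB")
AS SELECT * FROM sales_text;
```

The CREATE TABLE ... AS SELECT form rewrites the data into a new table. The original sales_text table is left unchanged, so you can compare the two before dropping it.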