Iceberg Hidden Partitioning and Partition Evolution

Hidden partitioning is Iceberg's built-in mechanism for deriving partition values automatically. A table computes each partition value from a source column (such as a timestamp) using a built-in transform, so you do not need to create or maintain separate partition columns. Because queries filter on the source column rather than on a partition column, query logic is decoupled from physical storage, and you can change the partitioning rules without migrating existing data.

Typical application scenarios:

Logs, traces, and time series data are partitioned by day, hour, or month.
Multi-dimensional partitioning is required (time combined with level, type, or business line).
The partition policy of a table needs to be modified, but data does not need to be rewritten.
Multiple people work on the same table, and manually maintained partition columns must not be written incorrectly.

In summary, Iceberg hidden partitioning derives partition values from source columns, so no physical partition columns need to be maintained. Partition pruning skips data that cannot match a query filter, which can greatly improve query efficiency. Because storage layout is decoupled from query logic, the partitioning scheme can evolve without data migration.
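The following is a minimal sketch of the idea, not Iceberg's implementation. It assumes a hypothetical in-memory `ToyTable` partitioned by the day of an `event_time` column. The writer supplies only the timestamp, and the reader filters only on the timestamp; the partition value is derived in both cases. All class, function, and column names are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01 (UTC), mirroring the idea of the day transform."""
    return (ts - EPOCH).days


class ToyTable:
    """Hypothetical in-memory table partitioned by day(event_time)."""

    def __init__(self):
        # partition value -> rows stored in that partition
        self.partitions = defaultdict(list)

    def append(self, row: dict) -> None:
        # The writer never sets a partition column; the value is derived.
        self.partitions[day_transform(row["event_time"])].append(row)

    def scan(self, start: datetime, end: datetime):
        """Return rows with start <= event_time < end, plus the partitions read."""
        lo, hi = day_transform(start), day_transform(end)
        # Partition pruning: partitions outside [lo, hi] are never opened.
        touched = [p for p in sorted(self.partitions) if lo <= p <= hi]
        rows = [
            r
            for p in touched
            for r in self.partitions[p]
            if start <= r["event_time"] < end
        ]
        return rows, touched


def utc(*args) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


table = ToyTable()
table.append({"event_time": utc(2024, 5, 1, 1), "msg": "a"})
table.append({"event_time": utc(2024, 5, 1, 23), "msg": "b"})
table.append({"event_time": utc(2024, 5, 2, 8), "msg": "c"})

rows, touched = table.scan(utc(2024, 5, 1), utc(2024, 5, 1, 12))
print([r["msg"] for r in rows], "partitions read:", len(touched))
```

The query mentions only `event_time`, yet only one of the two partitions is read. Switching the toy table to hourly partitions would change `day_transform` but not the query.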

Hidden Partition Transformations

Table 1 lists the hidden partition transformations.

All transforms must return null for a null input value.
The void transform may be used to replace the transform in an existing partition field so that the field is effectively dropped in v1 tables. For details, see Partition Evolution.

bucket[N]: Applies a hash function to the source value (the Iceberg specification defines a 32-bit Murmur3 hash), then takes the non-negative hash modulo N to assign a bucket number from 0 to N-1.
Different data types (for example, int and string) produce different hash values, even for logically equivalent values such as 34 and "34". As a result, direct type upgrades are not supported.
truncate[W]:
- For numeric types (int, long, and decimal), values are rounded down to the nearest multiple of W. For example, truncate[10] converts 17 to 10 and -1 to -10.
- For string and binary types, values are truncated to the first W characters (string) or bytes (binary). For example, truncate[3] converts "iceberg" to "ice".
For details about SQL statements, see PARTITIONED BY.
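The bucket[N] and truncate[W] rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the reference implementation: the function names are made up for this example, only int, str, and bytes inputs are handled, integers are hashed as 8-byte little-endian values, and the hash is the 32-bit Murmur3 (x86 variant, seed 0) that the Iceberg specification describes.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant); returns a signed 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[4 * n_blocks:]
    if tail:
        # Remaining 1 to 3 bytes are mixed in without rotating h.
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def hash_value(value) -> int:
    """Hash an int, str, or bytes value (other types are out of scope here)."""
    if isinstance(value, bool) or not isinstance(value, (int, str, bytes)):
        raise TypeError(f"unsupported type: {type(value).__name__}")
    if isinstance(value, int):
        data = struct.pack("<q", value)  # assumption: 8-byte little-endian
    elif isinstance(value, str):
        data = value.encode("utf-8")
    else:
        data = value
    return murmur3_32(data)


def bucket(n: int, value):
    """Bucket number in [0, n-1]; null in, null out."""
    if value is None:
        return None
    return (hash_value(value) & 0x7FFFFFFF) % n


def truncate(w: int, value):
    """Round ints down to a multiple of w; keep the first w chars or bytes."""
    if value is None:
        return None
    if isinstance(value, int):
        return value - (value % w)  # Python's % is non-negative for w > 0
    return value[:w]


print(bucket(16, 34), bucket(16, "34"))   # same logical value, hashed by type
print(truncate(10, 17), truncate(10, -1), truncate(3, "iceberg"))
```

Because the integer 34 and the string "34" are serialized differently before hashing, they generally land in different buckets, which is why a bucketed column cannot simply change type.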

**Table 1** Hidden partition transformations
Transform Name	Description	Source Type	Result Type
identity	Source value, unmodified	Any except for geometry, geography, and variant	Source type
bucket[N]	Hash of value, mod N (see the notes before this table)	int, long, decimal, date, time, timestamp, timestamptz, timestamp_ns, timestamptz_ns, string, uuid, fixed, binary	int
truncate[W]	Value truncated to width W (see the notes before this table)	int, long, decimal, string, binary	Source type
year	Extract a date or timestamp year, as years from 1970	date, timestamp, timestamptz, timestamp_ns, timestamptz_ns	int
month	Extract a date or timestamp month, as months from 1970-01-01	date, timestamp, timestamptz, timestamp_ns, timestamptz_ns	int
day	Extract a date or timestamp day, as days from 1970-01-01	date, timestamp, timestamptz, timestamp_ns, timestamptz_ns	int
hour	Extract a timestamp hour, as hours from 1970-01-01 00:00:00	timestamp, timestamptz, timestamp_ns, timestamptz_ns	int
void	Always produces null	Any	Source type or int
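As a rough illustration of the year, month, day, and hour rows in the table above, the sketch below computes each value as an offset from 1970-01-01 for a zone-less Python `datetime`. The function names are illustrative, and date inputs and nanosecond precision are left out for brevity.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def year(ts):
    """Years from 1970."""
    return None if ts is None else ts.year - 1970


def month(ts):
    """Months from 1970-01."""
    return None if ts is None else (ts.year - 1970) * 12 + (ts.month - 1)


def day(ts):
    """Days from 1970-01-01 (floor division, so pre-1970 values are negative)."""
    return None if ts is None else (ts - EPOCH) // timedelta(days=1)


def hour(ts):
    """Hours from 1970-01-01 00:00:00."""
    return None if ts is None else (ts - EPOCH) // timedelta(hours=1)


ts = datetime(2017, 11, 16, 22, 31, 8)
print(year(ts), month(ts), day(ts), hour(ts))  # 47 574 17486 419686
```

Each function returns None for a None input, matching the rule that all transforms return null for a null input value.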

Partition Evolution

Table partitioning can be evolved by adding, removing, renaming, or reordering partition spec fields.

Changing a partition spec produces a new spec identified by a unique spec ID that is added to the table's list of partition specs and may be set as the table's default spec.
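A small model of this bookkeeping, assuming a hypothetical `ToyMetadata` class that stands in for the partition-spec part of table metadata. Field descriptions are plain strings here; real metadata stores structured fields.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetadata:
    """Hypothetical stand-in for the partition-spec part of table metadata."""

    specs: dict = field(default_factory=dict)  # spec ID -> tuple of fields
    default_spec_id: int = 0

    def add_spec(self, fields, set_default: bool = True) -> int:
        # Every change yields a new spec with a fresh ID; old specs are kept
        # so that data written under them can still be interpreted.
        spec_id = max(self.specs, default=-1) + 1
        self.specs[spec_id] = tuple(fields)
        if set_default:
            self.default_spec_id = spec_id
        return spec_id


meta = ToyMetadata()
meta.add_spec(["day(event_time)"])
meta.add_spec(["hour(event_time)", "bucket(16, user_id)"])
print(sorted(meta.specs), "default:", meta.default_spec_id)
```

New writes would use the default spec, while the earlier spec remains in the list for existing data.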

Evolution rules:

When evolving a spec, changes should not cause partition field IDs to change because the partition field IDs are used as the partition tuple field IDs in manifest files.
In v2, partition field IDs must be explicitly tracked for each partition field. New IDs are assigned based on the last assigned partition ID in table metadata.
In v1, partition field IDs were not tracked, but were assigned sequentially starting at 1000 in the reference implementation. This assignment caused problems when reading metadata tables based on manifest files from multiple specs because partition fields with the same ID may contain different data types. For compatibility with old versions, the following rules are recommended for partition evolution in v1 tables:
- Do not reorder partition fields.
- Do not drop partition fields. Instead, replace the field's transform with the void transform.
- Only add partition fields at the end of the previous partition spec.
For details about SQL statements, see ALTER TABLE ADD/REPLACE/DROP PARTITION FIELD.
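The evolution rules above can be expressed as a small check. The sketch below is illustrative only: `PartitionField`, `append_field`, `drop_field_v1`, and `v1_violations` are hypothetical names, and transforms are stored as plain strings. New field IDs are taken from the last assigned partition ID, and a dropped field keeps its position and ID but switches to the void transform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionField:
    field_id: int
    source: str
    transform: str


def append_field(spec, last_partition_id, source, transform):
    """Add a field at the end, using the next unused partition field ID."""
    new_id = last_partition_id + 1
    return spec + [PartitionField(new_id, source, transform)], new_id


def drop_field_v1(spec, field_id):
    """Keep the position and ID, but replace the transform with void."""
    return [
        PartitionField(f.field_id, f.source, "void") if f.field_id == field_id else f
        for f in spec
    ]


def v1_violations(old, new):
    """List the ways `new` breaks the recommended v1 evolution rules."""
    problems = []
    if len(new) < len(old):
        problems.append("a field was dropped; use the void transform instead")
    for pos, (o, n) in enumerate(zip(old, new)):
        if (o.field_id, o.source) != (n.field_id, n.source):
            problems.append(f"position {pos} changed; do not reorder or insert fields")
    return problems


old = [
    PartitionField(1000, "event_time", "day"),
    PartitionField(1001, "user_id", "bucket[16]"),
]
evolved = drop_field_v1(old, 1001)
evolved, last_id = append_field(evolved, 1001, "event_time", "hour")

print(v1_violations(old, evolved))              # no problems reported
print(v1_violations(old, list(reversed(old))))  # reordering is flagged
```

Because existing fields never move or change ID, partition tuples written under the old spec and the new spec stay aligned.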
