Iceberg Hidden Partitioning and Partition Evolution
Hidden partitioning is Iceberg's built-in mechanism for deriving partition values automatically. The table computes partition values from source columns (such as timestamps) using built-in transforms, so you do not need to maintain separate partition columns by hand. Because query logic is decoupled from physical storage, you can change partitioning rules without migrating existing data.
Typical application scenarios:
- Logs, traces, and time series data are partitioned by hour, day, or month.
- Multi-dimensional partitioning is required (time combined with level, type, or business line).
- The partitioning scheme of a table needs to change, but existing data should not be rewritten.
- Multiple people write to the same table, and manually maintained partition columns risk being filled in incorrectly.
In short, Iceberg derives partition values from source columns, so no physical partition columns need to be maintained. Partition pruning can greatly improve query efficiency, and because storage layout is decoupled from query logic, the partitioning scheme can evolve without data migration.
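The effect of partition pruning can be sketched in a few lines of Python. This is an illustration only, not Iceberg's implementation: the file names, the `files_by_day` mapping, and the `prune` helper are all hypothetical, and the sketch assumes a table partitioned by a day transform on a timestamp column.

```python
from datetime import date, datetime

EPOCH_DAY = date(1970, 1, 1)

def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01, the value a day partition field would store."""
    return (ts.date() - EPOCH_DAY).days

# Hypothetical data files, keyed by the day partition value derived at write time.
files_by_day = {
    day_transform(datetime(2024, 3, 14, 8, 0)): ["f1.parquet"],
    day_transform(datetime(2024, 3, 15, 9, 30)): ["f2.parquet", "f3.parquet"],
    day_transform(datetime(2024, 3, 16, 23, 59)): ["f4.parquet"],
}

def prune(files, ts_from: datetime, ts_to: datetime):
    """Keep only files whose day partition can contain rows in [ts_from, ts_to]."""
    lo, hi = day_transform(ts_from), day_transform(ts_to)
    return sorted(f for d, fs in files.items() if lo <= d <= hi for f in fs)

# The query filters on the original timestamp column; the partition
# values are derived from the filter, not written by the user.
selected = prune(files_by_day,
                 datetime(2024, 3, 15, 0, 0),
                 datetime(2024, 3, 15, 23, 59))
print(selected)
```

The filter is expressed on the timestamp itself, and only the files for the matching day are considered. That is the sense in which the partitioning is hidden from the query.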
Hidden Partition Transformations
Table 1 lists the hidden partition transformations.
- All transforms must return null for a null input value.
- The void transform may be used to replace the transform in an existing partition field so that the field is effectively dropped in v1 tables. For details, see Partition Evolution.
- bucket[N]: Hashes the source value (the Iceberg specification defines a 32-bit Murmur3 hash for this purpose), then takes the result modulo N to assign a bucket number from 0 to N - 1.
  The hash depends on the source type: values of different types (for example, int and string) produce different hash values even when they look equivalent, such as 34 and "34". As a result, changing the type of a bucketed source column to a type that hashes differently is not supported.
- truncate[W]:
  - For numeric types (int, long, and decimal), values are rounded down to a multiple of W. For example, truncate[10] converts 17 to 10 and -1 to -10.
  - For string and binary types, values are cut to the first W characters (string) or bytes (binary). For example, truncate[3] converts "iceberg" to "ice".
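The bucket and truncate rules above can be sketched in Python as follows. This is a minimal illustration, not Iceberg's code: the function names are hypothetical, only int, string, and binary inputs are handled, and the sketch assumes the hashing rules of the Iceberg specification (integers hashed as 8-byte little-endian values, strings as UTF-8 bytes, 32-bit Murmur3 with seed 0).

```python
import struct

def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as an unsigned integer."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    nblocks = len(data) // 4
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
    # Finalization: mix in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h

def bucket(value, n: int):
    """Sketch of bucket[N] for int, str, and bytes; null stays null."""
    if value is None:
        return None
    if isinstance(value, int):
        data = struct.pack("<q", value)   # assumed: ints hash as 8-byte little-endian
    elif isinstance(value, str):
        data = value.encode("utf-8")      # assumed: strings hash as UTF-8 bytes
    elif isinstance(value, bytes):
        data = value
    else:
        raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    # Clear the sign bit before the modulus so the bucket is never negative.
    return (murmur3_32(data) & 0x7FFFFFFF) % n

def truncate(value, w: int):
    """Sketch of truncate[W] for int, str, and bytes; null stays null."""
    if value is None:
        return None
    if isinstance(value, int):
        return value - (value % w)        # Python's % is non-negative for w > 0
    return value[:w]                      # first W characters or bytes

print(bucket(34, 16), bucket("34", 16))   # hashed from different byte sequences
print(truncate(17, 10), truncate(-1, 10), truncate("iceberg", 3))
```

Because the integer 34 and the string "34" are hashed from different byte sequences, their bucket assignments are unrelated, which is why the note above warns against changing the type of a bucketed column.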
For details about SQL statements, see PARTITIONED BY.
| Transform Name | Description | Source Type | Result Type |
|---|---|---|---|
| identity | Source value, unmodified | Any except for geometry, geography, and variant | Source type |
| bucket[N] | Hash of the value, mod N (see the notes above) | int, long, decimal, date, time, timestamp, timestamptz, timestamp_ns, timestamptz_ns, string, uuid, fixed, binary | int |
| truncate[W] | Value truncated to width W (see the notes above) | int, long, decimal, string, binary | Source type |
| year | Extract a date or timestamp year, as years from 1970 | date, timestamp, timestamptz, timestamp_ns, timestamptz_ns | int |
| month | Extract a date or timestamp month, as months from 1970-01-01 | date, timestamp, timestamptz, timestamp_ns, timestamptz_ns | int |
| day | Extract a date or timestamp day, as days from 1970-01-01 | date, timestamp, timestamptz, timestamp_ns, timestamptz_ns | int |
| hour | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | timestamp, timestamptz, timestamp_ns, timestamptz_ns | int |
| void | Always produces null | Any | Source type or int |
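The temporal transforms in the table all produce an integer offset from the epoch. The following Python sketch shows the arithmetic for timezone-naive timestamps. The function names are hypothetical and the code is an illustration of the table's descriptions, not Iceberg's implementation.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def year_transform(ts: datetime) -> int:
    """Years from 1970."""
    return ts.year - 1970

def month_transform(ts: datetime) -> int:
    """Months from 1970-01-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)

def day_transform(ts: datetime) -> int:
    """Days from 1970-01-01."""
    return (ts.date() - date(1970, 1, 1)).days

def hour_transform(ts: datetime) -> int:
    """Hours from 1970-01-01 00:00:00 (floor division, so pre-epoch values are negative)."""
    return (ts - EPOCH) // timedelta(hours=1)

ts = datetime(2024, 3, 15, 10, 30)
print(year_transform(ts))    # 54
print(month_transform(ts))   # 650
print(day_transform(ts))     # 19797
print(hour_transform(ts))    # 475138
```

Storing offsets rather than formatted strings keeps partition values small and ordered, so a range filter on the source column maps to a range of partition values.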
Partition Evolution
Table partitioning can be evolved by adding, removing, renaming, or reordering partition spec fields.
Changing a partition spec produces a new spec identified by a unique spec ID that is added to the table's list of partition specs and may be set as the table's default spec.
Evolution rules:
- When evolving a spec, changes should not cause partition field IDs to change because the partition field IDs are used as the partition tuple field IDs in manifest files.
- In v2, partition field IDs must be explicitly tracked for each partition field. New IDs are assigned based on the last assigned partition ID in table metadata.
- In v1, partition field IDs were not tracked, but were assigned sequentially starting at 1000 in the reference implementation. This assignment caused problems when reading metadata tables based on manifest files from multiple specs because partition fields with the same ID may contain different data types. For compatibility with old versions, the following rules are recommended for partition evolution in v1 tables:
  - Do not reorder partition fields.
  - Do not drop partition fields. Instead, replace the field's transform with the void transform.
  - Only add partition fields at the end of the previous partition spec.
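The v1-compatible rules above can be modeled with a small Python sketch. The `PartitionField` and `PartitionSpec` classes and both helper functions are hypothetical, written only to illustrate the rules, and deriving the new spec ID by incrementing the old one is an assumption made for brevity.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class PartitionField:
    field_id: int      # must stay stable across spec changes
    source: str        # source column name
    transform: str     # for example "day", "identity", "void"
    name: str          # partition field name

@dataclass(frozen=True)
class PartitionSpec:
    spec_id: int
    fields: Tuple[PartitionField, ...]

def add_field(spec, last_assigned_id, source, transform, name):
    """Append a field at the end, with the next ID after the last assigned one."""
    new_id = last_assigned_id + 1
    new_field = PartitionField(new_id, source, transform, name)
    # Assumption for this sketch: the new spec ID is the old one plus 1.
    return PartitionSpec(spec.spec_id + 1, spec.fields + (new_field,)), new_id

def drop_field_v1(spec, name):
    """'Drop' a field the v1-safe way: keep its slot and ID, switch it to void."""
    fields = tuple(
        replace(f, transform="void") if f.name == name else f
        for f in spec.fields
    )
    return PartitionSpec(spec.spec_id + 1, fields)

# Start with a table partitioned by day(ts); 1000 is the first field ID.
spec0 = PartitionSpec(0, (PartitionField(1000, "ts", "day", "ts_day"),))
spec1, last_id = add_field(spec0, 1000, "level", "identity", "level")
spec2 = drop_field_v1(spec1, "ts_day")

for spec in (spec0, spec1, spec2):
    print(spec.spec_id, [(f.field_id, f.transform) for f in spec.fields])
```

Across all three specs, the field at each position keeps the same ID and position, so manifests written under an older spec can still be read consistently alongside newer ones.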
For details about SQL statements, see ALTER TABLE ADD/REPLACE/DROP PARTITION FIELD.