表达式分区
在处理大规模数据集时,数据分区是提高查询性能的关键技术之一。然而,传统的分区方式需要用户手动定义分区逻辑,这不仅增加了用户的负担,还可能导致分区策略不够灵活。表达式分区用于支持Iceberg的隐藏分区功能,其核心设计是将“分区逻辑”从用户视角隐藏,交给Iceberg元数据自动管理。如何确保数据分区既高效又灵活?通过使用表达式分区,用户可以无需关心复杂的分区逻辑,Iceberg将自动根据数据特征和查询模式优化分区策略,从而提高查询性能和数据管理效率。
约束限制
- 最多支持4个分区列。
- 不支持void表达式。
- 所有分区表达式不支持复杂数据类型。
- truncate/bucket表达式不支持二进制数据。
- INSERT INTO PARTITION功能仅支持identity表达式,不支持其他表达式。
- Iceberg约束:同一个表列,不能出现在year/month/day/hour两个时间表达式中。 但是,可以同时出现在时间和bucket/truncate表达式中。
- Iceberg约束:对分区键有默认命名规则,不允许与表的列名称与之冲突。
- 当前不支持分区演进功能。
支持的分区表达式
FabricSQL要支持分区表达式以及对应数据类型列表如下:
|
表达式名称 |
支持的数据类型 |
计算结果的类型 |
说明 |
|---|---|---|---|
|
identity |
smallint, int, bigint, long, numeric, date, timestamp, text, varchar, bpchar |
原数据类型 |
不支持float,double,boolean, bytea。 |
|
bucket[N] |
smallint, int, bigint, long, numeric, date, timestamp text, bpchar, varchar |
int |
|
|
truncate[W] |
smallint, int, bigint, long,numeric, text, bpchar, varchar |
原数据类型 |
|
|
year |
date, timestamp |
int |
例:year('2025-10-10') 计算结果为 '2025'年。 |
|
month |
date, timestamp |
int |
例:month('2025-10-10') 计算结果为'2025-10'月。 |
|
day |
date, timestamp |
int |
例:day('2025-10-10') 计算结果为'2025-10-10'日。 |
|
hour |
timestamp |
int |
例:hour('2025-10-10 10:10:10.000') 计算结果为'2025-10-10 10:00:00'时。 |
|
void |
不支持 |
不支持 |
本质上是将所有数值映射到null值上。 |
所有表达式对于null值的处理结果必须返回null值。
identity表达式对输入的数值不做任何处理,直接输出原数值。
year表达式计算的是相对1970年、以年为单位的数值。month表达式计算的是相对1970-01-01、以月为单位的数值。day表达式计算的是相对1970-01-01、以天为单位的数值。hour表达式计算的是相对1970-01-01 00:00:00、以小时为单位的数值。Iceberg在记录分区值结果时,并不会直接记录整数数值,而是将它转换为可阅读的时间字符串。例如,Iceberg会将year('2025-10-10') 表达式计算出来的分区值记录为'2025'年,而不是整数55。
FabricSQL提供了SQL函数fq_iceberg_year()、fq_iceberg_month()、fq_iceberg_day()、fq_iceberg_hour()、fq_iceberg_truncate()、fq_iceberg_bucket()用于实现对应分区表达式year、month、day、hour、truncate、bucket的计算功能。
分区表达式:truncate
对于不同的数据类型,truncate表达式的计算方法不同,可以使用函数fq_iceberg_truncate() 进行运算求值。
对于整数类型,truncate(W, integer_value)返回的是不超过integer_value、且为W倍数的最大数值。 示例如下:
fabricsql=>select fq_iceberg_truncate (10, -2) ;
fq_iceberg_truncate
---------------------
-10
(1 row)
fabricsql=> select fq_iceberg_truncate (10, 0) ;
fq_iceberg_truncate
---------------------
0
(1 row)
fabricsql=> select fq_iceberg_truncate (10, 9) ;
fq_iceberg_truncate
---------------------
0
(1 row)
fabricsql=> select fq_iceberg_truncate (10, 10) ;
fq_iceberg_truncate
---------------------
10
(1 row)
fabricsql=> select fq_iceberg_truncate (10, 19) ;
fq_iceberg_truncate
---------------------
10
(1 row)
对于字符串类型,truncate(W, string_value)返回的是string_value字符串中前缀长度为W的前缀子字符串。注意,长度以字符为单位。 示例如下:
fabricsql=> select *, length(t) from fq_iceberg_truncate (2, '') as t;
iceberg_truncate | length
------------------+--------
| 0
(1 row)
fabricsql=> select *, length(t) from fq_iceberg_truncate (2, 'a') as t;
iceberg_truncate | length
------------------+--------
a | 1
(1 row)
fabricsql=> select *, length(t) from fq_iceberg_truncate (2, 'tt') as t;
iceberg_truncate | length
------------------+--------
tt | 2
(1 row)
fabricsql=> select *, length(t) from fq_iceberg_truncate (2, 'cow') as t;
iceberg_truncate | length
------------------+--------
co | 2
(1 row)
fabricsql=> select *, length(t) from fq_iceberg_truncate (2, '你好中国!') as t;
iceberg_truncate | length
------------------+--------
你好 | 2
(1 row)
对于numeric类型,truncate(W, num_value)返回的是不超过num_value、且小数点右移scale位后为W倍数的最大数值。示例如下:
fabricsql=> select fq_iceberg_truncate (50, 10.64::numeric(15,3));
fq_iceberg_truncate
---------------------
10.600
(1 row)
fabricsql=> select fq_iceberg_truncate (50, 10.66::numeric(15,3));
fq_iceberg_truncate
---------------------
10.650
(1 row)
fabricsql=> select fq_iceberg_truncate (50, -10.66::numeric(15,3));
fq_iceberg_truncate
---------------------
-10.700
(1 row)
创建表达式分区表
CREATE TABLE语法详情参考CREATE TABLE。
创建一个表达式分区的Iceberg managed表SQL语句如下:
fabricsql=> set current_schema to test;
fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), truncate(10, id), c3 int comment 'this is c3 field', year(ts) ) store as iceberg;
CREATE TABLE
fabricsql=> select * from pg_get_partition_keys('test', 'iceberg_table');
table | partition_key
---------------+-----------------
iceberg_table | bucket(16,val)
iceberg_table | truncate(10,id)
iceberg_table | c3
iceberg_table | year(ts)
(4 rows)
在示例中, c3是一个identity分区键。此外,还创建了bucket、truncate、year表达式的三个分区键。Iceberg会对分区键进行内部命名,其命名规则如下表所述。
|
表达式 |
默认命名规则 |
样例 |
|---|---|---|
|
year(source_column) |
<source_column>_year |
year(ts) -> 'ts_year' |
|
month(source_column) |
<source_column>_month |
month(ts) -> 'ts_month' |
|
day(source_column) |
<source_column>_day |
day(ts) -> 'ts_day' |
|
hour(source_column) |
<source_column>_hour |
hour(ts) -> 'ts_hour' |
|
bucket(N, source_column) |
<source_column>_bucket |
bucket(ts) -> 'ts_bucket' |
|
truncate(W, source_column) |
<source_column>_trunc |
truncate(ts) -> 'ts_trunc' |
|
identity(source_column) |
<source_column> |
identity(ts) -> 'ts' |
表的列名称不能够与分区键的名称冲突,否则会报错、建表失败。
fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), c3 int comment 'this is c3 field', year(ts), ts_year bigint ) store as iceberg; ERROR: column "ts_year" specified more than once
Iceberg不允许时间类型的同一列同时出现在year、month、day、hour多个表达式中,但是允许该列出现在truncate、bucket其他表达式中。
fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), c3 int comment 'this is c3 field', year(ts), month(ts) ) store as iceberg; ERROR: java.lang.IllegalArgumentException: Cannot add redundant partition: 1002: ts_year: year(4) conflicts with 1003: ts_month: month(4). fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), c3 int comment 'this is c3 field', year(ts), bucket(7, ts) ) store as iceberg; CREATE TABLE
FabricSQL支持创建分区键的最大数量是4,否则会报错、建表失败。
fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), truncate(10, id), c3 int comment 'this is c3 field', year(ts) ) store as iceberg; ERROR: partitioned table should have no more than 4 partition keys
FabricSQL不支持将复杂类型作为分区键,否则会报错、建表失败。
fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), truncate(10, id), c3 struct<c: text, d: int>, year(ts) ) store as iceberg;
ERROR: Complex type cannot be served as a value-partitioning column.
fabricsql=> CREATE TABLE iceberg_table(a int , b struct<c: text, d: int>) PARTITION BY(bucket(5, b.c), truncate(3, b.d)) store as ICEBERG;
ERROR: syntax error at or near "."
LINE 1: ... struct<c: text, d: int>) PARTITION BY(bucket(5, b.c), trunc...
^
FabricSQL支持创建表达式分区的Iceberg外表,需要满足额外条件:
- 与OBS上Iceberg表的分区键数量相同。
- 与OBS上Iceberg表的分区键定义完全一致,包括分区键的顺序、分区键的表达式、分区键操作的表列。
后续小节将以iceberg_test_part_expr表为样例,其表定义如下:
fabricsql=> set current_schema to test; fabricsql=> CREATE TABLE iceberg_test_part_expr(id bigint, data text, category text, ts date) partition by (c3 int, truncate(4, data), year(ts)) store as iceberg; CREATE TABLE
导入数据
实现表达式分区的Iceberg表支持数据的单条插入和批量导入功能。
fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'hellohello', 'shanxi', '2025-08-29', 1); INSERT 1 fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'hellohello', 'shanxi', '2025-08-29', 1); INSERT 1 fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'he', 'shanxi', '2024-09-29', 20); INSERT 1 fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'helloworld', 'shanxi', '2023-06-29', 6); INSERT 1 fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'hella', 'shanxi', '2022-05-29', 60); INSERT 1 fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'zhangmingmin', 'shanxi', '2025-04-29', 100); INSERT 1
对于INSERT INTO PARTITION功能,FabricSQL仅支持所有分区键为identity类型的表,并不支持其他类型表达式的分区表。
fabricsql=> INSERT INTO iceberg_test_part_expr PARTITION(c3=1, data='hell', ts='2025') values (9, 'hellohello', 'shanxi', '2025-08-29 01:05:59.550463') ; ERROR: INSERT INTO PARTITION clause only support identity partition.
可以使用函数pg_get_partitions()来查看表的所有分区数据,示例如下:
fabricsql=> select partition_name from pg_get_partitions('iceberg_test_part_expr');
partition_name
--------------------------
c3=1/data=hell/ts=2025
c3=20/data=he/ts=2024
c3=100/data=zhan/ts=2025
c3=6/data=hell/ts=2023
c3=60/data=hell/ts=2022
(5 rows)
基于列表达式的查询优化
在支持表达式分区之后,FabricSQL可下推列表达式,从而进行静态分区剪枝的查询优化。 示例如下:
fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where c3 = 1;
QUERY PLAN
------------------------------------------------------------------------------------------
id | operation | A-rows
----+--------------------------------------------------------------------------+--------
1 | -> Row Adapter | 2
2 | -> Partitioned Vector Foreign Scan on iceberg_test_part_expr | 2
3 | -> Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr | 2
Predicate Information (identified by plan id)
--------------------------------------------------------------------
2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
Filter: (c3 = 1)
Pushdown Predicate Filter: (c3 = 1)
Server Type: lf
DN read from: direct
3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
Filter: (c3 = 1)
Pushdown Predicate Filter: (c3 = 1)
Server Type: lf
Pruning results: (Manifests total: 6, Manifests left: 2)
DN read from: direct
(19 rows)
可以看到, c3=1列表达式下推后,扫描的分区数量由6个裁减为2个。
基于SQL函数的查询优化
在实现表达式分区之后,FabricSQL支持白名单SQL函数的下推优化。当前主要支持的SQL函数白名单如下:
|
分区表达式 |
SQL函数 |
下推条件 |
|---|---|---|
|
year |
year(), date_trunc() |
date_trunc()的时间单位是year。 |
|
month |
date_trunc() |
date_trunc()的时间单位是month。 |
|
day |
date_trunc() |
date_trunc()的时间单位是day。 |
|
hour |
date_trunc() |
date_trunc()的时间单位是hour。 |
|
truncate |
left(), substring() |
substring()必须从字符串头开始算起。 |
字符串函数left()和substring()下推进行静态分区剪枝的示例如下:
fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where substring(data, 1, 3) > 'hello';
QUERY PLAN
------------------------------------------------------------------------------------------
id | operation | A-rows
----+--------------------------------------------------------------------------+--------
1 | -> Row Adapter | 1
2 | -> Partitioned Vector Foreign Scan on iceberg_test_part_expr | 1
3 | -> Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr | 1
Predicate Information (identified by plan id)
--------------------------------------------------------------------
2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
Filter: ("substring"(data, 1, 3) > 'hello'::text)
Server Type: lf
DN read from: direct
3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
Server Type: lf
Pruning results: (Manifests total: 6, Manifests left: 1)
DN read from: direct
(16 rows)
fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where left(data, 5) < 'hello';
QUERY PLAN
------------------------------------------------------------------------------------------
id | operation | A-rows
----+--------------------------------------------------------------------------+--------
1 | -> Row Adapter | 2
2 | -> Partitioned Vector Foreign Scan on iceberg_test_part_expr | 2
3 | -> Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr | 5
Predicate Information (identified by plan id)
--------------------------------------------------------------------
2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
Filter: ("left"(data, 5) < 'hello'::text)
Server Type: lf
DN read from: direct
3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
Server Type: lf
Pruning results: (Manifests total: 6, Manifests left: 5)
DN read from: direct
(16 rows)
可以看到,substring()函数下推后,分区数量由6个裁减到1个。 而left()函数下推后,分区数量由6个裁减到5个。
时间类型函数date_trunc()下推进行静态分区剪枝的示例如下:
fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where date_trunc('year', ts) > '2023-12-30 00:00:00';
QUERY PLAN
-----------------------------------------------------------------------------------------------------
id | operation | A-rows
----+--------------------------------------------------------------------------+--------
1 | -> Row Adapter | 4
2 | -> Partitioned Vector Foreign Scan on iceberg_test_part_expr | 4
3 | -> Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr | 4
Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------
2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
Filter: (date_trunc('year'::text, ts) > '2023-12-30 00:00:00'::timestamp without time zone)
Server Type: lf
DN read from: direct
3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
Server Type: lf
Pruning results: (Manifests total: 6, Manifests left: 4)
DN read from: direct
(16 rows)
fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where year(ts) < 2024;
QUERY PLAN
------------------------------------------------------------------------------------------
id | operation | A-rows
----+--------------------------------------------------------------------------+--------
1 | -> Row Adapter | 2
2 | -> Partitioned Vector Foreign Scan on iceberg_test_part_expr | 2
3 | -> Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr | 2
Predicate Information (identified by plan id)
--------------------------------------------------------------------
2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
Filter: (year(ts) < 2024)
Server Type: lf
DN read from: direct
3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
Server Type: lf
Pruning results: (Manifests total: 6, Manifests left: 2)
DN read from: direct
(16 rows)
可以看到, date_trunc()函数下推后,分区数量由6个裁减到4个。 而year()函数下推后,分区数量由6个裁减到2个。