文档首页/ 数智融合计算服务 DataArts Fabric/ 开发指南/ 查询数据/ SQL on Iceberg/ 表达式分区

更新时间：2025-12-17 GMT+08:00

查看PDF

表达式分区

在处理大规模数据集时，数据分区是提高查询性能的关键技术之一。然而，传统的分区方式需要用户手动定义分区逻辑，这不仅增加了用户的负担，还可能导致分区策略不够灵活。表达式分区用于支持Iceberg的隐藏分区功能，其核心设计是将“分区逻辑”从用户视角隐藏，交给Iceberg元数据自动管理。如何确保数据分区既高效又灵活？通过使用表达式分区，用户可以无需关心复杂的分区逻辑，Iceberg将自动根据数据特征和查询模式优化分区策略，从而提高查询性能和数据管理效率。

约束限制

最多支持4个分区列。
不支持void表达式。
所有分区表达式不支持复杂数据类型。
truncate/bucket表达式不支持二进制数据。
INSERT INTO PARTITION功能仅支持identity表达式，不支持其他表达式。
Iceberg约束：同一个表列，不能出现在year/month/day/hour两个时间表达式中。但是，可以同时出现在时间和bucket/truncate表达式中。
Iceberg约束：对分区键有默认命名规则，不允许与表的列名称与之冲突。
当前不支持分区演进功能。

支持的分区表达式

Fabric SQL要支持分区表达式以及对应数据类型列表如下：

表1 Fabric SQL分区支持的表达式
表达式名称	支持的数据类型	计算结果的类型	说明
identity	smallint, int, bigint, long, numeric, date, timestamp, text, varchar, bpchar	原数据类型	不支持float，double，boolean, bytea。
bucket[N]	smallint, int, bigint, long, numeric， date, timestamp text, bpchar, varchar	int	先计算输出数值的hash值，再求模。计算结果为 [0, N)范围之内。哈希算法采用的是MurmurHash3算法。
truncate[W]	smallint, int, bigint, long,numeric， text, bpchar, varchar	原数据类型	不支持bytea类型。详情参见分区表达式：truncate。
year	date, timestamp	int	例：year('2025-10-10') 计算结果为 '2025'年。
month	date, timestamp	int	例：month('2025-10-10') 计算结果为'2025-10'月。
day	date, timestamp	int	例：day('2025-10-10') 计算结果为'2025-10-10'日。
hour	timestamp	int	例：hour('2025-10-10 10:10:10.000') 计算结果为'2025-10-10 10:00:00'时。
void	不支持	不支持	本质上是将所有数值映射到null值上。

所有表达式对于null值的处理结果必须返回null值。

identity表达式对输入的数值不做任何处理，直接输出原数值。

year表达式计算的是相对1970年、以年为单位的数值。month表达式计算的是相对1970-01-01、以月为单位的数值。day表达式计算的是相对1970-01-01、以天为单位的数值。hour表达式计算的是相对1970-01-01 00:00:00、以小时为单位的数值。Iceberg在记录分区值结果时，并不会直接记录整数数值，而是将它转换为可阅读的时间字符串。例如，Iceberg会将year('2025-10-10') 表达式计算出来的分区值记录为'2025'年，而不是整数55。

Fabric SQL提供了SQL函数fq_iceberg_year()、fq_iceberg_month()、fq_iceberg_day()、fq_iceberg_hour()、fq_iceberg_truncate()、fq_iceberg_bucket()用于实现对应分区表达式year、month、day、hour、truncate、bucket的计算功能。

分区表达式：truncate

对于不同的数据类型，truncate表达式的计算方法不同，可以使用函数fq_iceberg_truncate() 进行运算求值。

对于整数类型，truncate(W, integer_value)返回的是不超过integer_value、且为W倍数的最大数值。示例如下：

fabricsql=>select fq_iceberg_truncate (10, -2) ;
 fq_iceberg_truncate
---------------------
                 -10
(1 row)

fabricsql=> select fq_iceberg_truncate (10, 0) ;
 fq_iceberg_truncate
---------------------
                   0
(1 row)

fabricsql=> select fq_iceberg_truncate (10, 9) ;
 fq_iceberg_truncate
---------------------
                   0
(1 row)

fabricsql=> select fq_iceberg_truncate (10, 10) ;
 fq_iceberg_truncate
---------------------
                  10
(1 row)

fabricsql=> select fq_iceberg_truncate (10, 19) ;
 fq_iceberg_truncate
---------------------
                  10
(1 row)

对于字符串类型，truncate(W, string_value)返回的是string_value字符串中前缀长度为W的前缀子字符串。注意，长度以字符为单位。示例如下：

fabricsql=> select *, length(t)  from fq_iceberg_truncate (2, '') as t;
 iceberg_truncate | length
------------------+--------
                  |      0
(1 row)

fabricsql=> select *, length(t)  from fq_iceberg_truncate (2, 'a') as t;
 iceberg_truncate | length
------------------+--------
 a                |      1
(1 row)

fabricsql=> select *, length(t)  from fq_iceberg_truncate (2, 'tt') as t;
 iceberg_truncate | length
------------------+--------
 tt               |      2
(1 row)

fabricsql=> select *, length(t)  from fq_iceberg_truncate (2, 'cow') as t;
 iceberg_truncate | length
------------------+--------
 co               |      2
(1 row)

fabricsql=> select *, length(t)  from fq_iceberg_truncate (2, '你好中国!') as t;
 iceberg_truncate | length
------------------+--------
 你好             |      2
(1 row)

对于numeric类型，truncate(W, num_value)返回的是不超过num_value、且小数点右移scale位后为W倍数的最大数值。示例如下：

fabricsql=> select fq_iceberg_truncate (50, 10.64::numeric(15,3));
 fq_iceberg_truncate
---------------------
              10.600
(1 row)

fabricsql=> select fq_iceberg_truncate (50, 10.66::numeric(15,3));
 fq_iceberg_truncate
---------------------
              10.650
(1 row)

fabricsql=> select fq_iceberg_truncate (50, -10.66::numeric(15,3));
 fq_iceberg_truncate
---------------------
             -10.700
(1 row)

创建表达式分区表

CREATE TABLE语法详情参考CREATE TABLE。

创建一个表达式分区的Iceberg managed表SQL语句如下：

fabricsql=> set current_schema to test;
fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), truncate(10, id), c3 int comment 'this is c3 field', year(ts) ) store as iceberg;
CREATE TABLE
fabricsql=> select * from pg_get_partition_keys('test', 'iceberg_table');
     table     |  partition_key
---------------+-----------------
 iceberg_table | bucket(16,val)
 iceberg_table | truncate(10,id)
 iceberg_table | c3
 iceberg_table | year(ts)
(4 rows)

在示例中， c3是一个identity分区键。此外，还创建了bucket、truncate、year表达式的三个分区键。Iceberg会对分区键进行内部命名，其命名规则如下表所述。

表2 不同表达式分区键在Iceberg内部的默认命名规则
表达式	默认命名规则	样例
year(source_column)	<source_column>_year	year(ts) -> 'ts_year'
month(source_column)	<source_column>_month	month(ts) -> 'ts_month'
day(source_column)	<source_column>_day	day(ts) -> 'ts_day'
hour(source_column)	<source_column>_hour	hour(ts) -> 'ts_hour'
bucket(N, source_column)	<source_column>_bucket	bucket(ts) -> 'ts_bucket'
truncate(W, source_column)	<source_column>_trunc	truncate(ts) -> 'ts_trunc'
identity(source_column)	<source_column>	identity(ts) -> 'ts'

表的列名称不能够与分区键的名称冲突，否则会报错、建表失败。

fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val),  c3 int comment 'this is c3 field', year(ts), ts_year bigint ) store as iceberg;
ERROR:  column "ts_year" specified more than once

Iceberg不允许时间类型的同一列同时出现在year、month、day、hour多个表达式中，但是允许该列出现在truncate、bucket其他表达式中。

fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val),  c3 int comment 'this is c3 field', year(ts), month(ts) ) store as iceberg;
ERROR:  java.lang.IllegalArgumentException: Cannot add redundant partition: 1002: ts_year: year(4) conflicts with 1003: ts_month: month(4).

fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val),  c3 int comment 'this is c3 field', year(ts), bucket(7, ts) ) store as iceberg;
CREATE TABLE

Fabric SQL支持创建分区键的最大数量是4，否则会报错、建表失败。

fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), truncate(10, id), c3 int comment 'this is c3 field', year(ts) ) store as iceberg;
ERROR:  partitioned table should have no more than 4 partition keys

Fabric SQL不支持将复杂类型作为分区键，否则会报错、建表失败。

fabricsql=> CREATE TABLE iceberg_table(id bigint, data text, category text, ts timestamp, val numeric) partition by (bucket(16, val), truncate(10, id), c3 struct<c: text, d: int>, year(ts) ) store as iceberg;
ERROR:  Complex type cannot be served as a value-partitioning column.

fabricsql=> CREATE TABLE iceberg_table(a int , b struct<c: text, d: int>) PARTITION BY(bucket(5, b.c), truncate(3, b.d)) store as ICEBERG;
ERROR:  syntax error at or near "."
LINE 1: ... struct<c: text, d: int>) PARTITION BY(bucket(5, b.c), trunc...
                                                             ^

Fabric SQL支持创建表达式分区的Iceberg外表，需要满足额外条件：

与OBS上Iceberg表的分区键数量相同。
与OBS上Iceberg表的分区键定义完全一致，包括分区键的顺序、分区键的表达式、分区键操作的表列。

后续小节将以iceberg_test_part_expr表为样例，其表定义如下：

fabricsql=> set current_schema to test;
fabricsql=> CREATE TABLE iceberg_test_part_expr(id bigint, data text, category text, ts date) partition by (c3 int, truncate(4, data), year(ts)) store as iceberg;
CREATE TABLE

导入数据

实现表达式分区的Iceberg表支持数据的单条插入和批量导入功能。

fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'hellohello',   'shanxi', '2025-08-29', 1);
INSERT 1
fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'hellohello',   'shanxi', '2025-08-29', 1);
INSERT 1
fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'he',           'shanxi', '2024-09-29', 20);
INSERT 1
fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'helloworld',   'shanxi', '2023-06-29', 6);
INSERT 1
fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'hella',        'shanxi', '2022-05-29', 60);
INSERT 1
fabricsql=> INSERT INTO iceberg_test_part_expr values(9, 'zhangmingmin', 'shanxi', '2025-04-29', 100);
INSERT 1

对于INSERT INTO PARTITION功能，Fabric SQL仅支持所有分区键为identity类型的表，并不支持其他类型表达式的分区表。

fabricsql=> INSERT INTO iceberg_test_part_expr PARTITION(c3=1, data='hell', ts='2025') values (9, 'hellohello',   'shanxi', '2025-08-29 01:05:59.550463') ;
ERROR:  INSERT INTO PARTITION clause only support identity partition.

可以使用函数pg_get_partitions()来查看表的所有分区数据，示例如下：

fabricsql=> select partition_name from pg_get_partitions('iceberg_test_part_expr');
      partition_name
--------------------------
 c3=1/data=hell/ts=2025
 c3=20/data=he/ts=2024
 c3=100/data=zhan/ts=2025
 c3=6/data=hell/ts=2023
 c3=60/data=hell/ts=2022
(5 rows)

基于列表达式的查询优化

在支持表达式分区之后，Fabric SQL可下推列表达式，从而进行静态分区剪枝的查询优化。示例如下：

fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where c3 = 1;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
  id |                                operation                                 | A-rows
 ----+--------------------------------------------------------------------------+--------
   1 | ->  Row Adapter                                                          |      2
   2 |    ->  Partitioned Vector Foreign Scan on iceberg_test_part_expr         |      2
   3 |       ->  Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr |      2

            Predicate Information (identified by plan id)
 --------------------------------------------------------------------
   2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
         Filter: (c3 = 1)
         Pushdown Predicate Filter: (c3 = 1)
         Server Type: lf
         DN read from: direct
   3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
         Filter: (c3 = 1)
         Pushdown Predicate Filter: (c3 = 1)
         Server Type: lf
         Pruning results: (Manifests total: 6, Manifests left: 2)
         DN read from: direct
(19 rows)

可以看到， c3=1列表达式下推后，扫描的分区数量由6个裁减为2个。

基于SQL函数的查询优化

在实现表达式分区之后，Fabric SQL支持白名单SQL函数的下推优化。当前主要支持的SQL函数白名单如下：

表3 支持下推优化的SQL函数列表
分区表达式	SQL函数	下推条件
year	year(), date_trunc()	date_trunc()的时间单位是year。
month	date_trunc()	date_trunc()的时间单位是month。
day	date_trunc()	date_trunc()的时间单位是day。
hour	date_trunc()	date_trunc()的时间单位是hour。
truncate	left(), substring()	substring()必须从字符串头开始算起。

字符串函数left()和substring()下推进行静态分区剪枝的示例如下：

fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where substring(data, 1, 3) > 'hello';
                                        QUERY PLAN
------------------------------------------------------------------------------------------
  id |                                operation                                 | A-rows
 ----+--------------------------------------------------------------------------+--------
   1 | ->  Row Adapter                                                          |      1
   2 |    ->  Partitioned Vector Foreign Scan on iceberg_test_part_expr         |      1
   3 |       ->  Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr |      1

            Predicate Information (identified by plan id)
 --------------------------------------------------------------------
   2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
         Filter: ("substring"(data, 1, 3) > 'hello'::text)
         Server Type: lf
         DN read from: direct
   3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
         Server Type: lf
         Pruning results: (Manifests total: 6, Manifests left: 1)
         DN read from: direct
(16 rows)

fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where left(data, 5) < 'hello';
                                        QUERY PLAN
------------------------------------------------------------------------------------------
  id |                                operation                                 | A-rows
 ----+--------------------------------------------------------------------------+--------
   1 | ->  Row Adapter                                                          |      2
   2 |    ->  Partitioned Vector Foreign Scan on iceberg_test_part_expr         |      2
   3 |       ->  Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr |      5

            Predicate Information (identified by plan id)
 --------------------------------------------------------------------
   2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
         Filter: ("left"(data, 5) < 'hello'::text)
         Server Type: lf
         DN read from: direct
   3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
         Server Type: lf
         Pruning results: (Manifests total: 6, Manifests left: 5)
         DN read from: direct
(16 rows)

可以看到，substring()函数下推后，分区数量由6个裁减到1个。而left()函数下推后，分区数量由6个裁减到5个。

时间类型函数date_trunc()下推进行静态分区剪枝的示例如下：

fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where date_trunc('year', ts) > '2023-12-30 00:00:00';
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
  id |                                operation                                 | A-rows
 ----+--------------------------------------------------------------------------+--------
   1 | ->  Row Adapter                                                          |      4
   2 |    ->  Partitioned Vector Foreign Scan on iceberg_test_part_expr         |      4
   3 |       ->  Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr |      4

                            Predicate Information (identified by plan id)
 ---------------------------------------------------------------------------------------------------
   2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
         Filter: (date_trunc('year'::text, ts) > '2023-12-30 00:00:00'::timestamp without time zone)
         Server Type: lf
         DN read from: direct
   3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
         Server Type: lf
         Pruning results: (Manifests total: 6, Manifests left: 4)
         DN read from: direct
(16 rows)

fabricsql=> explain(analyze, predicate_only on, costs off, timing off) select * from iceberg_test_part_expr where year(ts) < 2024;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
  id |                                operation                                 | A-rows
 ----+--------------------------------------------------------------------------+--------
   1 | ->  Row Adapter                                                          |      2
   2 |    ->  Partitioned Vector Foreign Scan on iceberg_test_part_expr         |      2
   3 |       ->  Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr |      2

            Predicate Information (identified by plan id)
 --------------------------------------------------------------------
   2 --Partitioned Vector Foreign Scan on iceberg_test_part_expr
         Filter: (year(ts) < 2024)
         Server Type: lf
         DN read from: direct
   3 --Partitioned Vector Foreign Meta Scan on iceberg_test_part_expr
         Server Type: lf
         Pruning results: (Manifests total: 6, Manifests left: 2)
         DN read from: direct
(16 rows)

可以看到， date_trunc()函数下推后，分区数量由6个裁减到4个。而year()函数下推后，分区数量由6个裁减到2个。

父主题： SQL on Iceberg

上一篇：查询和写入Iceberg表

下一篇：访问第三方引擎创建的Iceberg表

意见反馈

文档内容是否对您有帮助？

有帮助没帮助

提供反馈

提交成功！非常感谢您的反馈，我们会继续努力做到更好！您可在我的云声建议查看反馈及问题处理状态。

系统繁忙，请稍后重试

在使用文档中是否遇到以下问题

内容与产品页面不一致

内容不易理解

缺失示例代码

步骤不可操作

搜不到想要的内容

缺少最佳实践

意见反馈（选填）

0/500

请至少选择一项反馈信息并填写问题反馈

字符长度不能超过500

直接提交取消

如您有其它疑问，您也可以通过华为云社区问答频道来与我们联系探讨

盘古Doer提问云社区提问