Updated on 2025-09-18 GMT+08:00

Complex Type

DataArts Fabric SQL can read complex columns from ORC/parquet files. Currently, only reading is supported, and only the struct and array types are supported. For the array type, multidimensional array queries are not yet supported.

Complex Type DDL

Currently, DDL only supports STRUCT and ARRAY types.

The definition syntax is as follows:

  • ARRAY: Defined with the syntax ARRAY<data_type>.
  • STRUCT: Defined with the syntax STRUCT<col_name : data_type, ...>.

Here, data_type indicates the data type, which can be either a basic type or a complex type. For details about basic types, refer to Mapping Between DataArts Fabric SQL and LakeFormation Data Types.

DDL example:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
-- Create an external table containing complex types.
CREATE EXTERNAL TABLE test_table (
    name TEXT,
    address STRUCT<zip:INT, city:TEXT>,
    tags ARRAY<TEXT>,
    logs ARRAY<STRUCT<term:TEXT, time:TIMESTAMP>>
) store as orc location 'obs://test/test1';

-- After the table is created, you can view the full definition of LakeFormation.
DESCRIBE test_table;

Complex Type DQL

  • The Struct type allows for direct retrieval of internal elements through point queries, where the table column names should be enclosed in parentheses (()).
  • Arrays function similarly to regular arrays, with specific elements accessed via indices. Note that indexing begins at 1.

DQL example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Query complex data.
SELECT * FROM test_table;
  name   |       address       |       tags         |                               logs                                 
---------+---------------------+--------------------+--------------------------------------------------------------------- 
 Alice   | (10001,New York)    | {friend,colleague} | {"(login,2024-01-01 00:00:00)","(login,2024-02-01 00:00:00)"}
 Bob     | (90001,Los Angeles) | {family}           | {"(purchase,2024-02-15 00:00:00)","(purchase,2024-02-10 00:00:00)"}
 Charlie | (60601,Chicago)     | {friend,hobby}     | {NULL}
(3 rows)

-- Extract struct elements.
SELECT (address).city, (address).zip FROM test_table;
  city   |    zip   
---------+----------
  10001  | New York 
  90001  | Angeles  
  60601  | Chicago  
(3 rows)

-- Extract array elements.
SELECT tags[1], tags[2] FROM test_table;
  tags   |   tags   
---------+----------
  friend | colleague 
  family |   
  friend | hobby  
(3 rows)

Conversion of Complex Types to JSONB

Current complex types including STRUCT and ARRAY are read through JSON. You need to explicitly convert the data to the appropriate data type after reading.

DQL examples (for detailed operator and function descriptions of jsonb, refer to JSON/JSONB Functions and Operators.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- -> Extracts elements of a JSON object or array by key or index.
SELECT name, address, tags, logs->0->'term' FROM test_table;
  name   |                address                |          tags           |  ?column?  
---------+---------------------------------------+-------------------------+------------
 Alice   | {"zip": 10001, "city": "New York"}    | ["friend", "colleague"] | "login"
 Bob     | {"zip": 90001, "city": "Los Angeles"} | ["family"]              | "purchase"
 Charlie | {"zip": 60601, "city": "Chicago"}     | ["friend", "hobby"]     | 
(3 rows)

-- ->> Extracts element and converts it to text.
SELECT name, address FROM test_table WHERE address->>'zip' = '10001';
 name  |              address               
-------+------------------------------------
 Alice | {"zip": 10001, "city": "New York"}
(1 row)

-- @> Array contains elements, where the left array includes all elements of the right array.
SELECT name, tags FROM test_table WHERE tags @> '["friend"]'::jsonb;
  name   |          tags           
---------+-------------------------
 Alice   | ["friend", "colleague"]
 Charlie | ["friend", "hobby"]
(2 rows)

-- Display type conversion. Use ->> to get as character and then perform type conversion.
SELECT (logs->0->>'time')::TIMESTAMP FROM test_table;
      timestamp      
---------------------
 2024-01-01 00:00:00
 2024-02-15 00:00:00

(3 rows)