Updated on 2025-08-25 GMT+08:00

Overview

Context

In SQL, each piece of data is associated with a data type that determines its behavior and usage. DataArts Fabric SQL provides an extensible data type system that is more versatile and flexible than other SQL implementations. Therefore, most type conversions in DataArts Fabric SQL are managed by common rules, which allow the use of mixed-type expressions.

The DataArts Fabric SQL scanner/analyzer only decomposes lexical elements into five basic categories: integers, floating point numbers, strings, identifiers, and keywords. Most non-numeric types are first represented as strings. The SQL language definition allows constant strings to be declared as specific types. For example, the query is as follows:

1
2
3
4
5
postgres=#SELECT text 'Origin' AS "label", point '(0,0)' AS "value";
 label  | value
--------+-------
 Origin | (0,0)
(1 row)

In the example, there are two text constants of the text and point types. If no type is specified for a string literal, the literal is first defined as an unknown type.

In the DataArts Fabric SQL analyzer, there are four basic SQL structures that require independent type conversion rules:

  • Function invocation

    Most SQL type systems are built on a rich set of functions. A function call can have one or more arguments. SQL allows function overloading. Therefore, the function to be called cannot be directly found based on the function name. The parser must select the correct function based on the parameter type provided by the function.

  • Operator

    SQL allows prefix or suffix (unary) operators to be used on expressions, and allows binary operators (two parameters) to be used inside expressions. Like functions, operators can be overloaded. Therefore, the selection of operators depends on the parameter type.

  • Value storage

    The INSERT and UPDATE statements store the results of expressions into a table. The expression type in the statement must be the same as the target field type or can be converted to the target field type.

  • UNION, CASE, and related constructs

    Because all query results in a union SELECT statement must be displayed in a column, the element types in each SELECT clause must match each other and be converted to a uniform type. Similarly, the result expression of a CASE construct must be converted to a unified type so that the entire case expression has a unified output type. The same requirement exists in the ARRAY construct and in the GREATEST and LEAST functions.

The system catalog pg_cast stores information about conversions between data types and how to perform these conversions. For details, see PG_CAST.

In the semantic analysis phase, the return value type of an expression is determined and an appropriate conversion behavior is selected. Basic data types, including Boolean, numeric, string, bitstring, datetime, timespan, geometric, and network. Each type has one or more preferred types to solve the problem of type selection. Depending on the preferred type and available implicit transformations, it is possible to ensure that ambiguous expressions (those with multiple candidate parsing solutions) can be resolved in an efficient way.

All type conversion rules are based on the following principles:

  • Implicit conversions should never have strange or unpredictable output.
  • If a query does not need implicit type conversion, the analyzer and executor should not perform more extra operations. That is, any query that matches the type and is well-formed should not consume more time in the parser and should not introduce any unnecessary implicit type conversion calls to the query.
  • Additionally, if a query requires implicit conversion when calling a function, the interpreter should use the new function after the user defines a function with correct parameters.