Updated on 2022-11-18 GMT+08:00

CREATE FUNCTION

Syntax

CREATE FUNCTION qualified_function_name (

parameter_name parameter_type

[, ...]

)

RETURNS return_type

[ COMMENT function_description ]

[ LANGUAGE [ JAVA | JDBC ] ]

[ SPECIFIC specificName ]

[ DETERMINISTIC | NOT DETERMINISTIC ]

[ RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT ]

[ SYMBOL class_name ]

[ URI hdfs_path_to_jar ]

Description

It is used to create a function based on a given definition.

  • Each function is uniquely identified by a qualified name and parameter type list. The value format of qualified_function_name must be catalog.schema.function_name. You can plan and manage the function namespace (its format is catalog.schema). It is irrelevant to the concepts of catalog and schema in HetuEngine. The value of parameter_type must be a data type supported by HetuEngine.
  • The value of return_type must be a data type supported by HetuEngine and must match the actual type returned by the function. Forcible type conversion is not executed.
  • You can specify a group of features to decorate a function and specify its behavior. Each feature can be specified only once. For details, see Table 1.
    Table 1 Feature description

    Feature

    Default Value

    Description

    Language clause

    JDBC

    Language used to define a function. Currently, Java and JDBC are supported.
    • Java function: A JAR file for implementing a function needs to be provided, and the JAR file needs to be placed in HDFS that can be read by HetuEngine.
    • JDBC function: maps the functions of the JDBC data source in HetuEngine. Currently, only Scalar UDFs of the JDBC data source can be mapped (that is, a value is returned for each row of the result set).

    Specific name

    -

    Used by a JDBC function to specify a qualified function name in the JDBC data source.

    Deterministic characteristic

    NOT DETERMINISTIC

    Whether a function is deterministic.

    • DETERMINISTIC: A function is considered deterministic if it always returns the same result set when called with the same input set.
    • NOT DETERMINISTIC: A function is considered nondeterministic if it does not return the same result set when called with the same input set.

    Null-call clause

    CALLED ON NULL INPUT

    Function behavior.

    • RETURNS NULL ON NULL INPUT: When NULL is used as a function parameter, NULL is returned.
    • CALLED ON NULL INPUT: When NULL is used as a function parameter, the function is called.

    Symbol class_name

    -

    Used by a Java function to specify a fully qualified class name for function implementation.

    Uri hdfs_path_to_jar

    -

    Used by a Java function to specify a JAR file path for function implementation.

Remarks

  • JDBC functions do not support cross-domain data sources.
  • Permissions are controlled only based on user groups. For details, see Table 2.
    Table 2 Permission control

    Operation

    Permission Control

    CREATE

    No permission control

    DROP

    Only the owner has the permission to perform the operation.

    SELECT

    No permission control

    SHOW

    No permission control

Example

  • Create a Java function example.default.add_two. (You need to build and deploy the UDF first.)
    CREATE FUNCTION example.default.add_two (
     num integer
    )
    RETURNS integer
    LANGUAGE JAVA
    DETERMINISTIC
    SYMBOL "com.example.functions.AddTwo"
    URI "hdfs://hacluster/udfs/function-1.0.jar";
    
    -- Execute the function.
    select hetu.default.add_two(2);
  • Create a JDBC function example.namespace02.repeat:
    CREATE FUNCTION example.namespace02.repeat (
        str VARCHAR,
        times BIGINT
    )
    RETURNS VARCHAR
    LANGUAGE JDBC
    SPECIFIC repeat
    DETERMINISTIC;
    
    -- Execute the function.
    select example.namespace02.repeat(t1.name,3) from mppdb.test.table1 t1;

Precautions

  • JDBC functions must be pushed down to the data source for execution. By default, only the function name in the definition is retained in the SQL statements during pushdown. The data source may have the following three types of functions. Therefore, the pushdown may have the following differences:
    • The UDF function catalog1.schema1.func1 cannot be used because the SQL statements to be pushed down contain only func1 and do not contain catalog1.schema1.
    • The UDF function schema1.func1 cannot be used because the SQL statements to be pushed down contain only func1 and do not contain schema1.
    • The BUILTIN function func1 can be used because the SQL statements to be pushed down contain func1.

    Therefore, you can specify a qualified function name in the data source using the SPECIFIC parameter when creating a function in HetuEngine. If the SPECIFIC parameter is specified, its value will be used as the function description in the SQL statements to be pushed down by HetuEngine. Otherwise, the defined function name is used as the function description by default.

    For example, create functions:
    CREATE FUNCTION example.namespace01.func1(x INTEGER) RETURNS INTEGER LANGUAGE JDBC SPECIFIC catalog1.schema1.fx1 DETERMINISTIC;
    
    CREATE FUNCTION example.namespace01.func2(x INTEGER) RETURNS INTEGER LANGUAGE JDBC SPECIFIC schema1.fx2 DETERMINISTIC;
    
    CREATE FUNCTION example.namespace01.func3(x INTEGER) RETURNS INTEGER LANGUAGE JDBC DETERMINISTIC;
    HetuEngine pushes down SQL statements:
    ... catalog1.schema1.fx1(...) ...
    ... schema1.fx2(...) ...
    ... func3(...) ...
  • If you do not specify the SPECIFIC parameter when running CREATE FUNCTION, the function name must be the same as the function name in the data source. Otherwise, the pushdown executed by HetuEngine will fail because the function cannot be found in the data source.
  • JDBC functions can be used in Projections, Filters, Joins, and Constants. Currently, only JDBC functions can be used on the corresponding data source objects through Projects or Filters.

    Use the interconnection with the mppdb data source as an example:

    CREATE FUNCTION mppdb.namespace1.func_hello(x integer) RETURNS char(50) LANGUAGE JDBC DETERMINISTIC SPECIFIC fundb.hello;
    CREATE FUNCTION mppdb.namespace1.func_sum(x integer, y double) RETURNS double LANGUAGE JDBC DETERMINISTIC SPECIFIC fundb.sum;
    
    SHOW EXTERNAL FUNCTION mppdb.namespace1.func_hello;
    SHOW EXTERNAL FUNCTION mppdb.namespace1.func_sum;
    
    DROP FUNCTION IF EXISTS mppdb.namespace1.func_hello;
    DROP FUNCTION IF EXISTS mppdb.namespace1.func_sum;
    
    -----------------------
    --Function in Project--
    -----------------------
    SELECT mppdb.namespace1.func_hello(t1.id), t1.name FROM mppdb.testmppdb.test1 t1;
    SELECT mppdb.namespace1.func_hello(t1.id), t1.name FROM mppdb.testmppdb.test1 t1 LIMIT 3;
    SELECT COUNT(mppdb.namespace1.func_hello(t1.id)) FROM mppdb.testmppdb.test1 t1;
    SELECT mppdb.namespace1.func_sum(t1.cint, t1.cdouble), t1.column_timestamp FROM mppdb.testmppdb.test2 t1 WHERE t1.column_int < 3000;
    -- Not Supported
    SELECT mppdb.namespace1.func_hello(t1.id) AS hello_str, t2.cc_class, t2.cc_city FROM mppdb.testmppdb.test1 t1 LEFT JOIN tpcds.sf1.call_center t2 ON t1.name != t2.cc_name;
    -- Supported
    SELECT t1.hello_id, t2.cc_class, t2.cc_city FROM (SELECT mppdb.namespace1.func_hello(id) AS hello_id, name FROM mppdb.testmppdb.test1) t1 LEFT JOIN tpcds.sf1.call_center t2 ON t1.name != t2.cc_name;
    -- Not Supported
    SELECT mppdb.namespace1.func_hello(t1.id) AS hello_str, mppdb.namespace1.func_sum(t2.cint, t2.cdouble) FROM mppdb.testmppdb.test1 t1 LEFT JOIN mppdb.testmppdb.test2 t2 ON t1.name != t2.cchar;
    -- Supported
    SELECT t1.hello_str, t2.sum_t2 FROM (SELECT mppdb.namespace1.func_hello(id) AS hello_str, name FROM mppdb.testmppdb.test1) t1 LEFT JOIN (SELECT mppdb.namespace1.func_sum(cint, cdouble) AS sum_t2, cchar FROM mppdb.testmppdb.test2) t2 ON t1.name != t2.cchar;
    
    -----------------------
    --Function in Filter---
    -----------------------
    SELECT * FROM mppdb.testmppdb.test2 t1 WHERE mppdb.namespace1.func_sum(t1.cint, t1.cdouble) < 8000 AND t1.column_int < 8000;
    -- Not Supported
    SELECT t1.name, t2.* FROM mppdb.testmppdb.test1 t1 LEFT JOIN mppdb.testmppdb.test2 t2 ON t1.id < t2.cint WHERE mppdb.namespace1.func_sum(t2.cint, t2.cdouble) < 8000;
    -- Supported
    SELECT t1.name, t2.* FROM mppdb.testmppdb.test1 t1 LEFT JOIN (SELECT * FROM mppdb.testmppdb.test2 WHERE mppdb.namespace1.func_sum(cint, cdouble) < 8000) t2 ON t1.id < t2.cint;