Help Center > > Developer Guide> PL/Java User-defined Functions

PL/Java User-defined Functions

Updated at: Jul 15, 2020 GMT+08:00

With the DWS PL/Java functions, you can choose your favorite Java IDE to write Java methods and install the JAR files containing these methods into the DWS database before invoking them. DWS PL/Java is developed based on open-source Greenplum PL/Java 1.4.0. PL/Java uses JDK 1.8.0_201.

Constraints

Java UDF can be used for some Java logical computing. You are not advised to encapsulate services in Java UDF.

  • You are not advised to connect to a database in any way (for example, JDBC) in Java functions.
  • Currently, only data types listed in Table 1 are supported. Other data types, such as user-defined data types and complex data types (for example, Java array and its derived types) are not supported.
  • Currently, UDAF and UDTF are not supported.

Examples

Before using PL/Java, you need to pack the implementation of Java methods into a JAR package and deploy it into the database. Then, create functions as a database administrator. For compatibility purposes, use JDK 1.8.0_201 for compilation.

  1. Compile a JAR package.

    Java method implementation and JAR package archiving can be achieved in an integrated development environment (IDE). The following is a simple example of compilation and archiving through command lines. You can create a JAR package that contains a single method in the similar way.

    First, prepare an Example.java file that contains a method for converting substrings to uppercase. In the following example, Example is the class name and upperString is the method name:

    1
    2
    3
    4
    5
    6
    7
    public class Example 
    {
        public static String upperString (String text, int beginIndex, int endIndex) 
        {
            return text.substring(beginIndex, endIndex).toUpperCase();
        }
    }
    

    Then, create a manifest.txt file containing the following content:

    1
    2
    3
    4
    5
    6
    Manifest-Version: 1.0
    Main-Class: Example
    Specification-Title: "Example"
    Specification-Version: "1.0"
    Created-By: 1.6.0_35-b10-428-11M3811
    Build-Date: 08/14/2018 10:09 AM
    

    Manifest-Version specifies the version of the manifest file. Main-Class specifies the main class used by the .jar file. Specification-Title and Specification-Version are the extended attributes of the package. Specification-Title specifies the title of the extended specification and Specification-Version specifies the version of the extended specification. Created-By specifies the person who created the file. Build-Date specifies the date when the file was created.

    Finally, archive the .java file and package it into javaudf-example.jar.

    1
    2
    javac Example.java
    jar cfm javaudf-example.jar manifest.txt Example.class
    

    JAR package names must comply with JDK rules. If a name contains invalid characters, an error occurs when a function is deployed or used.

  2. Deploy the JAR package.

    First store the JAR package on an OBS server. For details, see Uploading an Object in the Object Storage Service Console Operation Guide. Then, create the access key AK/SK. To obtain access keys, log in to the management console, move the cursor over the username in the upper right corner, and click My Credentials. Then choose Access Keys in the navigation tree on the left. On the Access Keys page, you can view the existing AKs or click Create Access Key to create an AK/SK pair. After that, log in to the database, run the gs_extend_library function, and import the package to DWS.

    1
    SELECT gs_extend_library('addjar', 'obs://bucket/path/javaudf-example.jar accesskey=access_key_value_to_be_replaced  secretkey=secret_access_key_value_to_be_replaced  region=region_name libraryname=example');
    

    For details about how to use the gs_extend_library function, see Manage JAR packages and files. Change the values of AK and SK as needed. Replace region_name with an actual region name.

  3. Use a PL/Java function.

    Log in to the database as a user who has the sysadmin permission (for example, dbadmin) and create the java_upperstring function:

    1
    2
    3
    4
    CREATE FUNCTION java_upperstring(VARCHAR, INTEGER, INTEGER)
        RETURNS VARCHAR
        AS 'Example.upperString'
    LANGUAGE JAVA;
    
    • The data type defined in the java_upperstring function should be a type in DWS and match the data type defined in 1 in the upperString method in Java. For details about the mapping between DWS and Java data types, see Table 1.
    • The AS clause specifies the class name and static method name of the Java method invoked by the function. The format is Class name.Method name. The class name and method name must match the Java class and method defined in 1.
    • To use PL/Java functions, set LANGUAGE to JAVA.
    • For details about CREATE FUNCTION, see Create functions.

    Execute the java_upperstring function.

    1
    SELECT java_upperstring('test', 0, 1);
    

    The expected result is as follows:

    1
    2
    3
    4
     java_upperstring
    ---------------------
     T
    (1 row)
    

  4. Authorize a common user to use the PL/Java function.

    Create a common user whose name is udf_user and password is GAUSS@123.

    1
    CREATE USER udf_user PASSWORD 'GAUSS@123';
    

    This command grants user udf_user the permission for the java_upperstring function. Note that the user can use this function only if it also has the permission for using the schema of the function.

    1
    2
    GRANT ALL PRIVILEGES ON SCHEMA public TO udf_user;
    GRANT ALL PRIVILEGES ON FUNCTION java_upperstring(VARCHAR, INTEGER, INTEGER) TO udf_user;
    

    Log in to the database as user udf_user.

    1
    SET SESSION SESSION AUTHORIZATION udf_user PASSWORD 'GAUSS@123';
    

    Execute the java_upperstring function.

    1
    SELECT public.java_upperstring('test', 0, 1);
    

    The expected result is as follows:

    1
    2
    3
    4
     java_upperstring
    ---------------------
     T
    (1 row)
    

  5. Delete the function.

    If you no longer need this function, delete it.
    1
    DROP FUNCTION java_upperstring;
    

  6. Uninstall the JAR package.

    Use the gs_extend_library function to uninstall the JAR package.

    1
    SELECT gs_extend_library('rmjar', 'libraryname=example');
    

SQL Definition and Usage

  • Manage JAR packages and files.

    A database user having the sysadmin permission can use the gs_extend_library function to deploy, view, and delete JAR packages in the database. The syntax of the function is as follows:

    1
    SELECT gs_extend_library('[action]', '[operation]');
    
    • action: operation action. The options are as follows:
      • ls: Displays JAR packages in the database and checks the MD5 value consistency of files on each node.
      • addjar: deploys a JAR package on the OBS server in the database.
      • rmjar: Deletes JAR packages from the database.
    • operation: operation string. The format can be either of the following:

      obs://[bucket]/[source_filepath] accesskey=[accesskey] secretkey=[secretkey] region=[region] libraryname=[libraryname]

      • bucket: name of the bucket to which the obs file belongs. It is mandatory.
      • source_filepath: file path on the OBS server. Only .jar files are supported.
      • accesskey: key obtained for accessing the OBS service. It is mandatory.
      • secret_key: secret key obtained for the OBS service. It is mandatory.
      • region: region where the OBS bucket stored in the JAR package of a user-defined function belongs to. This parameter is mandatory.
      • libraryname: user-defined library name, which is used to invoke JAR files in DWS. If action is set to addjar or rmjar, libraryname must be specified. If action is set to ls, libraryname is optional. Note that a user-defined library name cannot contain the following characters: /|;&$<>\'{}"()[]~*?!
  • Create functions.

    PL/Java functions can be created using the CREATE FUNCTION syntax and are defined as LANGUAGE JAVA, including the RETURNS and AS clauses.

    • To use CREATE FUNCTION, specify the name and parameter type for the function to be created.
    • The RETURNS clause specifies the return type for the function.
    • The AS clause specifies the class name and static method name of the Java method to be invoked. If the NULL value needs to be transferred to the Java method as an input parameter, specify the name of the Java encapsulation class corresponding to the parameter type. For details, see NULL Handling.
    • For details about the syntax, see CREATE FUNCTION.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      CREATE [ OR REPLACE ] FUNCTION function_name
      ( [ { argname [ argmode ] argtype [ { DEFAULT | := | = } expression ]} [, ...] ])
      [ RETURNS rettype [ DETERMINISTIC ] ]
      LANGAUGE JAVA
      [
          { IMMUTABLE | STATBLE | VOLATILE }
          | [ NOT ] LEAKPROOF
          | WINDOW
          | { CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT |STRICT }
          | {[ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER | AUTHID DEFINER | AUTHID CURRENT_USER}
          | { FENCED }
          | COST execution_cost
          | ROWS result_rows
          | SET configuration_parameter { {TO |=} value | FROM CURRENT}
      ] [...]
      {
          AS 'class_name.method_name' ( { argtype } [, ...] )
      }
      
  • Use functions.

    During execution, PL/Java searches for the Java class specified by a function among all the deployed JAR packages, which are ranked by name in alphabetical order, invokes the Java method in the first found class, and returns results.

  • Delete functions.

    PL/Java functions can be deleted by using the DROP FUNCTION syntax. For details about the syntax, see DROP FUNCTION.

    DROP FUNCTION [ IF EXISTS ] function_name [ ( [ {[ argmode ] [ argname ] argtype} [, ...] ] ) [ CASCADE | RESTRICT ] ];

    To delete an overloaded function (for details, see Overloaded Functions), specify argtype in the function. To delete other functions, simply specify function_name.

  • Authorize permissions for functions.

    Only user sysadmin can create PL/Java functions. It can also grant other users the permission to use the PL/Java functions. For details about the syntax, see GRANT.

    GRANT { EXECUTE | ALL [ PRIVILEGES ] }
        ON { FUNCTION {function_name ( [ {[ argmode ] [ arg_name ] arg_type} [, ...] ] )} [, ...]
            | ALL FUNCTIONS IN SCHEMA schema_name [, ...] }
        TO { [ GROUP ] role_name | PUBLIC } [, ...]
        [ WITH GRANT OPTION ];

Mapping for Basic Data Types

Table 1 PL/Java mapping for default data types

DWS

Java

BOOLEAN

boolean

"char"

byte

bytea

byte[]

SMALLINT

short

INTEGER

int

BIGINT

long

FLOAT4

float

FLOAT8

double

CHAR

java.lang.String

VARCHAR

java.lang.String

TEXT

java.lang.String

name

java.lang.String

DATE

java.sql.Timestamp

TIME

java.sql.Time (stored value treated as local time)

TIMETZ

java.sql.Time

TIMESTAMP

java.sql.Timestamp

TIMESTAMPTZ

java.sql.Timestamp

Array Type Processing

DWS can convert basic array types. You only need to append a pair of square brackets ([]) to the data type when creating a function.

CREATE FUNCTION java_arrayLength(INTEGER[])
    RETURNS INTEGER
    AS 'Example.getArrayLength'
LANGUAGE JAVA;

Java code is similar to the following:

public class Example
{
    public static int getArrayLength(Integer[] intArray)
    {
        return intArray.length;
    }
}

Invoke the following statement:

SELECT java_arrayLength(ARRAY[1, 2, 3]);

The expected result is as follows:

java_arrayLength
---------------------
3
(1 row)

NULL Handling

NULL values cannot be handled for DWS data types that are mapped and can be converted to simple Java types by default. If you use a Java function to obtain and process the NULL value transferred from DWS, specify the Java encapsulation class in the AS clause as follows:

CREATE FUNCTION java_countnulls(INTEGER[])
    RETURNS INTEGER
    AS 'Example.countNulls(java.lang.Integer[])'
LANGUAGE JAVA;

Java code is similar to the following:

public class Example
{
    public static int countNulls(Integer[] intArray)
    {
        int nullCount = 0;
        for (int idx = 0; idx < intArray.length; ++idx)
        {
            if (intArray[idx] == null)
            nullCount++;
        }
        return nullCount;
    }
}

Invoke the following statement:

SELECT java_countNulls(ARRAY[null, 1, null, 2, null]);

The expected result is as follows:

java_countNulls
--------------------
3
(1 row)

Overloaded Functions

PL/Java supports overloaded functions. You can create functions with the same name or invoke overloaded functions from Java code. The procedure is as follows:

  1. Create overloaded functions.

    For example, create two Java methods with the same name, and specify the methods dummy(int) and dummy(String) with different parameter types.

    public class Example
    {
        public static int dummy(int value)
        {
            return value*2;
        }
        public static String dummy(String value)
        {
            return value;
        }
    }

    In addition, create two functions with the same names as the above two functions in DWS.

    CREATE FUNCTION java_dummy(INTEGER)
        RETURNS INTEGER
        AS 'Example.dummy'
    LANGUAGE JAVA;
    
    CREATE FUNCTION java_dummy(VARCHAR)
        RETURNS VARCHAR
        AS 'Example.dummy'
    LANGUAGE JAVA;

  2. Invoke the overloaded functions.

    DWS invokes the functions that match the specified parameter type. The results of invoking the above two functions are as follows:

    SELECT java_dummy(5);
     java_dummy
    -----------------
                10
    (1 row)
    
    SELECT java_dummy('5');
     java_dummy
    ---------------
    5
    (1 row)

    Note that DWS may implicitly convert data types. Therefore, you are advised to specify the parameter type when invoking an overloaded function.

    SELECT java_dummy(5::varchar);
     java_dummy
    ----------------
    5
    (1 row)

    In this case, the specified parameter type is preferentially used for matching. If there is no Java method matching the specified parameter type, the system implicitly converts the parameter and searches for Java methods based on the conversion result.

    SELECT java_dummy(5::INTEGER);
     java_dummy
    -----------------
    10
    (1 row)
    
    DROP FUNCTION java_dummy(INTEGER);
    
    SELECT java_dummy(5::INTEGER);
     java_dummy
    ----------------
    5
    (1 row)

    Data types supporting implicit conversion are as follows:

    • SMALLINT: It can be converted to the INTEGER type by default.
    • SMALLINT and INTEGER: They can be converted to the BIGINT type by default.
    • TINYINT, SMALLINT, INTEGER, and BIGINT: They can be converted to the BOOL type by default.
    • CHAR, NAME, BIGINT, INTEGER, SMALLINT, TINYINT, RAW, FLOAT4, FLOAT8, BPCHAR, VARCHAR, NVARCHAR2, DATE, TIMESTAMP, TIMESTAMPTZ, NUMERIC, and SMALLDATETIME: They can be converted to the TEXT type by default.
    • TEXT, CHAR, BIGINT, INTEGER, SMALLINT, TINYINT, RAW, FLOAT4, FLOAT8, BPCHAR, DATE, NVARCHAR2, TIMESTAMP, NUMERIC, and SMALLDATETIME: They can be converted to the VARCHAR type by default.

  3. Delete the overloaded functions.

    To delete an overloaded function, specify the parameter type for the function. Otherwise, the function cannot be deleted.

    DROP FUNCTION java_dummy(INTEGER);

GUC Parameters

  • pljava_vmoptions

    A session-level GUC parameter. It is used to set JVM startup parameters.

    SET pljava_vmoptions='-Xmx64m –Xms2m –XX:MaxMetaspaceSize=8m';

    pljava_vmoptions supports:

    • JDK8 JVM startup parameters.
    • JDK8 JVM system attributes (starting with –D, for example, –Djava.ext.dirs).

      You are not advised to set any parameters that contain directories because such setting may lead to unpredictable behavior.

    • User-defined parameters (starting with –D, for example, –Duser.defined.option).

      If a user sets pljava_vmoptions to a value beyond the value range, an error will be reported during function revoking.

      SET pljava_vmoptions=' illegal.option';
      SET
      SELECT java_dummy(5::int);
      ERROR:  UDF Error:cannot use PL/Java before successfully completing its setup.Please check if your pljava_vmoption is set correctly,since we do not ignore illegal parameters.Or check the log for more messages.
  • udf_memory_limit

    A system-level GUC parameter. It is used to limit the physical memory used by each CN or DN for executing UDFs. The default value is 200 MB. You can use the postgresql.conf file to modify the parameter setting. The modification takes effect only after the database is restarted.

    • udf_memory_limit is a part of max_process_memory. When a CN or DN is started, memory calculated by udf_memory_limit minus 200 MB will be reserved for Worker processes. CN and DN processes are different from Worker processes, and the former will save memory for the latter.

      For example, if max_process_memory is set to 10 GB on a DN and udf_memory_limit is set to 4 GB, the DN can use a maximum of 6.2 GB memory, that is, 10 GB – (4 GB – 200 MB). This case applies even if no UDF is executed. By default, the value of udf_memory_limit is 200 MB. Querying the pv_total_memory_detail view will prove that the value of process_used_memory would never exceed the calculation result of max_process_memory – (udf_memory_limit200 MB).

    • Executing a simplest Java UDF on a CN consumes about 50 MB physical memory. You can set this parameter based on the memory usage and concurrency of Java functions to be used. After this parameter is added, you are not advised to set UDFWorkerMemHardLimit and FencedUDFMemoryLimit.
    • If the parallelism of the UDF process is excessively high and the memory usage exceeds the udf_memory_limit value, unexpected situations such as process exit may occur. In this scenario, the execution result may be unreliable. You are advised to set this parameter to reserve sufficient memory based on the site requirements. If the memory is severely insufficient, the UDF master process may exit. You can view the UDF master log for analysis. The default UDF master log path is $GAUSSLOG/cm/cm_agent/pg_log. For example, if the following log is displayed, the memory resources are insufficient and the UDF master process exits. In this case, you need to check the udf_memory_limit parameter.

      0 [BACKEND] FATAL: poll() failed: Bad address, please check the parameter:udf_memory_limit to make sure there is enough memory.

  • FencedUDFMemoryLimit

    A session-level GUC parameter. It is used to specify the maximum virtual memory used by a single Fenced UDF Worker process initiated by a session.

    SET FencedUDFMemoryLimit='512MB';

    The value range of this parameter is (150 MB, 1 G]. If the value is greater than 1 G, an error will be reported immediately. If the value is less than or equals to 150 MB, an error will be reported during function invoking.

    • If FencedUDFMemoryLimit is set to 0, the virtual memory for a Fenced UDF Worker process will not be limited.
    • You are advised to use udf_memory_limit to control the physical memory used by Fenced UDF Worker processes. You are not advised to use FencedUDFMemoryLimit, especially when Java UDFs are used. If you are clear about the impact of this parameter, set it based on the following information:
      • After a C Fenced UDF Worker process is started, it will occupy about 200 MB virtual memory, and about 16 MB physical memory.
      • After a Java Fenced UDF Worker process is started, it will occupy about 2.5 GB virtual memory, and about 50 MB physical memory.

Exception Handling

If there is an exception in a JVM, PL/Java will export JVM stack information during the exception to a client.

Logging

PL/Java uses the standard Java Logger. Therefore, you can record logs as follows:

Logger.getAnonymousLogger().config( "Time is " + new 
Date(System.currentTimeMillis()));

An initialized Java Logger class is set to the CONFIG level by default, corresponding to the LOG level in DWS. In this case, log messages generated by Java Logger are all redirected to the DWS backend. Then, the log messages are written into server logs or displayed on the user interface. MPPDB server logs record information at the LOG, WARNING, and ERROR levels. The SQL user interface displays logs at the WARNING and ERROR levels. The following table lists mapping between Java Logger levels and DWS log levels.

Table 2 PL/Java log levels

java.util.logging.Level

DWS Log Level

SERVER

ERROR

WARINING

WARNING

CONFIG

LOG

INFO

INFO

FINE

DEBUG1

FINER

DEBUG2

FINEST

DEBUG3

You can change Java Logger levels. For example, if the Java Logger level is changed to SEVERE by the following Java code, log messages (msg) will not be recorded in DWS logs during WARNING logging.

Logger log = Logger.getAnonymousLogger();
Log.setLevel(Level.SEVERE);
log.log(Level.WARNING, msg);

Security Issues

In DWS, PL/Java is an untrusted language. Only user sysdamin can create PL/Java functions. The user can grant other users the permission to use the PL/Java functions. For details, see Authorize permissions for functions.

In addition, PL/Java controls user access to file systems, forbidding users from reading most system files, or writing, deleting, or executing any system files in Java methods.

Did you find this page helpful?

Submit successfully!

Thank you for your feedback. Your feedback helps make our documentation better.

Failed to submit the feedback. Please try again later.

Which of the following issues have you encountered?







Please complete at least one feedback item.

Content most length 200 character

Content is empty.

OK Cancel