Why Can't Functions Be Used When Different JDBCServers Are Connected?

Question

Scenario 1:

I run the add jar command to create permanent functions. When Beeline connects to different JDBCServers or the JDBCServer is restarted, I need to run the add jar command again.

Figure 1 Error information in Scenario 1

Scenario 2:

The functions are listed when I run the show functions command, but they cannot be used. This is because the JAR files in the corresponding path do not exist on the node of the connected JDBCServer. After the JAR files are added, the query succeeds.

Figure 2 Error information in Scenario 2

Answer

Scenario 1:

The add jar statement loads the JAR file only into the jarClassLoader of the currently connected JDBCServer. Different JDBCServers do not share a jarClassLoader, and a restarted JDBCServer creates a new jarClassLoader. Therefore, the add jar statement must be run again after you connect to a different JDBCServer or after the JDBCServer restarts.

You can add a JAR file in either of the following ways:

Add a JAR file when starting Spark SQL, for example, by running spark-sql --jars /opt/test/two_udfs.jar.

Add a JAR file after Spark SQL is started, for example, by running add jar /opt/test/two_udfs.jar. The path specified by add jar can be a local path or an HDFS path.
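As a minimal sketch of the second method, the statements below could be run in a Beeline session right after connecting to a JDBCServer. The JAR path is the example path used above. LIST JARS is a standard Spark SQL statement, included here only as an optional check and on the assumption that your Spark version supports it.

```sql
-- Load the JAR into the jarClassLoader of the JDBCServer for this connection.
-- Repeat this after connecting to a different JDBCServer or after a restart.
ADD JAR /opt/test/two_udfs.jar;

-- Optional check (assumes LIST JARS is available in your Spark version):
-- show the JAR files that have been added.
LIST JARS;
```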

Scenario 2:

The show functions command obtains all functions in the current database from the external catalog. When a function is used in SQL, JDBCServer loads the JAR file corresponding to the function.

If the JAR file does not exist, the function cannot be used. In this case, run the add jar command again.
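The sequence below sketches how Scenario 2 might look from a Beeline session. The function name two_udf_example, the table example_table, and the column name are hypothetical placeholders, not names from this document. The JAR path is reused from the earlier example and should be replaced with the path your function was registered with.

```sql
-- The function is listed because its metadata comes from the external catalog,
-- even if the JAR file is missing on the connected JDBCServer node.
SHOW FUNCTIONS LIKE 'two_udf_example';   -- hypothetical function name

-- Load the JAR into the jarClassLoader of the connected JDBCServer.
ADD JAR /opt/test/two_udfs.jar;

-- With the JAR loaded, the function should now be usable
-- (hypothetical table and column).
SELECT two_udf_example(name) FROM example_table;
```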
