
Why Are Some Functions Not Available When ThriftJDBCServers Are Connected?

Question

Scenario 1:

I created permanent functions using the add jar statement. When another ThriftJDBCServer is connected, or after the ThriftJDBCServer is restarted, the add jar statement needs to be run again.

Figure 1 Error information in Scenario 1

Scenario 2:

Functions can be queried using the show functions command, but they cannot be used. This is because the connected JDBCServer node does not contain the JAR packages in the corresponding path. After the corresponding JAR packages are added, the functions can be used properly.

Figure 2 Error information in Scenario 2

Answer

Scenario 1:

The add jar statement loads the JAR file only into the jarClassLoader of the currently connected JDBCServer. Different JDBCServers do not share a jarClassLoader, and a new jarClassLoader is created when a JDBCServer restarts. Therefore, the add jar statement needs to be run again for each new connection or after a restart.

You can add a JAR file in either of the following ways:

- Add the JAR file when starting spark-sql, for example: spark-sql --jars /opt/test/two_udfs.jar
- Add the JAR file after spark-sql is started, for example: add jar /opt/test/two_udfs.jar

The path specified by add jar can be a local path or an HDFS path, as shown in the example below.
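As a sketch, the following statements register a permanent UDF after its JAR file has been added. The function name test_udf, the class name com.test.TwoUDFs, and the HDFS path are illustrative assumptions, not values from this scenario:

add jar hdfs://hacluster/tmp/two_udfs.jar;
CREATE FUNCTION test_udf AS 'com.test.TwoUDFs';

Placing the JAR file in HDFS rather than on a local disk means that every JDBCServer node can read it from the same path, so the same add jar statement works regardless of which JDBCServer the client connects to.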

Scenario 2:

The show functions command obtains all functions in the current database from the external catalog. When a function is used in an SQL statement, the JDBCServer loads the JAR package corresponding to the function.

If the JAR file does not exist, the function cannot be used. In this case, run the add jar command again.
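A minimal recovery sequence in a spark-sql or Beeline session might look as follows. The function name test_udf, the table t1, and the JAR path are illustrative assumptions:

show functions like 'test_udf';             -- the function is listed because its metadata is in the catalog
select test_udf(name) from t1;              -- fails if the JAR file is missing on the connected JDBCServer
add jar hdfs://hacluster/tmp/two_udfs.jar;  -- load the JAR file into the current connection
select test_udf(name) from t1;              -- succeeds now that the class can be loaded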