Updated on 2022-12-09 GMT+08:00

Why Can't Functions Be Used When Different JDBCServers Are Connected?

Question

Scenario 1:

I ran the add jar command to create permanent functions. However, when Beeline connects to a different JDBCServer, or after the JDBCServer restarts, I have to run the add jar command again.

Figure 1 Error information in Scenario 1

Scenario 2:

The functions can be listed by running the show functions command, but they cannot be used. This is because the JAR files in the corresponding path do not exist on the connected JDBCServer node. After the JAR files are added, the query succeeds.

Figure 2 Error information in Scenario 2

Answer

Scenario 1:

The add jar statement loads the JAR file only into the jarClassLoader of the currently connected JDBCServer. Different JDBCServer instances do not share a jarClassLoader, and a new jarClassLoader is created after a JDBCServer restarts. Therefore, the add jar statement must be run again on each new connection.

You can add a JAR file in either of the following ways:

- Add the JAR file when starting Spark SQL, for example, by running spark-sql --jars /opt/test/two_udfs.jar.
- Add the JAR file after Spark SQL is started, for example, by running add jar /opt/test/two_udfs.jar.

The path specified by add jar can be a local path or an HDFS path.
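As a sketch of the second way, a session after connecting to a JDBCServer through Beeline might look like the following. The function name to_upper, the class com.example.udf.ToUpper, and the table employees are illustrative placeholders, not names from the product; only the JAR path comes from the example above.

```sql
-- Load the UDF JAR into the jarClassLoader of the currently
-- connected JDBCServer only; a connection to a different
-- JDBCServer must run this statement again.
ADD JAR /opt/test/two_udfs.jar;

-- Register a permanent function backed by a class in that JAR
-- (function and class names are hypothetical examples).
CREATE FUNCTION to_upper AS 'com.example.udf.ToUpper';

-- The function can now be used on this connection.
SELECT to_upper(name) FROM employees;
```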

Scenario 2:

The show functions command obtains all functions in the current database from the external catalog; it does not check whether the backing JAR files are present. Only when a function is actually used in an SQL statement does JDBCServer load the JAR file corresponding to that function.

If the JAR file does not exist on the connected JDBCServer node, the function cannot be used. In this case, run the add jar command again on the current connection.
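The behavior in Scenario 2 can be sketched as the following session. The function name to_upper and the table employees are hypothetical; the JAR path follows the example used earlier in this answer.

```sql
-- The function is listed because show functions reads the external
-- catalog, even though its JAR is missing on this JDBCServer node.
SHOW FUNCTIONS LIKE 'to_upper';

-- Calling the function at this point fails, because JDBCServer
-- cannot load the missing JAR:
-- SELECT to_upper(name) FROM employees;   -- fails

-- Re-add the JAR on the current connection, then retry the query.
ADD JAR /opt/test/two_udfs.jar;
SELECT to_upper(name) FROM employees;
```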