Why Are Some Functions Not Available when Another JDBCServer Is Connected?

Question

Scenario 1

I created permanent functions using the add jar statement. After Beeline connects to a different JDBCServer, or after the JDBCServer is restarted, I have to run the add jar statement again.

Figure 1 Error information in scenario 1

Scenario 2

The show functions statement lists the functions, but the functions cannot be used. The reason is that the connected JDBCServer node does not contain the .jar packages in the corresponding path. After I add the corresponding .jar packages, the functions listed by show functions can be used.

Figure 2 Error information in scenario 2

Answer

Scenario 1

The add jar statement loads jars into the jarClassLoader of the JDBCServer that is currently connected. Jars loaded in this way are not shared among different JDBCServer instances. After a JDBCServer restarts, a new jarClassLoader is created, so the add jar statement must be run again.

There are two ways to add a jar package. You can run spark-sql --jars /opt/test/two_udfs.jar to add the jar package when the Spark SQL process starts, or run the add jar /opt/test/two_udfs.jar statement to add it after the Spark SQL process has started. The path in the add jar statement can be a local path or an HDFS path.
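Because the add jar statement has to be repeated for every new connection, one possible convenience is to keep it in an initialization file. The following is a minimal sketch, not a prescribed procedure: it assumes Beeline's -i option for running an init script, the JDBC URL is a placeholder, and the variable names are illustrative only. The command is printed rather than executed so that the sketch can be tried without a cluster.

```shell
#!/bin/sh
# Jar path taken from the example above.
JAR=/opt/test/two_udfs.jar
# Placeholder JDBC URL (assumption); replace HOST:PORT with your JDBCServer address.
JDBC_URL="jdbc:hive2://HOST:PORT/default"

# Write the add jar statement to an init file so it does not have to be retyped.
INIT_FILE=$(mktemp)
printf 'add jar %s;\n' "$JAR" > "$INIT_FILE"

# Build and print the Beeline command; run it manually against a real JDBCServer.
BEELINE_CMD="beeline -u \"$JDBC_URL\" -i $INIT_FILE"
echo "$BEELINE_CMD"
```

Each Beeline session started with this command runs the add jar statement first, so the jar is loaded into whichever JDBCServer the session reaches.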

Scenario 2

The show functions statement obtains all functions in the current database from the external catalog. When a function is used in an SQL statement, thriftJDBC-server loads the .jar files related to that function.

If the .jar files do not exist in the corresponding path, they cannot be loaded and the function cannot be used. In this case, add the corresponding .jar files.
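Before using a function, it may help to confirm that its jar is actually present in the expected path. The following is a rough sketch under stated assumptions: jar_present is a hypothetical helper name, the default path is the one used in the example for scenario 1, and the HDFS branch assumes the hdfs client is available on the node where the check is run.

```shell
#!/bin/sh
# Returns 0 if the jar exists at the given path, non-zero otherwise.
jar_present() {
  case "$1" in
    hdfs://*) hdfs dfs -test -e "$1" ;;  # HDFS path: needs the hdfs client
    *)        [ -f "$1" ] ;;             # local path on the current node
  esac
}

# Use the first argument if given, else the example path (assumption).
JAR=${1:-/opt/test/two_udfs.jar}
if jar_present "$JAR"; then
  echo "found: $JAR"
else
  echo "missing: $JAR (add the jar before using the function)"
fi
```

For a local path, run the check on the node that hosts the connected JDBCServer, because that is where the file must exist.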
