Accessing Spark SQL Sample Projects Through JDBC (Scala)
Function
The JDBC API of the user-defined client is used to submit a data analysis task and return the results.
Example Codes
- Define an SQL statement. SQL must be a single statement and cannot contain ";". The following is an example.
val sqlList = new ArrayBuffer[String] sqlList += "CREATE TABLE CHILD (NAME STRING, AGE INT) " + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE" sqlList += "LOAD DATA INPATH 'hdfs://hacluster/home/data' INTO TABLE CHILD" sqlList += "SELECT * FROM child" sqlList += "DROP TABLE child"
- Assemble the JDBC URL.
val config: Configuration = new Configuration() config.addResource(new Path(args(0))) val zkUrl = config.get("spark.deploy.zookeeper.url") var zkNamespace: String = null zkNamespace = fileInfo.getProperty("spark.thriftserver.zookeeper.namespace") //Remove redundant characters from configuration items if (zkNamespace != null) zkNamespace = zkNamespace.substring(1) val sb = new StringBuilder("jdbc:hive2://" + zkUrl + "/;serviceDiscoveryMode=zooKeeper;" + "zooKeeperNamespace=" + zkNamespace + ";"); val url = sb.toString()
- Load the Hive JDBC driver. Obtain the JDBC connection, execute the HQL, export the obtained column name and results to the console, and close the JDBC connection.
The zk.quorum in the connection string can be replaced by spark.deploy.zookeeper.url in the configuration file.
In network congestion, configure the timeout of the connection between the client and JDBCServer to avoid the suspending of the client due to timeless wait of the return from the server. The method is as follows:
Before using the DriverManager.getConnection method to obtain the JDBC connection, add the DriverManager.setLoginTimeout(n) method to configure a timeout interval. n indicates the timeout interval for waiting for the return result from the server. The unit is second, the type is Int, and the default value is 0 (indicating never timing out).
def executeSql(url: String, sqls: Array[String]): Unit = { //Load the Hive JDBC driver. Class.forName("org.apache.hive.jdbc.HiveDriver").newInstance() var connection: Connection = null var statement: PreparedStatement = null try { connection = DriverManager.getConnection(url) for (sql <- sqls) { println(s"---- Begin executing sql: $sql ----") statement = connection.prepareStatement(sql) val result = statement.executeQuery() val resultMetaData = result.getMetaData val colNum = resultMetaData.getColumnCount for (i <- 1 to colNum) { print(resultMetaData.getColumnLabel(i) + "\t") } println() while (result.next()) { for (i <- 1 to colNum) { print(result.getString(i) + "\t") } println() } println(s"---- Done executing sql: $sql ----") } } finally { if (null != statement) { statement.close() } if (null != connection) { connection.close() } } }
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot