
How Do I Read Uploaded Files for a Spark Jar Job?

You can use SparkFiles to read a file submitted with --file from a local path: SparkFiles.get("Name of the uploaded file").

  • The file path obtained in the Driver differs from the path obtained in the Executor, and the Driver's path cannot be passed to the Executor.
  • You still need to call SparkFiles.get("filename") in the Executor to obtain the file path.
  • The SparkFiles.get() method can be called only after Spark is initialized, as the sketch after this list shows.
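
For local testing outside DLI, the same lookup works for files registered through SparkContext.addFile(), which mirrors what --file does on DLI. A minimal sketch, assuming a hypothetical local file /tmp/test and a local[*] master used only for illustration:

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object SparkFilesLocalTest {
  def main(args: Array[String]): Unit = {
    // SparkFiles.get() can be called only after Spark is initialized.
    val spark = SparkSession.builder
      .appName("SparkFilesLocalTest")
      .master("local[*]")    // local run for testing; on DLI the cluster supplies the master
      .getOrCreate()

    // addFile() distributes the file, after which SparkFiles.get() resolves
    // the local copy by file name, just as with a file submitted via --file.
    spark.sparkContext.addFile("/tmp/test")    // hypothetical path
    println(SparkFiles.get("test"))            // prints the Driver-local path

    spark.stop()
  }
}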
Figure 1 Adding other dependencies

The Scala code is as follows:

package main.java
 
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession
 
import scala.io.Source
 
object DliTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("SparkTest")
      .getOrCreate()
 
    // Driver: obtains the local path of the uploaded file.
    println(SparkFiles.get("test"))
 
    spark.sparkContext.parallelize(Array(1, 2, 3, 4))
      .map { _ =>
        // Executor: obtains the local path of the uploaded file.
        println(SparkFiles.get("test"))
        // Reads and prints the file content, closing the source afterwards.
        val source = Source.fromFile(SparkFiles.get("test"))
        try println(source.mkString) finally source.close()
      }
      .collect() // Triggers the job so the Executor-side code runs.
  }
}
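
If the file content is needed for every record, opening it once per partition avoids repeated file I/O on each Executor. A minimal variant sketch, reusing the SparkSession and imports from the example above and assuming the same uploaded file named test; it would replace the parallelize block in main:

    spark.sparkContext.parallelize(Array(1, 2, 3, 4))
      .mapPartitions { iter =>
        // Executor: resolve and read the file once for the whole partition.
        val source = Source.fromFile(SparkFiles.get("test"))
        val content = try source.mkString finally source.close()
        iter.map(i => s"record $i read ${content.length} characters")
      }
      .collect()
      .foreach(println)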