Updated on 2024-11-15 GMT+08:00
How Do I Read Uploaded Files for a Spark Jar Job?
You can use SparkFiles to read a file submitted with --files from a local path: SparkFiles.get("Name of the uploaded file").

- The file path obtained in the Driver differs from the one obtained by the Executor, and the Driver's path cannot be passed to the Executor.
- You still need to call SparkFiles.get("filename") in the Executor to obtain the file path (a runnable local sketch follows this list).
- The SparkFiles.get() method can be called only after Spark is initialized.
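
To try this behavior outside DLI, you can register a file yourself with SparkContext.addFile(), which distributes it to Executors the same way the --files option does. The sketch below is illustrative only; the file name sample.txt and the local[2] master URL are assumptions for local experimentation:

import java.io.{File, PrintWriter}

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object SparkFilesDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("SparkFilesDemo")
      .master("local[2]") // assumption: local run; DLI supplies the master itself
      .getOrCreate()

    // Create a throwaway file that stands in for an uploaded dependency.
    val writer = new PrintWriter(new File("sample.txt"))
    writer.write("hello from sample.txt")
    writer.close()

    // Register the file for distribution; same effect as submitting it with --files.
    spark.sparkContext.addFile("sample.txt")

    // Driver side: SparkFiles.get() resolves a path under the Driver's temporary directory.
    println("Driver path: " + SparkFiles.get("sample.txt"))

    // Executor side: each task resolves its own local copy, whose path can
    // differ from the Driver's, so call SparkFiles.get() inside the task.
    spark.sparkContext.parallelize(Seq(1, 2))
      .map(_ => "Executor path: " + SparkFiles.get("sample.txt"))
      .collect()
      .foreach(println)

    spark.stop()
  }
}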
 
Figure 1 Adding other dependencies
The sample Scala code is as follows:
package main.java

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

import scala.io.Source

object DliTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("SparkTest")
      .getOrCreate()

    // Driver: obtains the local path of the uploaded file.
    println(SparkFiles.get("test"))

    spark.sparkContext.parallelize(Array(1, 2, 3, 4))
      // Executor: obtains the uploaded file's path on the Executor side.
      .map(_ => println(SparkFiles.get("test")))
      // Executor: reads the file content from the Executor-side path.
      .map(_ => println(Source.fromFile(SparkFiles.get("test")).mkString))
      .collect()

    spark.stop()
  }
}
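When the job runs, the Driver prints the path it resolved, and collect() triggers the tasks that print the Executor-side path and the file content. Note that the sample leaves the Source unclosed for brevity; in production code, close it after reading (for example, in a try/finally block) to avoid leaking file handles.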
   Parent topic: Spark Job Development
  