Updated on 2024-11-29 GMT+08:00

Why Does Spark Fail to Export a Table with Duplicate Field Names?

Question

The following code fails when it is run in spark-shell:

val acctId = List(("49562", "Amal", "Derry"), ("00000", "Fred", "Xanadu"))
val rddLeft = sc.makeRDD(acctId)
val dfLeft = rddLeft.toDF("Id", "Name", "City")
//dfLeft.show
val acctCustId = List(("Amal", "49562", "CO"), ("Dave", "99999", "ZZ"))
val rddRight = sc.makeRDD(acctCustId)
val dfRight = rddRight.toDF("Name", "CustId", "State")
//dfRight.show
val dfJoin = dfLeft.join(dfRight, dfLeft("Id") === dfRight("CustId"), "outer")
dfJoin.show
dfJoin.repartition(1).write.
  format("com.databricks.spark.csv").
  option("delimiter", "\t").
  option("header", "true").
  option("treatEmptyValuesAsNulls", "true").
  option("nullValue", "").
  save("/tmp/outputDir")

Answer

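The joined DataFrame dfJoin contains two columns named Name, one from dfLeft and one from dfRight. Spark can build this DataFrame and display it with show, but it cannot write it out, because the field names in an exported table must be unique. Check whether the join produces duplicate field names. If it does, rename or drop the duplicated field before writing so that every field name in the output is unique.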
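
One possible fix is sketched below, assuming the code runs in spark-shell, where `spark` is predefined. It renames the Name column of the right-hand table before the join. The new column name `CustName` and the variable `dfRightRenamed` are illustrative choices, not names from the original code.

```scala
// Assumes spark-shell, where the SparkSession `spark` already exists.
import spark.implicits._

val dfLeft = Seq(("49562", "Amal", "Derry"), ("00000", "Fred", "Xanadu")).
  toDF("Id", "Name", "City")
val dfRight = Seq(("Amal", "49562", "CO"), ("Dave", "99999", "ZZ")).
  toDF("Name", "CustId", "State")

// Rename the right-hand "Name" column ("CustName" is a hypothetical name)
// so that the join result has no duplicate field names.
val dfRightRenamed = dfRight.withColumnRenamed("Name", "CustName")

val dfJoin = dfLeft.join(
  dfRightRenamed,
  dfLeft("Id") === dfRightRenamed("CustId"),
  "outer")

// Sanity check before exporting: every column name must be unique.
val cols = dfJoin.columns
require(cols.distinct.length == cols.length,
  s"Duplicate field names: ${cols.diff(cols.distinct).mkString(", ")}")

// Same export as in the question; the target directory must not already exist.
dfJoin.repartition(1).write.
  format("com.databricks.spark.csv").
  option("delimiter", "\t").
  option("header", "true").
  option("treatEmptyValuesAsNulls", "true").
  option("nullValue", "").
  save("/tmp/outputDir")
```

If the second Name column is not needed in the output, dropping it from the right-hand table before the join (for example with `dfRight.drop("Name")`) would also leave the field names unique.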