Updated on 2025-08-21 GMT+08:00

Why Does Spark Fail to Export a Table with Duplicate Field Names?

Question

The following code fails when executed in the Spark shell:

val acctId = List(("49562", "Amal", "Derry"), ("00000", "Fred", "Xanadu"))
val rddLeft = sc.makeRDD(acctId)
val dfLeft = rddLeft.toDF("Id", "Name", "City")
//dfLeft.show
val acctCustId = List(("Amal", "49562", "CO"), ("Dave", "99999", "ZZ"))
val rddRight = sc.makeRDD(acctCustId)
val dfRight = rddRight.toDF("Name", "CustId", "State")
//dfRight.show
val dfJoin = dfLeft.join(dfRight, dfLeft("Id") === dfRight("CustId"), "outer")
dfJoin.show
dfJoin.repartition(1).write.format("com.databricks.spark.csv").option("delimiter", "\t").option("header", "true").option("treatEmptyValuesAsNulls", "true").option("nullValue", "").save("/tmp/outputDir")

Answer

Exporting a table that contains duplicate field names from Spark fails.

In this example, both dfLeft and dfRight contain a column named Name, so the joined DataFrame dfJoin has two Name columns. Spark cannot write a CSV file whose header contains duplicate column names, so the save operation fails. Modify the code so that no duplicate field names exist in the data to be saved, for example by renaming or dropping one of the duplicate columns before writing the output.
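One way to remove the duplicate is sketched below: rename the right-hand Name column before the join so that every column in the saved result is unique. (The column name CustName is an illustrative choice, not part of the original example; an alternative is to drop one copy of the column after the join.)

```scala
// Rename the duplicate "Name" column on the right side before joining.
// "CustName" is a hypothetical replacement name chosen for illustration.
val dfRightRenamed = dfRight.withColumnRenamed("Name", "CustName")

// Join as before; the result now has unique column names:
// Id, Name, City, CustName, CustId, State
val dfJoin = dfLeft.join(dfRightRenamed, dfLeft("Id") === dfRightRenamed("CustId"), "outer")

// Alternatively, keep the original join and drop one duplicate column:
// val dfJoin = dfLeft.join(dfRight, dfLeft("Id") === dfRight("CustId"), "outer").drop(dfRight("Name"))

// With unique headers, the CSV export succeeds.
dfJoin.repartition(1).write.format("com.databricks.spark.csv").option("delimiter", "\t").option("header", "true").option("treatEmptyValuesAsNulls", "true").option("nullValue", "").save("/tmp/outputDir")
```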