更新时间:2024-11-29 GMT+08:00
Spark导出带有相同字段名的表,结果导出失败
问题
在Spark的spark-shell上执行如下代码失败:
val acctId = List(("49562", "Amal", "Derry"), ("00000", "Fred", "Xanadu")) val rddLeft = sc.makeRDD(acctId) val dfLeft = rddLeft.toDF("Id", "Name", "City") //dfLeft.show val acctCustId = List(("Amal", "49562", "CO"), ("Dave", "99999", "ZZ")) val rddRight = sc.makeRDD(acctCustId) val dfRight = rddRight.toDF("Name", "CustId", "State") //dfRight.show val dfJoin = dfLeft.join(dfRight, dfLeft("Id") === dfRight("CustId"), "outer") dfJoin.show dfJoin.repartition(1).write.format("com.databricks.spark.csv").option("delimiter", "\t").option("header", "true").option("treatEmptyValuesAsNulls", "true").option("nullValue", "").save("/tmp/outputDir")
回答
Spark中对join语句重名字段做了判断,需要修改代码保证保存的数据中无重复字段。
父主题: Spark常见问题