"ArrayIndexOutOfBoundsException: 0" Occurs When HDFS Invokes getsplit of FileInputFormat
Question
When HDFS invokes the getSplit method of FileInputFormat, "ArrayIndexOutOfBoundsException: 0" is displayed. The log is as follows:
java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:708) at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:675) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:210) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
Answer
The rack information of each block is in the format of /default/rack0/:,/default/rack0/datanodeip:port.
Blocks are damaged or lost. As a result, the IP address and port number of the host corresponding to the blocks are empty. To handle this problem, use hdfs fsck to check the health status of the file blocks, delete the damaged or lost blocks, and run task again.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.