Updated on 2024-12-11 GMT+08:00

"ArrayIndexOutOfBoundsException: 0" Occurs When HDFS Invokes getSplits of FileInputFormat

Question

When HDFS invokes the getSplits method of FileInputFormat, the error "ArrayIndexOutOfBoundsException: 0" is reported. The log is as follows:

java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:708)
at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:675)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:210)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)

Answer

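The rack information of each block replica is normally recorded in the format /default/rack0/datanodeip:port. In this case, the information returned for the block is /default/rack0/:,/default/rack0/datanodeip:port, where the first entry (/default/rack0/:) contains no IP address or port number.

This happens when blocks are damaged or lost: the IP address and port number of the host corresponding to those blocks are empty, so the host name cannot be extracted and the exception is thrown. To resolve the problem, run hdfs fsck to check the health status of the file blocks (for example, hdfs fsck /path -files -blocks -locations), delete the files that contain the damaged or lost blocks, and run the task again.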
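
The following is a minimal sketch of why an empty host entry can produce this exception. It assumes that the host name is obtained by splitting the "ip:port" string on ":" and taking the first element, which is what the identifyHosts frame in the stack trace suggests. The class SplitHostDemo, the method stripPort, and the sample address 192.168.0.10:9866 are hypothetical and used only for illustration; this is not the Hadoop source code.

```java
public class SplitHostDemo {

    // Hypothetical helper: assumes the port is stripped from "ip:port"
    // by splitting on ":" and taking the first element.
    public static String stripPort(String nodeName) {
        return nodeName.split(":")[0];
    }

    public static void main(String[] args) {
        // Healthy replica: the entry contains an IP address and a port.
        System.out.println(stripPort("192.168.0.10:9866"));

        // Damaged or lost block: both the IP address and the port are empty,
        // so the entry is just ":". String.split drops trailing empty
        // strings, which leaves an array with no elements.
        System.out.println(":".split(":").length);

        try {
            stripPort(":");
        } catch (ArrayIndexOutOfBoundsException e) {
            // Reading index 0 of the empty array fails here.
            System.out.println("Empty host entry cannot be parsed");
        }
    }
}
```

Under this assumption, the exception is a symptom of the empty location data rather than a problem in the job itself, which is why the fix is to repair or remove the affected files in HDFS.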