Updated on 2025-08-19 GMT+08:00

"ArrayIndexOutOfBoundsException: 0" Occurs When HDFS Invokes getSplits of FileInputFormat

Issue

When the getSplits method of FileInputFormat is invoked on files in HDFS (in the trace below, by Spark's HadoopRDD), "java.lang.ArrayIndexOutOfBoundsException: 0" is reported. The stack trace is as follows:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:708)
    at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:675)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:210)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)

Procedure

The rack information of each block is normally in the format /default/rack0/datanodeip:port. For the affected blocks, it is /default/rack0/: instead, with no DataNode IP address or port number.

The blocks are damaged or lost, so the IP address and port number of the host that stores them are empty. To resolve this issue, run hdfs fsck to check the health status of the file blocks, delete the damaged or lost blocks, and run the task again.
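
The following is a minimal sketch of why an empty IP address and port can surface as index 0 specifically. It assumes that the host name is obtained by splitting the node name on ":" and taking the first element, which is what the identifyHosts frame in the stack trace suggests but is not confirmed here. The class EmptyNodeNameDemo, its methods, and the sample address 192.168.0.10:9866 are hypothetical and are not part of Hadoop.

```java
// Hypothetical demo class; it only illustrates Java's String.split behavior
// and does not call any Hadoop API.
class EmptyNodeNameDemo {

    // Returns the last component of a topology path, e.g. "datanodeip:port".
    static String nodeName(String topologyPath) {
        return topologyPath.substring(topologyPath.lastIndexOf('/') + 1);
    }

    // Assumed "strip the port" step: split on ':' and take element 0.
    // For the node name ":" the split result is an empty array, so [0] throws.
    static String hostOf(String topologyPath) {
        return nodeName(topologyPath).split(":")[0];
    }

    // Defensive check: true when the host part of the node name is missing.
    static boolean hasEmptyHost(String topologyPath) {
        String[] parts = nodeName(topologyPath).split(":");
        return parts.length == 0 || parts[0].isEmpty();
    }

    public static void main(String[] args) {
        String healthy = "/default/rack0/192.168.0.10:9866"; // sample values, assumed
        String damaged = "/default/rack0/:";                 // format from this article

        System.out.println(hostOf(healthy));                      // prints 192.168.0.10
        System.out.println(nodeName(damaged).split(":").length);  // prints 0

        try {
            hostOf(damaged);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception type as in the reported stack trace.
            System.out.println("caught ArrayIndexOutOfBoundsException");
        }
    }
}
```

Java's String.split drops trailing empty strings, so splitting ":" yields an array of length 0 and reading element 0 fails. A check such as hasEmptyHost only detects the symptom; the damaged or lost blocks still have to be handled with hdfs fsck as described above.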