Task Execution Fails Because the Input File Number Exceeds the Threshold

Symptom

When Hive runs a query, the error message "Job Submission failed with exception 'java.lang.RuntimeException(input file numbers exceeded the limits in the conf;input file num is: 2380435 , max heap memory is: 16892035072 , the limit conf is: 500000/4)'" is displayed. The values in the error message vary with the actual environment. The error details are as follows:

ERROR : Job Submission failed with exception 'java.lang.RuntimeException(input file numbers exceeded the limits in the conf;
 input file num is: 2380435 ,
 max heap memory is: 16892035072 ,
 the limit conf is: 500000/4)'
java.lang.RuntimeException: input file numbers exceeded the limits in the conf;
 input file num is: 2380435 ,
 max heap memory is: 16892035072 ,
 the limit conf is: 500000/4
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.checkFileNum(ExecDriver.java:545)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:430)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1965)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1723)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1475)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1283)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1278)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:167)
	at org.apache.hive.service.cli.operation.SQLOperation.access$200(SQLOperation.java:75)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:245)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:258)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)

Cause Analysis

MRS limits the number of input files allowed in a MapReduce job submission according to the ratio of the maximum number of files to the maximum heap memory of HiveServer. This ratio is set by the hive.mapreduce.input.files2memory parameter. The default value 500000/4 means that a maximum of 500,000 input files is allowed for every 4 GB of heap memory. The job submission fails if the number of input files exceeds this limit.
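
As a rough illustration of this check, the following Python sketch applies the default ratio to the values in the example error message. It assumes that the limit scales linearly with the heap size and that 1 GB means 1024^3 bytes. The actual logic in ExecDriver.checkFileNum may differ, and the function name max_input_files is hypothetical.

```python
# Sketch only: assumes the limit scales linearly with the heap size and that
# the "4" in "500000/4" means 4 GB of 1024**3 bytes each. The real check in
# ExecDriver.checkFileNum may be implemented differently.

GIB = 1024 ** 3


def max_input_files(heap_bytes: int, limit_conf: str) -> int:
    """Estimate the number of input files allowed for a given heap size.

    limit_conf uses the "files/memory_gb" form shown in the error message,
    for example "500000/4".
    """
    files, memory_gb = (int(part) for part in limit_conf.split("/"))
    return heap_bytes * files // (memory_gb * GIB)


# Values taken from the example error message.
heap_bytes = 16892035072
input_files = 2380435

allowed = max_input_files(heap_bytes, "500000/4")
print(f"estimated limit: {allowed}, input files: {input_files}")
print("exceeds limit" if input_files > allowed else "within limit")
```

Under these assumptions, a heap of about 15.7 GB admits roughly 1.97 million input files, which is fewer than the 2,380,435 files in the failed query.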

Solution

Search for the hive.mapreduce.input.files2memory parameter and set it to a value that suits the actual HiveServer heap memory and the number of input files in the task.
Save the configuration and restart the affected services or instances.
If the fault persists, adjust the GC parameter of the HiveServer based on service requirements.
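
To help choose a value in the first step, the following Python sketch estimates the smallest ratio that would admit a given number of input files. It assumes that the limit scales linearly with the heap size, that 1 GB means 1024^3 bytes, and that the memory part of the ratio stays at 4. The function name min_files_per_memory is hypothetical, and the result is only a lower bound, not a recommended setting.

```python
# Sketch only: assumes the limit is heap_bytes * files // (memory_gb * GIB),
# which is an assumption about how the ratio is applied, not confirmed
# HiveServer behavior.

GIB = 1024 ** 3


def min_files_per_memory(input_files: int, heap_bytes: int, memory_gb: int = 4) -> int:
    """Return the smallest N such that "N/memory_gb" admits input_files."""
    unit = memory_gb * GIB
    # Ceiling division of (input_files * unit) by heap_bytes.
    return -((-input_files * unit) // heap_bytes)


# Values taken from the example error message.
heap_bytes = 16892035072
input_files = 2380435

n = min_files_per_memory(input_files, heap_bytes)
print(f"smallest ratio that admits {input_files} files: {n}/4")
```

For the example values this gives a ratio slightly above 600000/4. In practice, leave some headroom above the computed value and confirm that the HiveServer heap can handle the larger number of input files.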
