
Task Execution Fails Because the Input File Number Exceeds the Threshold

Symptom

When Hive performs a query operation, the error message "Job Submission failed with exception 'java.lang.RuntimeException(input file number exceeded the limits in the conf;input file num is: 2380435,max heap memory is: 16892035072,the limit conf is: 500000/4)'" is displayed. The specific values in the message vary with the environment. The error details are as follows:

ERROR : Job Submission failed with exception 'java.lang.RuntimeException(input file numbers exceeded the limits in the conf;
 input file num is: 2380435 ,
 max heap memory is: 16892035072 ,
 the limit conf is: 500000/4)'
java.lang.RuntimeException: input file numbers exceeded the limits in the conf;
 input file num is: 2380435 ,
 max heap memory is: 16892035072 ,
 the limit conf is: 500000/4
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.checkFileNum(ExecDriver.java:545)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:430)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1965)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1723)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1475)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1283)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1278)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:167)
	at org.apache.hive.service.cli.operation.SQLOperation.access$200(SQLOperation.java:75)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:245)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:258)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)

Cause Analysis

MRS limits the number of input files that a MapReduce job submission may read based on the ratio of the configured maximum file count to the maximum HiveServer heap memory. The default value 500000/4 means that each 4 GB of heap memory allows at most 500,000 input files. If the number of input files exceeds this limit, the job submission fails with the error above.
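
The figures in the error message make the failing comparison easy to verify. The following minimal Java sketch reproduces that arithmetic; the class and variable names are illustrative assumptions, and the actual check is performed in org.apache.hadoop.hive.ql.exec.mr.ExecDriver.checkFileNum (visible in the stack trace above), whose exact implementation may differ.

public class InputFileLimitCheck {
    public static void main(String[] args) {
        long inputFileNum = 2_380_435L;        // "input file num" from the error
        long maxHeapBytes = 16_892_035_072L;   // "max heap memory" from the error (about 15.7 GB)
        long confLimit    = 500_000L;          // hive.mapreduce.input.files2memory
        long unitBytes    = 4L * 1024 * 1024 * 1024; // the "/4": allowance is per 4 GB of heap

        // Allowed files = configured count per 4 GB unit, scaled by the actual heap size.
        long allowed = (long) ((double) confLimit * maxHeapBytes / unitBytes);
        System.out.println("allowed ~ " + allowed + ", actual = " + inputFileNum);
        // Prints: allowed ~ 1966491, actual = 2380435 -> the submission is rejected.
    }
}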

Solution

  1. Go to the Hive configuration page.

    • For versions earlier than MRS 2.0.1: Log in to MRS Manager, choose Services > Hive > Service Configuration, and select All from the Basic drop-down list.
    • For MRS 2.0.1 or later (earlier than MRS 3.x): Click the cluster name on the MRS console, choose Components > Hive > Service Configuration, and select All from the Basic drop-down list.

      If the Components tab is unavailable, complete IAM user synchronization first. (On the Dashboard page, click Synchronize on the right side of IAM User Sync to synchronize IAM users.)

    • For MRS 3.x or later: Log in to FusionInsight Manager, choose Cluster, click the name of the target cluster, and choose Services > Hive > Configurations > All Configurations.

  2. Search for hive.mapreduce.input.files2memory and set it to a proper value based on the available HiveServer heap memory and the number of input files the job must read (see the sizing sketch after this list).
  3. Save the configuration and restart the affected services or instances.
  4. If the fault persists, adjust the GC parameters of the HiveServer based on service requirements.
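
For reference, here is a back-of-the-envelope sizing sketch for step 2. It assumes, consistent with the cause analysis above, that the allowed file count scales linearly with the HiveServer heap; the class and variable names are illustrative and not part of Hive.

public class Files2MemorySizing {
    public static void main(String[] args) {
        long requiredFiles = 2_380_435L;        // input files the failing job must read
        long maxHeapBytes  = 16_892_035_072L;   // current HiveServer maximum heap
        // Heap expressed in 4 GB units, matching the "/4" in the limit conf.
        double heapUnits = (double) maxHeapBytes / (4L * 1024 * 1024 * 1024);
        long neededConf  = (long) Math.ceil(requiredFiles / heapUnits);
        System.out.println("needed hive.mapreduce.input.files2memory ~ " + neededConf);
        // Prints ~605250; a value such as 650000 leaves some headroom.
    }
}

Because the limit also scales with the maximum heap, increasing the HiveServer heap size as part of step 4 raises the allowed file count proportionally: with the default 500000/4 setting, for example, a 24 GB heap would allow 3,000,000 input files.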