Configuring the Maximum Number of Maps for Hive Tasks

Scenarios

In Hive tasks, the number of Map tasks directly affects query performance. Typically, Hive automatically calculates the number of Map tasks based on the size of input data and the HDFS block size. However, in some scenarios, manual adjustment is required:

Data skew: Large files can cause a single Map task to take too long to process.
Resource optimization: When cluster resources are limited, the number of concurrent Map tasks needs to be restricted.
Task scheduling: The granularity of Map tasks needs to be controlled so that an excessive number of small tasks does not consume resources.

This section describes how to set the maximum number of Map tasks for Hive tasks on the server side by adding a custom parameter, which helps prevent performance issues caused by HiveServer overload.

Procedure

Log in to FusionInsight Manager. Click Cluster, choose Services > Hive, click Configurations, and then click All Configurations.
Choose MetaStore(Role) > Customization and add a custom parameter to the hivemetastore-site.xml parameter file. Set Name to hive.mapreduce.per.task.max.splits and set its value to a large number.
Click Save. Click Instances, select all Hive instances, choose More > Restart Instance, enter the user password, and click OK to restart all Hive instances.
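
For reference, the custom parameter added in the procedure above would typically appear in hivemetastore-site.xml as a property entry like the one sketched below. This assumes the file uses the standard Hadoop-style configuration layout. The value 500000 is a placeholder chosen only for illustration and is not a recommended setting; FusionInsight Manager writes the actual entry when the configuration is saved, so the file does not need to be edited by hand.

```xml
<!-- Sketch of the resulting entry, assuming the standard Hadoop-style configuration layout. -->
<configuration>
  <property>
    <name>hive.mapreduce.per.task.max.splits</name>
    <!-- Placeholder value for illustration only; choose a value that suits your cluster. -->
    <value>500000</value>
  </property>
</configuration>
```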