Switching the Hive Execution Engine
Scenarios
The Hive execution engine processes HQL queries by translating them into tasks for the underlying computing framework and orchestrating task execution to ensure efficient data processing.
Hive supports multiple execution engines such as MapReduce, Tez, and Spark to process HQL queries. The default engine is Tez. (For versions earlier than MRS 3.1.2-LTS, MapReduce is the default engine.) You can choose an execution engine based on performance needs, resource availability, and service requirements. Table 1 describes the engines.
| Execution Engine | Description | Applicable Scenario |
| --- | --- | --- |
| MapReduce | The default execution engine in Hive versions prior to MRS 3.1.2-LTS. It is built on the Hadoop MapReduce framework, offering strong stability and compatibility. However, it translates queries into multiple Map and Reduce phases, requiring intermediate data to be written to the disk repeatedly. This leads to low resource utilization. | MapReduce is best suited for simple batch processing tasks and scenarios that are not sensitive to latency. |
| Tez | The default execution engine in Hive versions starting from MRS 3.1.2-LTS. Tez improves performance by combining multiple MapReduce tasks into a directed acyclic graph (DAG), minimizing intermediate disk writes and reducing task overhead. It offers significantly better efficiency than MapReduce. However, its DAG-based execution model can make debugging more challenging due to increased structural complexity. | Tez is well-suited for complex queries and tasks involving multiple stages of data processing. |
| Spark | An execution engine based on the Spark framework. Spark caches data in memory, making it ideal for iterative computing tasks. It supports both Hive SQL and Spark SQL, and integrates seamlessly with the broader Spark ecosystem. It requires substantial memory resources and careful resource configuration. For small datasets, performance may degrade due to scheduling overhead. | Spark is suitable for iterative tasks and complex data analysis scenarios. |
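One way to see the difference between the MapReduce and Tez execution models is to compare the query plans that Hive generates under each engine. The following is a minimal sketch, assuming two hypothetical tables, orders and customers, that are not part of this document. EXPLAIN prints the plan without running the query itself. With mr, the plan is typically split into Map Reduce stages, whereas with tez it is typically shown as a Tez stage that lists the vertices and edges of the DAG. The exact output depends on the Hive version and the data.

```sql
-- orders and customers are hypothetical tables used only for illustration.

-- Show the plan generated for the MapReduce engine.
set hive.execution.engine=mr;
EXPLAIN
SELECT c.region, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region;

-- Show the plan generated for the same query under the Tez engine.
set hive.execution.engine=tez;
EXPLAIN
SELECT c.region, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region;
```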
Prerequisites
If you need to switch the execution engine to Tez and view task execution on the Tez web UI, ensure that the TimelineServer role of the YARN service has been installed in the cluster and the role is running properly.
Switching the Execution Engine on the Client
- Install and log in to the Hive client. For details, see Using the Hive Client.
- Run the following command to check the Hive execution engine:
set hive.execution.engine;
For example, if the command output is as follows, the Hive execution engine is Tez:
+---------------------------+
|            set            |
+---------------------------+
| hive.execution.engine=tez |
+---------------------------+
1 row selected (0.258 seconds)
- Run the following command to switch the Hive execution engine. The execution engine can be set to MapReduce, Spark, or Tez. The corresponding parameter values are mr, spark, and tez.
set hive.execution.engine=mr;
For MRS 3.1.2-LTS, if the execution engine needs to be set to Tez, you also need to enable the yarn.timeline-service.enabled parameter.
set yarn.timeline-service.enabled=true;
Note that:
- After yarn.timeline-service.enabled is enabled, you can view details about the tasks executed by the Tez engine on the Tez UI. Task information is then reported to TimelineServer, so if the TimelineServer instance is faulty, the task will fail.
- Tez uses the ApplicationMaster buffer pool. Therefore, yarn.timeline-service.enabled must be enabled before Tez tasks are submitted. Otherwise, the parameter does not take effect and you need to log in to the client again to configure it.
- When you switch from Tez to another execution engine, run the following command on the client to disable the yarn.timeline-service.enabled parameter:
set yarn.timeline-service.enabled=false;
- (Optional) To specify the YARN queue in which tasks run, run the following command on the client:
set tez.queue.name=default;
- Submit and execute a Hive task.
- After the task is executed, log in to FusionInsight Manager, choose Cluster > Services > Yarn, and click the hyperlink on the right of ResourceManager Web UI to access the YARN web UI, where you can view the task execution status. To view task execution status on the web UI of a specific engine:
- If the execution engine is Spark, log in to FusionInsight Manager, choose Cluster > Services > Spark, and click the hyperlink on the right of Spark Web UI to access the Spark web UI.
- If the execution engine is Tez, log in to FusionInsight Manager, choose Cluster > Services > Tez, and click the hyperlink on the right of Tez Web UI to access the Tez web UI.
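The client-side steps above can be sketched as a single session. This is only an illustration under stated assumptions: web_logs is a hypothetical table, the queue name default is taken from the example above, and the yarn.timeline-service.enabled statements apply only to MRS 3.1.2-LTS.

```sql
-- Check which engine the current session uses.
set hive.execution.engine;

-- Switch this session to Tez.
set hive.execution.engine=tez;

-- MRS 3.1.2-LTS only: enable this before the first Tez task is submitted.
set yarn.timeline-service.enabled=true;

-- Optional: choose the YARN queue for Tez tasks.
set tez.queue.name=default;

-- Submit a task. web_logs is a hypothetical table used only for illustration.
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;

-- When switching to another engine, disable the timeline parameter again.
set yarn.timeline-service.enabled=false;
set hive.execution.engine=mr;
```

Because these statements are issued with set on the client, they affect only the current session. The default engine is configured separately on FusionInsight Manager.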
Configuring the Default Execution Engine of Hive
- Log in to FusionInsight Manager, choose Cluster > Services > Hive, click Configurations and then All Configurations, click HiveServer(Role), and search for hive.execution.engine.
- Set hive.execution.engine to the required execution engine.
- mr: Set the execution engine of Hive to MapReduce.
- spark: Set the execution engine of Hive to Spark.
- tez: Set the execution engine of Hive to Tez. This is the default value.
In MRS 3.1.2-LTS, if the execution engine is changed from another one to Tez, you also need to choose Hive(Service) > Customization, search for yarn.site.customized.configs, add custom parameter yarn.timeline-service.enabled, and set its value to true.
- After yarn.timeline-service.enabled is enabled, you can view details about the tasks executed by the Tez engine on the Tez UI. Task information is then reported to TimelineServer, so if the TimelineServer instance is faulty, the task will fail.
- Tez uses the ApplicationMaster buffer pool. Therefore, yarn.timeline-service.enabled must be enabled before Tez tasks are submitted. Otherwise, the parameter does not take effect and you need to log in to the client again to configure it.
- When you change the execution engine from Tez to another engine, set yarn.timeline-service.enabled to false.
- Click Save. In the displayed confirmation dialog box, click OK.
- Choose Dashboard. On the displayed page, choose More > Restart Service, enter the password of the current user, and click OK to restart the Hive service.
- Install and log in to the Hive client. For details, see Using the Hive Client.
- Submit and execute a Hive task.
- After the task is executed, log in to FusionInsight Manager, choose Cluster > Services > Yarn, and click the hyperlink on the right of ResourceManager Web UI to access the YARN web UI, where you can view the task execution status. To view task execution status on the web UI of a specific engine:
- If the execution engine is Spark, log in to FusionInsight Manager, choose Cluster > Services > Spark, and click the hyperlink on the right of Spark Web UI to access the Spark web UI.
- If the execution engine is Tez, log in to FusionInsight Manager, choose Cluster > Services > Tez, and click the hyperlink on the right of Tez Web UI to access the Tez web UI.
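After the Hive service restarts, one simple way to confirm the new default is to open a new client session and query the parameter before running any other set statement. The sketch below assumes the default was configured as tez. The value you see should match whatever you set on FusionInsight Manager.

```sql
-- Run in a new client session, before any other set statement.
-- The output should show the default configured on FusionInsight Manager,
-- for example hive.execution.engine=tez.
set hive.execution.engine;
```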