
Enhancing beeline Reliability

Scenario

  • If the beeline client is disconnected due to a network exception while a batch processing task is being executed, the tasks submitted before the disconnection continue to be executed properly in Hive. When you start the batch processing task again, the tasks that have already been submitted are not executed again, and the tasks that have not been executed are executed in sequence.
  • If the HiveServer service breaks down while a batch processing task is being executed, Hive ensures that the tasks that have already been executed successfully are not executed again when the same batch processing task is started again. Execution resumes from the task that had not yet been executed when HiveServer2 broke down.

This section applies to MRS 3.x or later.

Example

  1. Beeline is reconnected after a disconnection.

    Example:

    beeline -e "${SQL}" --hivevar batchid=xxxxx
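
    The following sketch shows how this behaves; the statements and the batch ID are illustrative. If the connection is lost after the first statement finishes, rerunning the same command with the same batch ID skips the finished statement and continues with the statements that have not been executed.

    # SQL holds the statements of the batch task (illustrative)
    SQL="INSERT INTO TABLE result_a SELECT * FROM src_a; INSERT INTO TABLE result_b SELECT * FROM src_b;"

    # First submission: suppose the network drops while the second statement is running.
    beeline -e "${SQL}" --hivevar batchid=012345678901

    # Resubmission with the same batch ID: the statement that already finished is not
    # executed again, and execution continues with the statements that did not finish.
    beeline -e "${SQL}" --hivevar batchid=012345678901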

  2. Beeline kills the running tasks.

    Example:

    beeline -e "" --hivevar batchid=xxxxx --hivevar kill=true

  3. Log in to the beeline client and enable the reconnection-after-disconnection mechanism.

    Log in to the beeline client and run the set hivevar:batchid=xxxx command.
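
    For example, after logging in to the beeline client, run the following command (the batch ID is illustrative); this enables the mechanism for the statements executed in this session under batch 012345678901:

    set hivevar:batchid=012345678901;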

    Instructions:

    • xxxx indicates the batch ID of the tasks submitted in the same batch using the beeline client; the batch ID identifies the batch in which the tasks were submitted. If no batch ID is provided when a task is submitted, this feature is not enabled. The value of xxxx is specified when the task is executed. In the following example, the value of xxxx is 012345678901.

      beeline -f hdfs://hacluster/user/hive/table.sql --hivevar batchid=012345678901

    • If the SQL script depends on data timeliness, you are advised not to enable the breakpoint reconnection mechanism; instead, submit the tasks with a new batch ID. Otherwise, when the script is re-executed, the SQL statements that have already been executed are not executed again, and expired data may be obtained.
    • If the SQL script uses built-in time functions, it is recommended that you either not enable the breakpoint reconnection mechanism or use a new batch ID for each execution, for the same reason as above.
    • An SQL script contains one or more subtasks. If the script contains logic for deleting and creating temporary tables, you are advised to place the logic for deleting temporary tables at the end of the script (see the sketch after this list). If the subtasks that run after the temporary-table deletion fail, and the subtasks that run before the deletion use the temporary table, then when the script is executed again with the same batch ID, the subtasks before the deletion still have to be compiled even though they are not re-executed (the statement that creates the temporary table has already completed, so it is only compiled, not executed again), and this compilation fails because the temporary table has already been deleted. In this case, you are advised to use a new batch ID to execute the script.
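
    The following is a sketch of this layout; all table and column names are illustrative. The temporary table is dropped only after every subtask that reads it has finished.

      CREATE TABLE tmp_stage AS SELECT id, name FROM src_table;
      INSERT INTO TABLE result_a SELECT id FROM tmp_stage;
      INSERT INTO TABLE result_b SELECT name FROM tmp_stage;
      -- Keep the deletion of the temporary table as the last statement in the script.
      DROP TABLE IF EXISTS tmp_stage;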

    Parameter description:

    • zk.cleanup.finished.job.interval: indicates the interval for executing the cleanup task. The default interval is 60 seconds.
    • zk.cleanup.finished.job.outdated.threshold: indicates the validity threshold of a node. A node is generated for the tasks in the same batch, and the threshold is measured from the end time of the current batch task. If that time exceeds 60 minutes, the node is deleted.
    • batch.job.max.retry.count: indicates the maximum number of retries of a batch task. If the number of retries of a batch task exceeds the value of this parameter, the task execution record is deleted, and execution starts from the first task the next time the batch task is started. The default value is 10.
    • beeline.reconnect.zk.path: indicates the root node for storing task execution progress. The default value for the Hive service is /beeline.
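
    If these values need to be adjusted, they follow the standard Hive property format sketched below; the values shown are the documented defaults. Whether the parameters are edited in hive-site.xml directly or changed through the cluster management console depends on your MRS deployment, so check the configuration path for your version.

      <property>
        <name>batch.job.max.retry.count</name>
        <value>10</value>
      </property>
      <property>
        <name>beeline.reconnect.zk.path</name>
        <value>/beeline</value>
      </property>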