Help Center/ MapReduce Service/ Troubleshooting/ Using Spark/ Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell

Updated on 2023-11-30 GMT+08:00

View PDF

Failed to Load Data to a Hive Table Across File Systems by Running SQL Statements Using Spark Shell

Issue

When the spark-shell command is used to execute SQL statements or the spark-submit command is used to submit Spark tasks, the load command of SQL statements exists, and the source data and target table are not stored in the same file system. An error is reported when the MapReduce task is started in the preceding two modes.

Cause Analysis

When the load command is used to import data to the Hive table across file systems (for example, the original data is stored in HDFS but the Hive table data is stored in OBS), and the file length is greater than the threshold (32 MB by default). In this case, the MapReduce task that uses DistCp is triggered to migrate data. The MapReduce task configuration is directly extracted from the Spark task configuration. However, the net.topology.node.switch.mapping.impl configuration item of the Spark task does not retain the default value of Hadoop. Therefore, the JAR package of Spark needs to be used. As a result, MapReduce reports an error indicating that the class cannot be found.

Procedure

Solution 1:

If the file size is small, set the default file size to a value greater than the maximum file size. For example, if the maximum file size is 95 MB, run the following command:

hive.exec.copyfile.maxsize=104857600

Solution 2:

If the file size is large, use DistCp to improve the data migration efficiency. Add the following parameters when starting the Spark task:

--conf spark.hadoop.net.topology.node.switch.mapping.impl=org.apache.hadoop.net.ScriptBasedMapping

Parent topic: Using Spark

Previous topic: Large Number of Shuffle Results Are Lost During Spark Task Execution

Next topic: Spark Task Submission Failure

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.

The system is busy. Please try again later.

Which of the following issues have you encountered?

Content is inconsistent with the product UI

Unclear descriptions

Lack of examples or code

Incorrect steps

Can't find what I need

Lack of best practices

Feedback (optional)

0/500

Select at least one type of issue, and enter your comments or suggestions.

Enter a maximum of 500 characters.

Submit Cancel

For any further questions, feel free to contact us through the chatbot.

Chatbot