
Updated on 2022-06-01 GMT+08:00


Optimizing INSERT...SELECT Operation

Scenario

The INSERT...SELECT operation can be optimized in the following scenarios:

Data in a large number of small files is queried.
Data in large files is queried.
A non-Spark user is used in beeline/thriftserver mode.

Procedure

The INSERT...SELECT operation can be optimized as follows:

When creating a Hive table, set the storage type to Parquet to accelerate execution of the INSERT...SELECT statement.
Run INSERT...SELECT statements either through spark-sql or as a Spark user in beeline/thriftserver mode. The file owner then does not need to be changed after the write, which speeds up execution of the INSERT...SELECT statement.
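
The two recommendations above can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the table names (sales_parquet, sales_staging), the column list, and both helper functions are hypothetical. The statements are built as plain strings so they can be pasted into spark-sql or beeline, or passed to an existing SparkSession through its sql method.

```python
def parquet_insert_select_statements(target, source, columns):
    """Return the CREATE TABLE and INSERT...SELECT statements as strings.

    columns is a list of (name, hive_type) pairs shared by both tables.
    """
    column_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    column_names = ", ".join(name for name, _ in columns)
    # STORED AS PARQUET sets the storage type of the target Hive table.
    create = f"CREATE TABLE IF NOT EXISTS {target} ({column_defs}) STORED AS PARQUET"
    insert = f"INSERT INTO TABLE {target} SELECT {column_names} FROM {source}"
    return [create, insert]


def run_statements(spark, statements):
    """Execute each statement in order through an existing SparkSession."""
    for statement in statements:
        spark.sql(statement)


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    for statement in parquet_insert_select_statements(
        "sales_parquet", "sales_staging", [("id", "INT"), ("amount", "DOUBLE")]
    ):
        print(statement)
```

Keeping the statement text separate from the session that runs it means the same SQL can be submitted through whichever entry point is already running as the Spark user.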

In beeline/thriftserver mode, the executors and the driver run as the same user. The driver is part of ThriftServer, and ThriftServer is run by a Spark user, so the driver and the executors also run as that Spark user. Currently, the identity of the beeline client user is not passed through to the executors, so the files they write are not owned by the client user. If a non-Spark user runs the statement, the owner of each written file must therefore be changed to the beeline client user, that is, the actual user, which adds time to the operation.
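
As a rough sketch of submitting the statement from beeline under a Spark user, the helper below assembles a beeline command line using beeline's -u (JDBC URL), -n (user name), and -e (query) options. The JDBC URL, host, port, user name "spark", and table names are all placeholders assumed for illustration. On a cluster with Kerberos authentication, the effective user normally comes from the authenticated principal rather than from -n, so treat this only as an outline.

```python
import shlex


def beeline_command(jdbc_url, user, sql):
    """Build the argument list for a single-statement beeline invocation."""
    # -u: JDBC URL of the ThriftServer, -n: user name, -e: SQL to execute.
    return ["beeline", "-u", jdbc_url, "-n", user, "-e", sql]


if __name__ == "__main__":
    argv = beeline_command(
        "jdbc:hive2://thriftserver-host:10000/default",  # hypothetical URL
        "spark",  # assumed name of the Spark user
        "INSERT INTO TABLE sales_parquet SELECT id, amount FROM sales_staging",
    )
    # Print a shell-quoted form that can be reviewed before running it.
    print(shlex.join(argv))
```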

