Help Center> MapReduce Service> Developer Guide (2.x and Earlier)> Spark Application Development> Application Tuning> SQL and DataFrame Tuning> Optimizing INSERT...SELECT Operation

Optimizing INSERT...SELECT Operation

Scenario

The INSERT...SELECT operation can be optimized in the following scenarios:

Data in a large number of small files is queried.
Data in large files is queried.
A non-Spark user executes the statement in beeline/ThriftServer mode.

Procedure

The INSERT...SELECT operation can be optimized as follows:

When creating a Hive table, set the storage type to Parquet to accelerate execution of the INSERT...SELECT statement.
Execute INSERT...SELECT statements with spark-sql, or as the Spark user in beeline/ThriftServer mode. This avoids having to change the owner of the written files, which speeds up execution of the INSERT...SELECT statement.
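The Parquet recommendation can be sketched in PySpark as follows. This is a minimal sketch, not an MRS-provided tool: the table names `sales_src` and `sales_parquet`, their columns, and the helpers `build_statements` and `run` are hypothetical, and the example assumes a SparkSession with Hive support enabled. The two generated SQL statements can equally be pasted into spark-sql.

```python
# Minimal sketch: load a Parquet-backed Hive table with INSERT...SELECT.
# Table names (sales_src, sales_parquet) and columns (id, amount) are hypothetical.
from typing import List


def build_statements(source_table: str, target_table: str) -> List[str]:
    """Return the DDL for a Parquet target table and the INSERT...SELECT that fills it."""
    create_target = (
        f"CREATE TABLE IF NOT EXISTS {target_table} "
        "(id INT, amount DOUBLE) "
        "STORED AS PARQUET"  # Parquet storage is the recommended table format here
    )
    insert_select = (
        f"INSERT INTO TABLE {target_table} "
        f"SELECT id, amount FROM {source_table}"
    )
    return [create_target, insert_select]


def run(spark, source_table: str, target_table: str) -> None:
    """Execute the statements in order through an existing SparkSession."""
    for statement in build_statements(source_table, target_table):
        spark.sql(statement)


if __name__ == "__main__":
    # Requires PySpark and access to a Hive metastore.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("insert-select-parquet")
        .enableHiveSupport()
        .getOrCreate()
    )
    run(spark, "sales_src", "sales_parquet")
    spark.stop()
```

Keeping statement construction separate from execution makes the SQL easy to inspect or reuse in spark-sql without starting a Spark session.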

In beeline/ThriftServer mode, the executors run as the same user as the driver. Because the driver is part of ThriftServer, and ThriftServer runs as the Spark user, the driver also runs as the Spark user. Currently, the beeline client user cannot be passed through transparently to the executors at run time. Therefore, if a non-Spark user executes the statement, the owner of each written file must be changed to the beeline client user (the actual user), which slows down the operation.
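The ownership rule above can be modeled in a few lines of Python. This is an illustrative sketch only: the ThriftServer user name `spark`, the client user `analyst`, and the helpers `owner_change_required` and `spark_sql_command` are assumptions, not part of MRS. The command builder uses the `-e` option of the spark-sql CLI to run a single statement directly instead of through ThriftServer.

```python
# Illustrative model of the file-ownership rule in beeline/ThriftServer mode.
# "spark" as the ThriftServer user name is an assumption; check your cluster.
from typing import List

THRIFTSERVER_USER = "spark"  # hypothetical: user that runs ThriftServer (and thus the driver)


def owner_change_required(beeline_user: str, thriftserver_user: str = THRIFTSERVER_USER) -> bool:
    """Driver and executors write files as the ThriftServer user, so the files must be
    re-owned to the beeline client user whenever the two users differ."""
    return beeline_user != thriftserver_user


def spark_sql_command(statement: str) -> List[str]:
    """Build a spark-sql CLI argument list that runs one SQL statement via -e."""
    return ["spark-sql", "-e", statement]


if __name__ == "__main__":
    for user in ("spark", "analyst"):  # "analyst" is a hypothetical non-Spark user
        print(user, "needs owner change:", owner_change_required(user))
    print(spark_sql_command("INSERT INTO TABLE sales_parquet SELECT id, amount FROM sales_src"))
```

Run as a script, it prints the decision for a Spark and a non-Spark client user and the resulting spark-sql argument list; passing that list to `subprocess.run` would execute the statement on a node where spark-sql is installed.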

