Optimizing INSERT...SELECT Operation

The INSERT...SELECT operation can be optimized in the following scenarios:

The INSERT...SELECT operation can be optimized as follows:

When creating a Hive table, set the storage type to Parquet to accelerate execution of the INSERT...SELECT statement.
Use spark-sql or a Spark user in beeline/thriftserver mode to execute INSERT...SELECT operations. This eliminates the need for changing the file owner, which quickens INSERT...SELECT statement execution.

In beeline/thriftserver mode, an executor and a driver are run by the same user. Because a driver is a part of ThriftServer and ThriftServer is run by a Spark user, the driver is also run by the Spark user. At present, the user of the beeline client cannot be transparently transmitted to the executor during operation. If a non-Spark user is used, the owner of a file must be changed to the user of the beeline client, that is, the actual user.