Exporting Search Results
Function
The INSERT OVERWRITE DIRECTORY statement writes query results directly to a specified directory, in CSV, Parquet, ORC, JSON, or Avro format.
Syntax
INSERT OVERWRITE DIRECTORY path USING file_format [OPTIONS(key1=value1)] select_statement;
Keywords
- USING: Specifies the storage format.
- OPTIONS: Specifies write options for the export as key-value pairs. This parameter is optional.
Parameters

| Parameter | Description |
|---|---|
| path | The OBS path to which the query result is written. |
| file_format | Format of the files to be written. The value can be CSV, Parquet, ORC, JSON, or Avro. |
Precautions
- By configuring the spark.sql.shuffle.partitions parameter, you can control the number of files written to the OBS bucket for non-DLI tables. Additionally, to avoid data skew, you can append a DISTRIBUTE BY clause to the SELECT statement to increase the concurrency of processing jobs. The following is an example:
insert into table table_target select * from table_source distribute by cast(rand() * N as int);
- The configuration item OPTIONS('DELIMITER'=',') specifies the field delimiter. The default value is a comma (,).
For CSV data, the following delimiters are supported:
- Tab character, for example, 'DELIMITER'='\t'.
- Unicode escape sequence, for example, 'DELIMITER'='\u0001'.
- Single quotation mark ('). The single quotation mark must be enclosed in double quotation marks, for example, 'DELIMITER'="'".
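Putting the precautions above together, the following is a minimal sketch; the partition count, bucket path, and table names are illustrative and must be adapted to your environment:

```sql
-- Cap the number of output files by reducing shuffle partitions
-- (the value 16 is illustrative; tune it for your data volume).
SET spark.sql.shuffle.partitions = 16;

-- Export query results as tab-delimited CSV files to OBS;
-- DISTRIBUTE BY spreads rows randomly across buckets to mitigate data skew.
INSERT OVERWRITE DIRECTORY 'obs://bucket/output'
USING csv
OPTIONS('DELIMITER'='\t')
SELECT * FROM db1.tb1 DISTRIBUTE BY CAST(rand() * 16 AS INT);
```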
Example
INSERT OVERWRITE DIRECTORY 'obs://bucket/dir' USING csv OPTIONS(key1=value1) select * from db1.tb1;
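Because USING accepts any of the supported formats and OPTIONS is optional, a Parquet export can be written as follows; the path, table, and column names are illustrative:

```sql
-- Write the result of a filtered query to OBS as Parquet files.
INSERT OVERWRITE DIRECTORY 'obs://bucket/parquet_dir'
USING parquet
SELECT id, name FROM db1.tb1 WHERE id > 100;
```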