Exporting Query Results
Function
This statement writes query results directly to a specified directory. The results can be stored in CSV, Parquet, ORC, JSON, or Avro format.
Syntax
INSERT OVERWRITE DIRECTORY path
USING file_format
[OPTIONS(key1=value1)]
select_statement;
Keyword
- USING: Specifies the storage format.
- OPTIONS: Specifies the export options as key-value attributes. This clause is optional.
Parameter
- path: The OBS path to which the query result is written.
- file_format: Format of the file to be written. The value can be CSV, Parquet, ORC, JSON, or Avro. If the file format is CSV, see Table 3 for the OPTIONS parameters.
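As a sketch of the parameters above, a Parquet export might look as follows (the OBS path, database, table, and column names are placeholders):

```sql
-- Export the query result as Parquet files to an OBS directory.
-- 'obs://mybucket/output/orders' and db1.orders are placeholder names.
INSERT OVERWRITE DIRECTORY 'obs://mybucket/output/orders'
USING parquet
SELECT order_id, amount FROM db1.orders;
```

Because Parquet files carry their own schema and compression settings, no OPTIONS clause is needed in this case.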
Precautions
- You can configure the spark.sql.shuffle.partitions parameter to set the number of files inserted into the OBS bucket for a non-DLI table. In addition, to avoid data skew, you can append distribute by rand() to the end of the INSERT statement to increase the number of concurrent tasks. For example:
insert into table table_target select * from table_source distribute by cast(rand() * N as int);
- You can specify a delimiter with the configuration item OPTIONS('DELIMITER'=','). The default value is a comma (,).
For CSV data, the following delimiters are supported:
- Tab character, for example, 'DELIMITER'='\t'.
- Any binary character, for example, 'DELIMITER'='\u0001(^A)'.
- Single quotation mark ('). A single quotation mark must be enclosed in double quotation marks (" "). For example, 'DELIMITER'= "'".
- \001(^A) and \017(^Q) are also supported, for example, 'DELIMITER'='\001(^A)' and 'DELIMITER'='\017(^Q)'.
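The precautions above can be combined in a single export, sketched below (the bucket path, table name, partition count 5, and multiplier 10 are all placeholder values):

```sql
-- Limit the number of output files by reducing shuffle partitions
-- (5 is a placeholder; tune for your data volume).
SET spark.sql.shuffle.partitions = 5;

-- Export as tab-delimited CSV; DISTRIBUTE BY spreads rows evenly
-- across tasks to avoid data skew during the write.
INSERT OVERWRITE DIRECTORY 'obs://mybucket/output/csv_export'
USING csv
OPTIONS('DELIMITER'='\t')
SELECT * FROM db1.tb1 DISTRIBUTE BY cast(rand() * 10 AS int);
```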
Example
INSERT OVERWRITE DIRECTORY 'obs://bucket/dir'
USING csv
OPTIONS(key1=value1)
select * from db1.tb1;