Updated on 2026-01-09 GMT+08:00

Exporting Search Results

Function

The INSERT OVERWRITE DIRECTORY statement writes query results directly to a specified directory in CSV, Parquet, ORC, JSON, or Avro format.

Syntax

INSERT OVERWRITE DIRECTORY path
  USING file_format
  [OPTIONS(key1=value1)]
  select_statement;

Keywords

  • USING: Specifies the storage format.
  • OPTIONS: Specifies write options for the export, such as the delimiter. This parameter is optional.

Parameter

Table 1 INSERT OVERWRITE DIRECTORY parameters

  Parameter     Description
  path          The OBS path to which the query result is to be written.
  file_format   Format of the file to be written. The value can be CSV, Parquet, ORC, JSON, or Avro.
                If file_format is set to csv, see Table 3 for the OPTIONS parameters.
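To show how path and file_format fit together, the following is a minimal sketch of an export in Parquet format; the OBS path and table name are illustrative, not fixed values:

```sql
-- Write a query result to an OBS directory in Parquet format.
-- 'obs://mybucket/output/' and db1.tb1 are hypothetical names.
INSERT OVERWRITE DIRECTORY 'obs://mybucket/output/'
  USING parquet
  SELECT * FROM db1.tb1;
```

Because Parquet is a self-describing format, no OPTIONS clause is needed in this case.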

Precautions

  • By configuring the spark.sql.shuffle.partitions parameter, you can set the number of files written to the OBS bucket for non-DLI tables. Additionally, to avoid data skew, you can append a DISTRIBUTE BY clause to the SELECT statement to increase the concurrency of processing jobs. In the following example, N is the number of buckets to distribute rows across:
    insert into table table_target select * from table_source distribute by cast(rand() * N as int);
  • You can specify a separator using the configuration item OPTIONS('DELIMITER'=','). The default value is a comma (,).

    For CSV data, the following delimiters are supported:

    • Tab character, for example, 'DELIMITER'='\t'.
    • A delimiter specified using Unicode encoding, for example, 'DELIMITER'='\u0001'.
    • Single quotation mark ('). A single quotation mark must be enclosed in double quotation marks (""), for example, 'DELIMITER'="'".
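The two precautions above can be combined in a single export statement. The following is a sketch, assuming a CSV export with a tab delimiter and 16 distribution buckets; the OBS path and table name are illustrative:

```sql
-- Export CSV with a tab delimiter while spreading rows across 16 buckets
-- to avoid data skew. Assumes spark.sql.shuffle.partitions is already set
-- to control the number of output files.
INSERT OVERWRITE DIRECTORY 'obs://mybucket/export/'
  USING csv
  OPTIONS('DELIMITER'='\t')
  SELECT * FROM db1.tb1 DISTRIBUTE BY cast(rand() * 16 AS int);
```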

Example

INSERT OVERWRITE DIRECTORY 'obs://bucket/dir'
  USING csv
  OPTIONS('DELIMITER'=',')
  select * from db1.tb1;