Exporting the Doris Query Result Set
This section describes how to use the SELECT INTO OUTFILE command to export the Doris query result set to a specified storage system in a specified file format.
Notes and Constraints
- The export command does not check whether a file and the file path exist, whether a path is automatically created, or whether an existing file is overwritten. These items are determined by the semantics of the remote storage system.
- If an error occurs during the export, some exported files may remain on the remote storage system. Doris does not clear these files. You need to manually clear them.
- The timeout of the export command is the same as the timeout of the query. It can be set by SET query_timeout=xxx.
- For empty result query, there will be an empty file.
- File splitting ensures that a row of data is completely stored in a single file. Therefore, the file size is not exactly the same as the value of max_file_size.
- For functions that output invisible characters, such as BITMAP and HLL types, the result is displayed as \N, that is, NULL.
- Currently, the output type of some geographic information functions, such as ST_Point, is VARCHAR. These functions output garbled characters because the actual output values are encoded binary characters. For geographic functions, use ST_AsText for output.
Syntax
<query_stmt>
INTO OUTFILE "file_path"
[format_as]
[properties]
In the preceding command:
- query_stmt indicates the specific query statement.
- file_path indicates the path for storing the exported file.
- format_as indicates the format in which result is exported. The value can be CSV, PARQUET, CSV_WITH_NAMES, CSV_WITH_NAMES_AND_TYPES, or ORC. The default value is CSV.
Example of Exporting Query Results
- Export to HDFS
Export the simple query result to the hdfs://path/tmp/result.txt file in CSV format using Broker.
- Kerberos authentication is enabled for the cluster (in security mode)
SELECT * FROM example_db.test_export_tbl INTO OUTFILE "hdfs://192.168.67.78:25000/tmp/result_" FORMAT AS CSV PROPERTIES ( "broker.name" = "broker_192_168_67_78", "column_separator" = ",", "line_delimiter" = "\n", "max_file_size" = "100MB", "broker.hadoop.security.authentication" = "kerberos", "broker.kerberos_principal" = "doris/hadoop.hadoop.com@HADOOP.COM", "broker.kerberos_keytab" = "/home/omm/doris_keytab/doris.keytab", "broker.dfs.nameservices" = "hacluster", "broker.dfs.ha.namenodes.hacluster" = "37,36", "broker.dfs.namenode.rpc-address.hacluster.37" = "IP address of the active NameNode instance:RPC port number", "broker.dfs.namenode.rpc-address.hacluster.36" = "IP address of the standby NameNode instance:RPC port number", "broker.dfs.client.failover.proxy.provider.hacluster" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" );
- Kerberos authentication is disabled for the cluster (in normal mode)
SELECT * FROM example_db.test_export_tbl INTO OUTFILE "hdfs://192.168.67.78:25000/tmp/result_" FORMAT AS CSV PROPERTIES ( "broker.name" = "broker_192_168_67_78", "column_separator" = ",", "line_delimiter" = "\n", "max_file_size" = "100MB", "broker.username"="hdfs", "broker.password"="", "broker.dfs.nameservices" = "hacluster", "broker.dfs.ha.namenodes.hacluster" = "37,36", "broker.dfs.namenode.rpc-address.hacluster.37" = "IP address of the active NameNode instance:RPC port number", "broker.dfs.namenode.rpc-address.hacluster.36" = "IP address of the standby NameNode instance:RPC port number", "broker.dfs.client.failover.proxy.provider.hacluster" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" );
- Kerberos authentication is enabled for the cluster (in security mode)
- Export to Local File
Before exporting data to a local file, you need to configure enable_outfile_to_local=true in the fe.conf file.
select * from tbl1 limit 10 INTO OUTFILE "file:///home/work/path/result_";
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.