Updated on 2025-10-11 GMT+08:00

CREATE FOREIGN TABLE (for OBS Import and Export)

Function

CREATE FOREIGN TABLE creates an OBS foreign table in the current database for parallel data import and export of OBS data. You do not need to create an external server. The gsmpp_server created by the database by default can be used.

This syntax supports only TEXT and CSV data on OBS buckets. If you need to process ORC, CARBONDATA, and PARQUET data, see CREATE FOREIGN TABLE (SQL on OBS or Hadoop) and create an external server.

A foreign table is a virtual table that does not physically store data. Instead, it maps to data residing in an external storage system, such as HDFS or OBS. The mapping is defined by metadata, which includes the table structure, the data location, and the file format. In this way, the database can present external files as a table, and users can access or operate on external data directly with standard SQL statements.

Precautions

Only data in TEXT and CSV formats is supported, and the connection to OBS must be configured.
When using an OBS foreign table to access data in an OBS bucket, you need to ensure that your DWS cluster and OBS bucket are in the same region.
Foreign tables are classified into read-only foreign tables (READ ONLY) and write-only foreign tables (WRITE ONLY). By default, foreign tables are read-only. When importing OBS data, you need to set the foreign table to READ ONLY. When exporting data to OBS, you need to set the foreign table to WRITE ONLY.
Only the system administrator dbadmin or a common user who has been granted the USEFT permission can perform foreign table operations.
ALTER USER user_name USEFT;
The distribution mode of an OBS foreign table does not need to be explicitly specified. The default mode is ROUNDROBIN.
Only constraints in Informational Constraints take effect for the created foreign table.
Paths in OBS buckets cannot contain Chinese characters.

**Table 1** Read and write formats supported by OBS foreign tables
Data Type	DIST_FDW
-	READ ONLY	WRITE ONLY
ORC	×	×
PARQUET	×	×
CARBONDATA	×	×
TEXT	√	√
CSV	√	√
JSON	×	×

Syntax

CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name
( { column_name type_name [ column_constraint ]
    | LIKE source_table | table_constraint [, ...] } [, ...] )
SERVER server_name
OPTIONS ( { option_name ' value ' } [, ...] )
[ { WRITE ONLY | READ ONLY } ]
[ WITH error_table_name | LOG INTO error_table_name ]
[ PER NODE REJECT LIMIT 'value' ];
 
 
  

column_constraint is as follows:

[ CONSTRAINT constraint_name ]
{ PRIMARY KEY | UNIQUE }
[ NOT ENFORCED [ ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION ] | ENFORCED ]

table_constraint is as follows:

[ CONSTRAINT constraint_name ]
{ PRIMARY KEY | UNIQUE } (column_name)
[ NOT ENFORCED [ ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION ] | ENFORCED ]
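The following is a minimal sketch of how the table_constraint form might be used in a complete statement. The table name, constraint name, column names, and the bucket path are hypothetical placeholders and do not come from this reference. The AK and SK values must be replaced with your own.

```sql
-- Sketch only: ft_orders_sketch, ft_orders_pk, and the columns are illustrative names.
CREATE FOREIGN TABLE ft_orders_sketch
(
    o_orderkey int,
    o_custkey  int,
    -- Table-level informational constraint: not verified by the database,
    -- but the optimizer is allowed to rely on it.
    CONSTRAINT ft_orders_pk PRIMARY KEY (o_orderkey)
        NOT ENFORCED ENABLE QUERY OPTIMIZATION
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://<obs_bucket_name>/<folder>/',   -- placeholder path
    format 'text',
    encoding 'utf8',
    access_key '<access_key>',                      -- placeholder
    secret_access_key '<secret_access_key>'         -- placeholder
)
READ ONLY;
```

Because the constraint is not enforced, the data in OBS must actually be unique on o_orderkey, or query results may be incorrect.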

Parameter Description

**Table 2** CREATE FOREIGN TABLE (for OBS import and export) parameters

Parameter

Description

Value Range or Example

IF NOT EXISTS

Sends a notice, but does not throw an error, if a table with the same name exists.

table_name

Specifies the name of the foreign table.

A string, which must comply with the naming convention. For details, see Identifier Naming Conventions.

column_name

Specifies the name of a column in the foreign table.

A string, which must comply with the naming convention. For details, see Identifier Naming Conventions.

type_name

Specifies the data type of the column.

SERVER server_name

Specifies the server name of the foreign table. For OBS foreign tables used for data import and export, you can use gsmpp_server, which the database creates by default, or a custom server.

If a custom server is used, its foreign data wrapper must be dist_fdw.
For clusters of version 8.2.0 or later, you can specify the OBS access parameters access_key, secret_access_key, and security_token in the custom dist_fdw server. If these parameters are specified in the server, you do not need to specify them again in the foreign table.
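As an illustration of the custom server case, the sketch below assumes that the standard CREATE SERVER ... FOREIGN DATA WRAPPER ... OPTIONS form applies and that the option names match those listed above. The server name obs_import_server, the table name, and the bucket path are hypothetical. Check the CREATE SERVER reference for your cluster version before relying on it.

```sql
-- Sketch only (clusters of 8.2.0 or later): the credentials live on the server.
CREATE SERVER obs_import_server
FOREIGN DATA WRAPPER dist_fdw
OPTIONS (
    access_key '<access_key>',                 -- placeholder
    secret_access_key '<secret_access_key>'    -- placeholder
);

-- The foreign table then references the server and omits the credentials.
CREATE FOREIGN TABLE ft_custom_server_sketch (a int, b int)
SERVER obs_import_server
OPTIONS (
    location 'obs://<obs_bucket_name>/<folder>/',   -- placeholder path
    format 'text',
    encoding 'utf8'
)
READ ONLY;
```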

OPTIONS

Specifies parameters of foreign table data.

Data format parameters. For details, see Table 3.
Error-tolerance parameters. For details, see Table 4.
Performance parameters. For details, see Table 5.

READ ONLY

Specifies whether a foreign table is read-only. This parameter is available only for data import.

WRITE ONLY

Specifies whether a foreign table is write-only. This parameter is available only for data export.

WITH error_table_name

Data format errors during import are recorded in the table specified by error_table_name. You can query this table after the import to obtain error details. This parameter is available only after reject_limit is set.

A string, which must comply with the naming convention.

CAUTION:

To be compatible with PostgreSQL open source interfaces, you are advised to replace this syntax with LOG INTO. When this parameter is specified, an error table is automatically created.

LOG INTO error_table_name

Data format errors during import are recorded in the table specified by error_table_name. You can query this table after the import to obtain error details.

A string, which must comply with the naming convention.

This parameter is available only after PER NODE REJECT LIMIT is set.
When this parameter is specified, an error table is automatically created.

PER NODE REJECT LIMIT 'value'

This parameter specifies the allowed number of data format errors on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import.

An integer or unlimited. If this parameter is not specified, an error is reported as soon as the first data format error occurs.

CAUTION:

This syntax specifies the error tolerance of a single node.

Examples of data format errors include the following: a column is missing, an extra column exists, a data type is incorrect, and encoding is incorrect. When a non-data format error occurs, the whole data import process stops.
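The error-tolerance clauses described above might be combined as in the following sketch. The table names ft_tolerant_sketch, ft_tolerant_err, and target_tbl, the bucket path, and the limit of 10 are hypothetical values chosen for illustration.

```sql
-- Sketch only: tolerate up to 10 data format errors on each DN and
-- record the rejected rows in an automatically created error table.
CREATE FOREIGN TABLE ft_tolerant_sketch (a int, b int)
SERVER gsmpp_server
OPTIONS (
    location 'obs://<obs_bucket_name>/<folder>/',   -- placeholder path
    format 'text',
    encoding 'utf8',
    access_key '<access_key>',                      -- placeholder
    secret_access_key '<secret_access_key>'         -- placeholder
)
READ ONLY
LOG INTO ft_tolerant_err
PER NODE REJECT LIMIT '10';

-- target_tbl is assumed to exist with matching columns.
INSERT INTO target_tbl SELECT * FROM ft_tolerant_sketch;

-- After the import, inspect the rows that were rejected.
SELECT * FROM ft_tolerant_err;
```

If any single DN sees more than 10 data format errors, the import fails, as described for PER NODE REJECT LIMIT.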

NOT ENFORCED

Specifies that the created constraint is an informational constraint, which is not forcibly verified by the database. This option is used with ENABLE QUERY OPTIMIZATION.

Informational rather than mandatory: You still need to specify a unique constraint for a column using PRIMARY KEY or UNIQUE to optimize the query execution plan.
No mandatory verification: The database does not check whether the data in the external data source actually meets constraint conditions (for example, whether duplicate values exist).
Performance optimization: The query optimizer is allowed to use an informational constraint to generate a more efficient execution plan. However, you need to ensure that the data meets constraint conditions, or the query result may be incorrect.
Risk: If the actual data violates constraint conditions (for example, duplicate values exist), the query result may be incorrect (for example, the aggregation result is abnormal or redundant records are generated during JOIN operations).

For more information, see Informational Constraints.

For example, the primary key is declared, but the verification is not mandatory.

     CREATE FOREIGN TABLE hdfs_users ( user_id INT PRIMARY KEY NOT ENFORCED, 
...

ENFORCED

Specifies that the created constraint is an enforced constraint that is forcibly validated by the database. This parameter is reserved. Currently, DWS does not support ENFORCED.

PRIMARY KEY (column_name)

Specifies the informational constraint on column_name.

String, which must conform to the identifier naming conventions. The column name must exist.

ENABLE QUERY OPTIMIZATION

Enables the query optimizer to use informational constraints to generate a more efficient execution plan. This parameter is used together with NOT ENFORCED.

Create a foreign table, add a unique constraint to the p_partkey int column, do not enforce the constraint, and allow the optimizer to use the informational constraint to generate a better execution plan.

     CREATE FOREIGN TABLE ft_part 
(
     p_partkey int UNIQUE  NOT ENFORCED  ENABLE QUERY OPTIMIZATION, 
...

DISABLE QUERY OPTIMIZATION

Prevents the query optimizer from using informational constraints when it generates an execution plan.

**Table 3** OPTIONS parameters of the data format

Parameter

Description

Value Range or Example

encrypt

Specifies whether HTTPS is enabled for data transfer. on enables HTTPS and off disables it (in this case, HTTP is used). The default value is off.

access_key

Indicates the access key (AK, obtained from the user information on the console) used for the OBS access protocol. When you create a foreign table, its AK value is not encrypted and saved to the metadata table of the database. The correctness of the parameter is not verified when a foreign table is created.

secret_access_key

Indicates the secret access key (SK, obtained from the user information on the console) used for the OBS access protocol. When you create a foreign table, its SK value is encrypted and saved to the metadata table of the database. The correctness of the parameter is not verified when a foreign table is created.

security_token

Corresponds to the SecurityToken value of the temporary security credential in IAM. A temporary AK, a temporary SK, and a temporary security token form a temporary security credential. The temporary security credential is valid for no more than 24 hours.

CAUTION:

This parameter is supported by version 8.2.0 or later clusters.
When this parameter is used, access_key and secret_access_key correspond to the temporary AK and SK, respectively.

chunksize

Specifies the size of the read cache used by each OBS thread on a DN. The value ranges from 8 to 512, in MB. The default value is 64.

location

Specifies the directory where OBS data is stored. The value can be described as a URL in the format of obs://OBS bucket name/folder name. Multiple URLs are separated by vertical bars (|). Ensure that the OBS bucket and DWS cluster are in the same region. Cross-region access to OBS bucket data is not supported.

For details about how to use this parameter, see Location Parameter Description.

Example:

OPTIONS
( encoding 'utf8',
  location 'obs://<obs_bucket_name>/traffic-data/gcxx',
  format 'text',
...

region

(Optional) Specifies the value of regionCode, which identifies the region on the cloud.

If the region parameter is explicitly specified, the value of region will be read. If the region parameter is not specified, the value of defaultRegion will be read.

Note the following when setting parameters for importing or exporting OBS foreign tables in TEXT or CSV format:

The location parameter is mandatory. The prefixes gsobs and obs indicate file locations on OBS. The gsobs prefix should be followed by obs url, bucket, and prefix. The obs prefix should be followed by bucket or prefix.
The data sources of multiple buckets are separated by vertical bars (|), for example, LOCATION 'obs://bucket1/folder/ | obs://bucket2/'. The database scans all objects in the specified folders.

format

Specifies the format of the source data file in a foreign table.

CSV or TEXT (default value). DWS only supports CSV and TEXT formats.

CSV (comma-separated format):
- A CSV file consists of records whose columns are separated by delimiters. Each record has the same column sequence.
- A CSV file can process linefeeds efficiently, but cannot process certain special characters very well.
TEXT (text format):
- Records are separated by linefeeds. A TEXT file can process special characters efficiently, but cannot process linefeeds well.

header

Specifies whether a file contains a header with the names of each column in the file.

When data is exported to OBS, this parameter cannot be set to true. Use the default value false, which indicates that the first row of the exported data file is not a header.

When data is imported, if header is on, the first row of the data file is treated as the header row and ignored. If header is off, the first row is treated as a data row.

Valid values: true, on, false, and off. The default value is false or off.

delimiter

Specifies the column delimiter of data, and uses the default delimiter if it is not set. The default delimiter of TEXT is a tab and that of CSV is a comma (,).

The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.

The delimiter of TEXT cannot be \r or \n.
A delimiter cannot be the same as the null value. The delimiter for the CSV format cannot be the same as the quote value.
The delimiter of TEXT data cannot contain letters, digits, backslashes (\), or periods (.).
The data length of a single row must be less than 1 GB. Delimiters count toward this limit, so a row that has many columns and uses a long delimiter has less room for valid data.
You are advised to use a multi-character string, such as the combination of the dollar sign ($), caret (^), and ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b, as the delimiter.

quote

Specifies the quotation mark for the CSV format. The default value is a double quotation mark (").

The quote value cannot be the same as the delimiter or null value.
The quote value must be a single-byte character.
Invisible characters are recommended as quote values, such as 0x07, 0x08, and 0x1b.

escape

Specifies an escape character for a CSV file. The value must be a single-byte character.

The default value is a double quotation mark ("). If the value is the same as the quote value, it will be replaced with \0.
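To make the interaction of format, header, delimiter, and quote more concrete, here is a minimal sketch of a CSV import table. The table name and bucket path are hypothetical. The delimiter and quote values follow the recommendations above (a multi-character string and an invisible single-byte character) and are examples, not required settings.

```sql
-- Sketch only: CSV import with a header row, a multi-character delimiter,
-- and an invisible single-byte quote character.
CREATE FOREIGN TABLE ft_csv_sketch (id int, name varchar(64))
SERVER gsmpp_server
OPTIONS (
    location 'obs://<obs_bucket_name>/<folder>/',   -- placeholder path
    format 'csv',
    header 'on',               -- first row is treated as a header and ignored
    delimiter '$^&',           -- at most 10 bytes; differs from quote and null
    quote E'\x1b',             -- single byte; differs from delimiter and null
    encoding 'utf8',
    access_key '<access_key>',                      -- placeholder
    secret_access_key '<secret_access_key>'         -- placeholder
)
READ ONLY;
```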

null

Specifies how to represent a null value.

The default value is \N for the TEXT format.
The default value for the CSV format is an empty string without quotation marks.
The null value cannot be \r or \n. The maximum length is 100 characters.
The null value cannot be the same as the delimiter or quote parameter.

noescaping

Specifies whether to escape the backslash (\) and its following characters in the TEXT format.

noescaping is available only for the TEXT format.

true/on or false/off. The default value is false or off.

encoding

Specifies the encoding of a data file, that is, the encoding used to parse, check, and generate a data file. Its default value is the default client_encoding value of the current database.

Before importing data through a foreign table, you are advised to set client_encoding to the file encoding, or to an encoding that matches the character set of the file. Otherwise, unnecessary parsing and check errors may occur, leading to import errors, rollback, or even the import of invalid data. Before exporting data through a foreign table, you are also advised to specify this parameter, because the result exported with the default character set may not be what you expect.

If this parameter is not specified when you create a foreign table, a warning message will be displayed on the client.

CAUTION:

Currently, OBS cannot parse a file using multiple character sets during foreign table import.
Currently, OBS cannot write a file using multiple character sets during foreign table export.

eol

Specifies the newline character style of the imported or exported data file.

Multi-character newline characters are supported, but the newline character cannot exceed 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #.

The eol parameter supports only the TEXT format for data import and export.
The value of the eol parameter cannot be the same as that of DELIMITER or NULL.
The value of the eol parameter cannot contain digits, letters, or periods (.).

date_format

Specifies the DATE format for data import. This syntax is available only for READ ONLY foreign tables.

Valid DATE formats. For details, see Date and Time Processing Functions and Operators.

NOTE:

If ORACLE is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below.

time_format

Specifies the TIME format for data import. This syntax is available only for READ ONLY foreign tables.

Valid TIME formats. Time zones are not supported.

timestamp_format

Specifies the TIMESTAMP format for data import. This syntax is available only for READ ONLY foreign tables.

Valid TIMESTAMP formats. Time zones are not supported.

smalldatetime_format

Specifies the SMALLDATETIME format for data import. This syntax is available only for READ ONLY foreign tables.

Valid SMALLDATETIME formats.
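The date and time options above might be set as in the following sketch. The format strings are assumed to follow the templates documented in Date and Time Processing Functions and Operators. The table name, columns, and bucket path are hypothetical.

```sql
-- Sketch only: explicit date and time formats for a READ ONLY foreign table.
CREATE FOREIGN TABLE ft_datetime_sketch
(
    event_date date,
    event_ts   timestamp
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://<obs_bucket_name>/<folder>/',   -- placeholder path
    format 'text',
    encoding 'utf8',
    date_format 'yyyy-mm-dd',                       -- assumed format template
    timestamp_format 'yyyy-mm-dd hh24:mi:ss',       -- assumed format template
    access_key '<access_key>',                      -- placeholder
    secret_access_key '<secret_access_key>'         -- placeholder
)
READ ONLY;
```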

bom

Indicates whether a CSV file contains the UTF-8 BOM.

This parameter is valid only when the foreign table is read-only and uses UTF-8 encoding.

Value range: true, on, false, and off

Default value: false

**Table 4** OPTIONS fault tolerance parameters
Parameter	Description	Value Range
fill_missing_fields	Specifies how to handle a row in the source file whose last column is missing during data import.	true/on or false/off. The default value is false or off. If this parameter is set to true or on and the last column of a data row in the source data file is missing, the column is set to NULL and no error message is generated. If this parameter is set to false or off and the last column of a data row in the source data file is missing, the following error is reported: missing data for column "tt"
ignore_extra_data	Specifies whether to ignore extra columns when a row in the source data file has more columns than are defined in the foreign table. This parameter is available only for data import.	true/on or false/off. The default value is false or off. If this parameter is set to true or on and a row in the source data file has more columns than the foreign table, the extra columns are ignored. If this parameter is set to false or off and a row in the source data file has more columns than the foreign table, the following error is reported: extra data after last expected column CAUTION: If the linefeed at the end of a row is missing and this parameter is set to true, data in the next row will be ignored.
reject_limit	Specifies the maximum number of data format errors allowed during data import. If the number of data format errors does not reach the maximum, the data import succeeds.	The value is an integer or unlimited. If this parameter is not specified, an error message is returned immediately. CAUTION: You are advised to replace this syntax with PER NODE REJECT LIMIT 'value'. Examples of data format errors include the following: a column is missing, an extra column exists, a data type is incorrect, and encoding is incorrect. When a non-data format error occurs, the whole data import process stops.
force_save_err	Indicates whether to save the error information to the error table after the import exits due to an error.	true/on or false/off. The default value is false or off. This parameter is used together with reject_limit. When this parameter is enabled: If reject_limit is not specified, an error record is retained in the error table. If reject_limit is set to N, N+1 error records are retained in the error table.
obs_null_file	Imports and exports empty files between GaussDB(DWS) and OBS. CAUTION: This parameter is supported only in 8.2.1 or later. If obs_null_file is set to true or on and the export directory contains only the empty _SUCCESS file, the empty table can be exported repeatedly. If obs_null_file is set to false or off, the empty table cannot be exported repeatedly. If obs_null_file is set to true or on and files are imported from multiple buckets, an error is reported for the first path in which the file does not exist.	true/on or false/off. The default value is false or off. If obs_null_file is set to true or on: When an empty table is exported from GaussDB(DWS), an empty file named _SUCCESS is generated, indicating that the export is successful. When a non-empty table is exported, the exported data and an empty file named _SUCCESS are generated. When a file is imported to GaussDB(DWS), if the file does not exist or the path is incorrect, the following error is reported: No such file or directory: 'XXX'
compatible_illegal_chars	Specifies whether to enable fault tolerance on invalid characters during data import. This syntax is available only for READ ONLY foreign tables.	true/on or false/off. The default value is false or off. If this parameter is set to true or on, invalid characters are tolerated and imported to the database after conversion. If this parameter is set to false or off, an invalid character causes an error and the import is interrupted. The conversion rules for invalid characters are as follows: \0 is converted to a space. Other invalid characters are converted to question marks. When compatible_illegal_chars is set to true or on and NULL, DELIMITER, QUOTE, or ESCAPE is set to a space or question mark, an error such as "illegal chars conversion may confuse COPY escape 0x20" is reported to prompt you to change the parameter that may cause confusion, preventing import errors. CAUTION: On a Windows platform, if OBS reads data files using the TEXT format, 0x1A will be treated as an EOF symbol and a parsing error will occur. It is the implementation constraint of the Windows platform. Since OBS on a Windows platform does not support BINARY read, the data can be read by OBS on a Linux platform.
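The fault tolerance options in Table 4 are ordinary OPTIONS entries. The sketch below shows one way they might be combined for a lenient import. The table name and bucket path are hypothetical, and whether such lenient settings are appropriate depends on how much you trust the source files.

```sql
-- Sketch only: a lenient READ ONLY table that pads a missing last column with NULL,
-- ignores extra columns, and converts invalid characters instead of failing.
CREATE FOREIGN TABLE ft_lenient_sketch (a int, b int, c varchar(32))
SERVER gsmpp_server
OPTIONS (
    location 'obs://<obs_bucket_name>/<folder>/',   -- placeholder path
    format 'text',
    encoding 'utf8',
    fill_missing_fields 'on',
    ignore_extra_data 'on',
    compatible_illegal_chars 'on',
    access_key '<access_key>',                      -- placeholder
    secret_access_key '<secret_access_key>'         -- placeholder
)
READ ONLY;
```

With compatible_illegal_chars enabled, avoid a space or question mark as the null, delimiter, quote, or escape value, as noted in the table.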

**Table 5** OPTIONS performance parameters
Parameter	Description	Value Range
file_split_threshold	Optimizes the performance of importing data in TEXT format by specifying the lower limit of the logical block size of a file. If this parameter is specified, large files are split based on the actual file and DN status to improve import concurrency. The purpose is to distribute tasks evenly across DNs, so this parameter is useful when the number of files is less than the number of DNs or when file sizes are unbalanced.	0 to 2147483647, in MB. The default value is 0, indicating that files are not split. This parameter is supported only in 8.2.0 or later. This parameter supports only READ ONLY foreign tables in TEXT format. This parameter specifies the lower limit of the logical block size of a file, not the block size itself. For example, assume that the file size is 1,024 MB and the number of DNs is 4. If file_split_threshold is less than 256, the file is evenly divided into four blocks, and a 256 MB import task is allocated to each DN. If file_split_threshold is set to 500, the file is split into a 500 MB block and a 524 MB block, which are allocated to two DNs, because a block cannot be smaller than 500 MB. This parameter also applies to multiple files. Unless there are clear requirements for block sizes, you are advised to set this parameter to a small value, for example, 10. Otherwise, the concurrency may be affected.
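As a sketch of the recommendation in Table 5, the following statement sets file_split_threshold to the small value 10 suggested there. The table name and bucket path are hypothetical, and the statement assumes a cluster of version 8.2.0 or later.

```sql
-- Sketch only: allow large TEXT files to be split into blocks of at least 10 MB
-- so that the import work can be spread across DNs.
CREATE FOREIGN TABLE ft_split_sketch (a int, b int)
SERVER gsmpp_server
OPTIONS (
    location 'obs://<obs_bucket_name>/<folder>/',   -- placeholder path
    format 'text',                                  -- TEXT format is required
    encoding 'utf8',
    file_split_threshold '10',                      -- lower limit of block size, in MB
    access_key '<access_key>',                      -- placeholder
    secret_access_key '<secret_access_key>'         -- placeholder
)
READ ONLY;
```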

Location Parameter Description

The URL of a read-only foreign table (the default permission is read-only) can end with the path prefix or the full path of the target object in the format of obs://Bucket/Prefix. Prefix indicates the prefix of an object path, for example, obs://mybucket/tpch/nation/.
If the region parameter is explicitly specified in obs://Bucket/Prefix, the value of region will be read. If the region parameter is not specified, the value of defaultRegion will be read.
The URL of a writable foreign table does not need to contain a file name. You can specify only one data source location for a foreign table. The directory corresponding to the location must be created before you specify the location.
URLs specified for a read-only foreign table must be different.
Specify location when inserting data to a foreign table.
Parameter LOCATION supports prefixes gsobs and obs, which are identified as OBS information. LOCATION should be followed by gsobs, OBS URL, and Bucket, or by obs and Bucket.

When importing and exporting data, you are advised to use the location parameter as follows:

You are advised to specify a file name for location during data import. If you specify only an OBS bucket or directory, all text files in it will be imported. An error is reported if the data format of any file is incorrect. If you have configured error tolerance, a large amount of data may be written to the error table.
Multiple files in an OBS bucket can be imported at the same time. The matched files are imported based on the file name prefix.
For example, you can identify and import the following two files after specifying the prefix mybucket/input_data/product_info in location:
```
mybucket/input_data/product_info.0
mybucket/input_data/product_info.1
```
Because matching is based on the prefix, if you specify a file name, for example, 1.csv, other files whose names start with 1.csv (such as 1.csv1 or 1.csv22) in the same bucket or directory are also imported automatically.
To specify multiple URLs with the obs prefix, separate the URLs with vertical bars (|). With the gsobs prefix, only one URL can be specified.
During data export, a directory is generated for location by default. If you specify only a file name, the system automatically creates a directory whose name starts with the file name and then generates the file that stores the exported data. The file name is automatically generated by DWS.
During data export, you can specify only one path for location.
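The prefix-based import described above might look like the following sketch. It reuses the mybucket/input_data/product_info prefix from the example in this section. The table name and columns are hypothetical.

```sql
-- Sketch only: every object whose name starts with the given prefix is imported,
-- for example product_info.0 and product_info.1.
CREATE FOREIGN TABLE ft_product_info_sketch
(
    product_id   int,
    product_name varchar(128)
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://mybucket/input_data/product_info',
    format 'text',
    encoding 'utf8',
    access_key '<access_key>',                      -- placeholder
    secret_access_key '<secret_access_key>'         -- placeholder
)
READ ONLY;
```

Keep in mind that any other object sharing the prefix is picked up as well, so choose prefixes that do not collide with unrelated files.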

Examples

Create a foreign table named OBS_ft to import TEXT-format data from OBS into the row_tbl table.

-- Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.

DROP FOREIGN TABLE IF EXISTS OBS_ft;

-- Read-only foreign table that maps the TEXT data on OBS.
CREATE FOREIGN TABLE OBS_ft
( a int, b int )
SERVER gsmpp_server
OPTIONS
( location 'obs://gaussdbcheck/obs_ddl/test_case_data/txt_obs_informatonal_test001',
  format 'text',
  encoding 'utf8',
  chunksize '32',
  encrypt 'on',
  ACCESS_KEY 'access_key_value_to_be_replaced',
  SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
  delimiter E'\x08' )
READ ONLY;

-- IF EXISTS avoids an error when row_tbl has not been created yet.
DROP TABLE IF EXISTS row_tbl;

CREATE TABLE row_tbl ( a int, b int );

-- Import: read from OBS through the foreign table and write to the local table.
INSERT INTO row_tbl SELECT * FROM OBS_ft;
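For the export direction, a minimal sketch might look as follows. It assumes that row_tbl from the import example exists and that the export directory has already been created in the bucket. The table name OBS_ft_export_sketch and the bucket path are hypothetical.

```sql
-- Sketch only: write-only foreign table used to export row_tbl to OBS.
DROP FOREIGN TABLE IF EXISTS OBS_ft_export_sketch;

CREATE FOREIGN TABLE OBS_ft_export_sketch
( a int, b int )
SERVER gsmpp_server
OPTIONS
( location 'obs://<obs_bucket_name>/<export_folder>/',   -- one path only; no file name needed
  format 'text',
  encoding 'utf8',
  encrypt 'on',
  ACCESS_KEY 'access_key_value_to_be_replaced',
  SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
  delimiter E'\x08' )
WRITE ONLY;

-- Export: read from the local table and write to OBS through the foreign table.
INSERT INTO OBS_ft_export_sketch SELECT * FROM row_tbl;
```

The header option is left at its default because it cannot be set to true for export.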
 
 
  

Helpful Links

ALTER FOREIGN TABLE (for HDFS or OBS), DROP FOREIGN TABLE

Optimization

delimiter
- A delimiter cannot be \r or \n, or the same as the null value. The delimiter of CSV cannot be the same as the quote value.
- The data length of a single row should be less than 1 GB. A row that has many columns using long delimiters cannot contain much valid data.
- You are advised to use a multi-character string, such as the combination of the dollar sign ($), caret (^), and ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b as the delimiter.
quote
- The value must be a single-byte character. The quote value cannot be the same as the delimiter or null value.
- Invisible characters are recommended as quote values, such as 0x07, 0x08, and 0x1b.
mode Normal
- Supports all file types (including CSV, TEXT, and FIXED). To import data, you need to enable GDS on the data server.
mode Shared
- Supports the TEXT format. It does not require GDS, but all the user data has to be mounted to the same path of all the nodes through NFS.
mode Private
- Used in scenarios where user data has been stored under the same path as the local directory of DNs.
