CREATE FOREIGN TABLE (for OBS Import and Export)
Function
CREATE FOREIGN TABLE creates a foreign table in the current database for parallel data import and export of OBS data. The server used is gsmpp_server, which is created by the database by default.

The single-node cluster (standalone) does 8.2.0.100 and later versions support OBS foreign table import and export.
Precautions
- Only the data in text and CSV formats is supported, and the OBS connection should be configured. ORC, CarbonData and PARQUET data in OBS is not applicable. For details, see CREATE FOREIGN TABLE (SQL on OBS or Hadoop).
- The foreign table can be READ ONLY or WRITE ONLY. The default value is READ ONLY. To import data to the cluster, use READ ONLY for the foreign table. To export data, use WRITE ONLY.
- The foreign table is owned by the user who runs the command.
- The distribution mode of an OBS foreign table does not need to be explicitly specified. The default mode is ROUNDROBIN.
- Only constraints in Informational Constraints take effect for the created foreign table.
- Ensure no Chinese characters are contained in paths used for importing data to or exporting data from OBS.
Data Type |
DIST_FDW |
|
---|---|---|
- |
READ ONLY |
WRITE ONLY |
ORC |
× |
× |
PARQUET |
× |
× |
CARBONDATA |
× |
× |
TEXT |
√ |
√ |
CSV |
√ |
√ |
JSON |
× |
× |
Syntax
1 2 3 4 5 6 7 8 |
CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( { column_name type_name [column_constraint ] | LIKE source_table | table_constraint [, ...]} [, ...] ) SERVER server_name OPTIONS ( { option_name ' value ' } [, ...] ) [ { WRITE ONLY | READ ONLY }] [ WITH error_table_name | LOG INTO error_table_name] [PER NODE REJECT LIMIT 'value'] ; |
- column_constraint is as follows:
1 2 3
[CONSTRAINT constraint_name] {PRIMARY KEY | UNIQUE} [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
- table_constraint is as follows:
1 2 3
[CONSTRAINT constraint_name] {PRIMARY KEY | UNIQUE} (column_name) [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
Parameter Description
Parameter |
Description |
Value Range |
---|---|---|
IF NOT EXISTS |
Does not throw an error if a table with the same name exists. A notice is issued in this case. |
- |
table_name |
Specifies the name of the foreign table. |
A string, which must comply with the naming convention. For details, see Identifier Naming Conventions. |
column_name |
Specifies the name of a column in the foreign table. |
A string, which must comply with the naming convention. For details, see Identifier Naming Conventions. |
type_name |
Specifies the data type of the column. |
- |
SERVER server_name |
Specifies the server name of the foreign table. For OBS foreign tables used for data import and export, you can use gsmpp_server created by the initial database by default or use a customized server. |
|
OPTIONS |
Specifies parameters of foreign table data. |
|
READ ONLY |
Specifies whether a foreign table is read-only. This parameter is available only for data import. |
- |
WRITE ONLY |
Specifies whether a foreign table is write-only. This parameter is available only for data export. |
- |
WITH error_table_name |
Data format errors during import are recorded in the table specified by error_table_name. You can query this table after the import to obtain error details. This parameter is available only after reject_limit is set. |
Value range: a string. It must comply with the naming convention.
NOTICE:
To be compatible with PostgreSQL open source interfaces, you are advised to replace this syntax with LOG INTO. When this parameter is specified, an error table is automatically created. |
LOG INTO error_table_name |
Data format errors during import are recorded in the table specified by error_table_name. You can query this table after the import to obtain error details. |
Value range: a string. It must comply with the naming convention.
|
PER NODE REJECT LIMIT 'value' |
This parameter specifies the allowed number of data format errors on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import. |
Value range: an unlimited integer. If this parameter is not specified, an error message is returned immediately.
NOTICE:
This syntax specifies the error tolerance of a single node. Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. When a non-data format error occurs, the whole data import process stops. |
NOT ENFORCED |
Specifies the constraint to be an informational constraint. This constraint is guaranteed by the user instead of the database. |
- |
ENFORCED |
The default value is ENFORCED. ENFORCED is a reserved parameter and is currently not supported. |
- |
PRIMARY KEY (column_name) |
Specifies the informational constraint on column_name. |
Value range: a string. It must comply with the naming convention, and the value of column_name must exist. |
ENABLE QUERY OPTIMIZATION |
Optimizes the query plan using an informational constraint. |
- |
DISABLE QUERY OPTIMIZATION |
Disables the optimization of the query plan using an informational constraint. |
- |
Parameter |
Description |
Value Range |
---|---|---|
encrypt |
Specifies whether HTTPS is enabled for data transfer. on enables HTTPS and off disables it (in this case, HTTP is used). The default value is off. |
- |
access_key |
Indicates the access key (AK, obtained from the user information on the console) used for the OBS access protocol. When you create a foreign table, its AK value is not encrypted and saved to the metadata table of the database. The correctness of the parameter is not verified when a foreign table is created. |
- |
secret_access_key |
Indicates the secret access key (SK, obtained from the user information on the console) used for the OBS access protocol. When you create a foreign table, its SK value is encrypted and saved to the metadata table of the database. The correctness of the parameter is not verified when a foreign table is created. |
- |
security_token |
Corresponds to the SecurityToken value of the temporary security credential in IAM. A temporary AK, a temporary SK, and a temporary security token form a temporary security credential. This parameter is supported by version 8.2.0 or later clusters. |
NOTICE:
|
chunksize |
Specifies the cache read by each OBS thread on a DN. Its value range is 8 to 512 in the unit of MB. Its default value is 64. |
- |
location |
Specifies the data source location of a foreign table. Currently, only URLs are allowed. Multiple URLs are separated using vertical bars (|). |
For details about how to use this parameter, see Location Parameter Description. |
region |
(Optional) specifies the value of regionCode, region information on the cloud. If the region parameter is explicitly specified, the value of region will be read. If the region parameter is not specified, the value of defaultRegion will be read. |
NOTE:
Note the following when setting parameters for importing or exporting OBS foreign tables in TEXT or CSV format:
|
format |
Specifies the format of the source data file in a foreign table. |
Valid value: CSV and TEXT. The default value is TEXT. GaussDB(DWS) only supports CSV and TEXT formats.
|
header |
Specifies whether a file contains a header with the names of each column in the file. |
When OBS exports data, this parameter cannot be set to true. Use the default value false, indicating that the first row of the exported data file is not the header. When data is imported, if header is on, the first row of the data file will be identified as title row and ignored. If header is off, the first row will be identified as a data row. Valid value: true, on, false, and off. The default value is false or off. |
delimiter |
Specifies the column delimiter of data, and uses the default delimiter if it is not set. The default delimiter of TEXT is a tab and that of CSV is a comma (,). |
The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.
|
quote |
Specifies the quotation mark for the CSV format. The default value is a double quotation mark ("). |
|
escape |
Specifies an escape character for a CSV file. The value must be a single-byte character. |
The default value is a double quotation mark ("). If the value is the same as the quote value, it will be replaced with \0. |
null |
Specifies how to represent a null value. |
|
noescaping |
Specifies whether to escape the backslash (\) and its following characters in the TEXT format. |
noescaping is available only for the TEXT format. true/on or false/off. The default value is false or off. |
encoding |
Specifies the encoding of a data file, that is, the encoding used to parse, check, and generate a data file. Its default value is the default client_encoding value of the current database. |
Before you import foreign tables, it is recommended that you set client_encoding to the file encoding format, or a format matching the character set of the file. Otherwise, unnecessary parsing and check errors may occur, leading to import errors, rollback, or even invalid data import. Before exporting foreign tables, you are also advised to specify this parameter, because the export result using the default character set may not be what you expect. If this parameter is not specified when you create a foreign table, a warning message will be displayed on the client.
NOTE:
|
eol |
Specifies the newline character style of the imported or exported data file. |
Value range: multi-character newline characters within 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #.
|
date_format |
Specifies the DATE format for data import. This syntax is available only for READ ONLY foreign tables. |
Value range: a valid DATE value. For details, see Date and Time Processing Functions and Operators.
NOTE:
If ORACLE is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below. |
time_format |
Specifies the TIME format for data import. This syntax is available only for READ ONLY foreign tables. |
Value range: a valid TIME value. Time zones cannot be used. |
timestamp_format |
Specifies the TIMESTAMP format for data import. This syntax is available only for READ ONLY foreign tables. |
Value range: any valid TIMESTAMP value. Time zones cannot be used. |
smalldatetime_format |
Specifies the SMALLDATETIME format for data import. This syntax is available only for READ ONLY foreign tables. |
Value range: a valid SMALLDATETIME value. |
bom |
Indicates whether a CSV file contains the utf8 BOM. |
This parameter is valid only when the foreign table is read-only and uses UTF8 code. Value range: true, on, false, and off Default value: false |
Location Parameter Description
- The URL of a read-only foreign table (the default permission is read-only) can end with the path prefix or the full path of the target object in the format of obs://Bucket/Prefix. Prefix indicates the prefix of an object path, for example, obs://mybucket/tpch/nation/.
- If the region parameter is explicitly specified in obs://Bucket/Prefix, the value of region will be read. If the region parameter is not specified, the value of defaultRegion will be read.
- The URL of a writable foreign table does not need to contain a file name. You can specify only one data source location for a foreign table. The directory corresponding to the location must be created before you specify the location.
- URLs specified for a read-only foreign table must be different.
- Specify location when inserting data to a foreign table.
- Parameter LOCATION supports prefixes gsobs and obs, which are identified as OBS information. LOCATION should be followed by gsobs, OBS URL, and Bucket, or by obs and Bucket.
When importing and exporting data, you are advised to use the location parameter as follows:
- You are advised to specify a file name for location during data import. If you only specify an OBS bucket or directory, all text files in it will be imported. An error message will be reported if the data format is incorrect. If you set fault tolerance, a large amount of data may be imported to the fault-tolerant table.
- Multiple files in an OBS bucket can be imported at the same time. The matched files are imported based on the file name prefix.
- If you specify a file name, for example, 1.csv, then other files (like 1.csv1 or 1.csv22) starting with 1.csv in the bucket or directory where 1.csv resides will be automatically imported. That is, 1.csv1 and 1.csv22 are automatically imported.
- To specify multiple URLs in OBS mode, separate URLs by using vertical bars (|). In gsobs mode, only one URL can be specified.
- During data export, a directory is generated for location by default. If you specify only a file name, the system automatically creates a directory whose name starts with the file name and then generates the file that stores the exported data. The file name is automatically generated by GaussDB(DWS).
- You can specify one path for location only during data export.
Examples

// Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
1 2 3 4 5 6 7 8 9 |
DROP FOREIGN TABLE IF EXISTS OBS_ft; CREATE FOREIGN TABLE OBS_ft( a int, b int)SERVER gsmpp_server OPTIONS (location 'obs://gaussdbcheck/obs_ddl/test_case_data/txt_obs_informatonal_test001',format 'text',encoding 'utf8',chunksize '32', encrypt 'on',ACCESS_KEY 'access_key_value_to_be_replaced',SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',delimiter E'\x08') read only; DROP TABLE row_tbl; CREATE TABLE row_tbl( a int, b int); INSERT INTO row_tbl select * from OBS_ft; |
Helpful Links
Optimization
- delimiter
- A delimiter cannot be \r or \n, or the same as the null value. The delimiter of CSV cannot be same as the quote value.
- The data length of a single row should be less than 1 GB. A row that has many columns using long delimiters cannot contain much valid data.
- You are advised to use a multi-character string, such as the combination of the dollar sign ($), caret (^), and ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b as the delimiter.
- quote
- The value must be a single-byte character. The quote value cannot be the same as the delimiter or null value.
- Invisible characters are recommended as quote values, such as 0x07, 0x08, and 0x1b.
- mode Normal
- Supports all file types (including CSV, TEXT, and FIXED). To import data, you need to enable GDS on the data server.
- mode Shared
- Supports the TEXT format. It does not require GDS, but all the user data has to be mounted to the same path of all the nodes through NFS.
- mode Private
- Used in scenarios where user data has been stored under the same path as the local directory of DNs.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot