CREATE FOREIGN TABLE (SQL on OBS or Hadoop)
Function
CREATE FOREIGN TABLE creates an HDFS or OBS foreign table in the current database to access structured data stored on HDFS or OBS. You can also export data in ORC or PARQUET format to HDFS or OBS.
Data stored in OBS: Data storage is decoupled from compute. The cluster storage cost is low, and storage capacity is not limited. Clusters can be deleted at any time. However, the computing performance depends on the OBS access performance and is lower than that of HDFS. OBS is recommended for applications that do not demand a lot of computation.
Data stored in HDFS: Data storage is not decoupled from compute. The cluster storage cost is high, and storage capacity is limited. The computing performance is high. You must export data before you delete clusters. HDFS is recommended for computing-intensive scenarios.

- The single-node cluster (standalone) does not support an HDFS foreign table.
- The single-node cluster (standalone) 8.2.0.100 and later versions support OBS foreign tables, but you need to set the foreign data wrapper of the server to DFS_FDW.
- The following read and write foreign tables are supported only in the storage-compute decoupling 3.0 version.
Precautions
- The HDFS foreign table and OBS foreign table are classified into read-only, write-only, and read-write foreign tables. Read-only foreign tables are used for query. Write-only foreign tables can be used to export data from GaussDB(DWS) to the distributed file system. Read-write foreign tables can be used for query and data export. Only 9.1.0.100 and later versions support read-write foreign tables.
- This mode allows you to import and query data in a variety of formats including ORC, TEXT, CSV, CARBONDATA, PARQUET, and JSON. If you are dealing with OBS foreign tables, you can export them in the ORC format or the PARQUET format (only supported in the storage-compute decoupling 3.0 version). HDFS foreign tables can be exported in ORC or PARQUET format.
- In this mode, you need to manually create a foreign server. For details, see CREATE SERVER.
- If the foreign data wrapper is set to HDFS_FDW or DFS_FDW when you manually create a server, you need to specify the distribution mode in the DISTRIBUTE BY clause when creating a read-only foreign table.
Data Type |
DFS_FDW/HDFS_FDW |
||
---|---|---|---|
- |
READ ONLY |
WRITE ONLY |
READ WRITE |
ORC |
√ |
√ |
√ |
PARQUET |
√ |
√ |
√ |
CARBONDATA |
√ |
× |
× |
TEXT |
√ |
× |
× |
CSV |
√ |
× |
× |
JSON |
√ |
× |
× |
Syntax
Create an HDFS foreign table.
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [ { column_name type_name [ { [CONSTRAINT constraint_name] NULL | [CONSTRAINT constraint_name] NOT NULL | column_constraint [...]} ] | table_constraint [, ...]} [, ...] ] ) SERVER server_name OPTIONS ( { option_name ' value ' } [, ...] ) [ {WRITE ONLY | READ ONLY | READ WRITE} ] DISTRIBUTE BY {ROUNDROBIN | REPLICATION} [ PARTITION BY ( column_name ) [ AUTOMAPPED ] ] ; |
- column_constraint is as follows:
1 2 3
[CONSTRAINT constraint_name] {PRIMARY KEY | UNIQUE} [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
- table_constraint is as follows:
1 2 3
[CONSTRAINT constraint_name] {PRIMARY KEY | UNIQUE} (column_name) [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
Parameter |
Description |
Value Range |
---|---|---|
IF NOT EXISTS |
Does not throw an error if a table with the same name exists. A notice is issued in this case. |
- |
table_name |
Specifies the name of the foreign table. |
A string, which must comply with the naming convention. For details, see Identifier Naming Conventions. |
column_name |
Specifies the name of a column in the foreign table. Columns are separated by commas (,). |
A string, which must comply with the naming convention. For details, see Identifier Naming Conventions.
NOTICE:
A JSON object consists of nested or parallel name-value pairs, which are irrelevant to the sequence. When data in JSON format is imported, the mapping between fields and values is determined based on the automatic mapping between field names and names of name-value pairs. You need to define proper field names. Otherwise, you may not get the expected result. The rules for automatic mapping between field names and names of name-value pairs are as follows:
For example, to import each element of the {"A": "simple", "B": {"C": "nesting"}, "D": ["array", 2, {"E": "complicated"}]} object, the field names in the foreign table must be defined as a, b, b_c, d, d#0, d#1, d#2 and d#2_e. The sequence in which the fields are defined does not affect the import result. |
type_name |
Specifies the data type of the column. |
- |
constraint_name |
Specifies the name of a constraint for the foreign table. |
- |
{ NULL | NOT NULL } |
Specifies whether the column allows NULL. When you create a table, whether the data in HDFS is NULL or NOT NULL cannot be guaranteed. The consistency of data is guaranteed by users. Users must decide whether the column is NULL or NOT NULL. (The optimizer optimizes the NULL/NOT NULL and generates a better plan.) |
- |
SERVER server_name |
Specifies the server name of the foreign table. Users can customize its name. |
A string indicating an existing server. It must comply with the naming convention. |
OPTIONS |
Specifies parameters of foreign table data. |
The OPTIONS parameters to be specified vary depending on the scenario. The options are as follows:
|
WRITE ONLY | READ ONLY | READ WRITE |
Specifies the type of the foreign table to be created. |
If the foreign table type is not specified, a read-only foreign table is created by default. |
DISTRIBUTE BY ROUNDROBIN |
Specifies ROUNDROBIN as the distribution mode for the HDFS/OBS foreign table. |
- |
DISTRIBUTE BY REPLICATION |
Specifies REPLICATION as the distribution mode for the HDFS foreign table. |
- |
PARTITION BY ( column_name ) AUTOMAPPED |
column_name specifies the partition column. AUTOMAPPED means the partition column specified by the HDFS partitioned foreign table is automatically mapped with the partition directory information in HDFS. The prerequisite is that the sequences of partition columns specified in the HDFS foreign table and in the directory are the same. This function is applicable only to read-only foreign tables. |
|
CONSTRAINT constraint_name |
Specifies the name of informational constraint of the foreign table. |
A string, which must comply with the naming convention. |
PRIMARY KEY |
The primary key constraint specifies that one or more columns of a table must contain unique (non-duplicate) and non-null values. Only one primary key can be specified for a table. |
- |
UNIQUE |
Specifies that a group of one or more columns of a table must contain unique values. For the purpose of a unique constraint, NULL is not considered equal. |
- |
NOT ENFORCED |
Specifies the constraint to be an informational constraint. This constraint is guaranteed by the user instead of the database. |
- |
ENFORCED |
The default value is ENFORCED. ENFORCED is a reserved parameter and is currently not supported. |
- |
PRIMARY KEY (column_name) |
Specifies the informational constraint on column_name. |
A string indicating an existing column_name. It must comply with the naming convention. |
ENABLE QUERY OPTIMIZATION |
Optimizes an execution plan using an informational constraint. |
- |
DISABLE QUERY OPTIMIZATION |
Disables the optimization of an execution plan using an informational constraint. |
- |
Parameter |
Description |
Value Range |
---|---|---|
header |
Specifies whether a data file contains a table header. header is available only for CSV files. If header is on, the first row of the data file will be identified as the header and ignored during export. If header is off, the first row is identified as data. |
true/on or false/off. The default value is false or off. |
quote |
Specifies the quotation mark for the CSV format. The default value is a double quotation mark ("). |
The quote value cannot be the same as the delimiter or null value. The quote value must be a single-byte character. Invisible characters are recommended as quote values, such as 0x07, 0x08, and 0x1b. |
escape |
Specifies an escape character for a CSV file. The value must be a single-byte character. The default value is a double quotation mark ("). If it is the same as the value of quote, it will be replaced with \0. |
- |
location |
Specifies the file path on OBS. This is an OBS foreign table parameter. The data sources of multiple buckets are separated by vertical bars (|), for example, LOCATION 'obs://bucket1/folder/ | obs://bucket2/'. The database scans all objects in the specified folders. When accessing a DLI multi-version table, you do not need to specify the location parameter. |
- |
format |
Specifies the format of the source data file in a foreign table. |
The HDFS foreign table can read data in ORC, TEXT, JSON, CSV, and Parquet file formats when in READ ONLY mode. But if you are using the WRITE ONLY/READ WRITE foreign table, you can only work with data in ORC and PARQUET file formats. OBS foreign tables allow you to read data in ORC, TEXT, JSON, CSV, CarbonData, Parquet, and Hudi file formats in read-only mode. But if you are using write-only mode, you can only work with data in ORC and Parquet file formats.
NOTICE:
|
foldername |
The directory of the data source file in the foreign table, that is, the corresponding file directory in HDFS or on OBS. This is mandatory for WRITE ONLY and READ WRITE writable foreign tables and optional for READ ONLY foreign tables. When accessing a DLI multi-version table, you do not need to specify the foldername parameter. |
- |
encoding |
It specifies the encoding format of a source data file in the foreign table. Its default value is utf8. This parameter is optional. |
- |
totalrows |
(Optional) Estimated number of rows in a table. This parameter is used only for OBS foreign tables. Because OBS may store many files, it is slow to analyze data. This parameter allows you to set an estimated value so that the optimizer can estimate the table size according to the value. Generally, query efficiency is high when the estimated value is close to the actual value. |
- |
filenames |
Data source files specified in the foreign table. Multiple files are separated by commas (,). |
|
delimiter |
Specifies the column delimiter of data, and uses the default delimiter if it is not set. The default delimiter of TEXT is a tab. |
The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.
|
eol |
Specifies the newline character style of the imported data file. |
Multi-character newline characters are supported, but the newline character cannot exceed 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #.
|
null |
Specifies the string that represents a null value. |
The default value is \N for the TEXT format.
|
noescaping |
Specifies in TEXT format, whether to escape the backslash (\) and its following characters. |
noescaping is available only for the TEXT format. true/on or false/off. The default value is false or off. |
date_format |
Specifies the DATE format for data import. This parameter is valid only for READ ONLY foreign tables. |
The value must be in a valid DATE format. For details, see Date and Time Processing Functions and Operators.
|
time_format |
Specifies the TIME format for data import. This parameter is valid only for READ ONLY foreign tables. |
This parameter is available only for TEXT and CSV source data files. Value range: Valid TIME. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators. |
timestamp_format |
Specifies the TIMESTAMP format for data import. This parameter is valid only for READ ONLY foreign tables. |
This parameter is available only for TEXT and CSV source data files. Value range: any valid TIMESTAMP value. Time zones are not supported. For details, see Date and Time Processing Functions and Operators. |
smalldatetime_format |
Specifies the SMALLDATETIME format for data import. This parameter is valid only for READ ONLY foreign tables. |
This parameter is available only for TEXT and CSV source data files. Value range: any valid SMALLDATETIME value. For details, see Date and Time Processing Functions and Operators. |
dataencoding |
This parameter specifies the data code of the data table to be exported when the database code is different from the data code of the data table. For example, the database code is Latin-1, but the data in the exported data table is in UTF-8 format. This parameter is optional. If this parameter is not specified, the database encoding format is used by default. This syntax is valid only for the WRITE ONLY/READ WRITE foreign table of HDFS. |
The parameter is available only for HDFS foreign tables in ORC format with WRITE ONLY or READ WRITE. Value range: data code types supported by the database encoding |
filesize |
Specifies the file size of a write-only foreign table. This parameter is optional. If this parameter is not specified, the file size in the distributed file system configuration is used by default. This syntax is valid only for WRITE ONLY/READ WRITE writable foreign tables. |
The parameter is available only for HDFS foreign tables in ORC format with WRITE ONLY or READ WRITE. Value range: an integer ranging from 1 to 1024 |
compression |
Specifies the file compression mode. This parameter is optional and is supported only by cluster versions 8.2.0 and later. For a write only/read-write foreign table, it specifies the compression mode for ORC files. For a READ ONLY foreign table, it specifies the compression mode for TEXT, CSV, or JSON files. |
NOTICE:
|
version |
Specifies the ORC version number. This parameter is optional. This syntax is available only for WRITE ONLY/READ WRITE foreign tables. |
Value range: Only 0.12 is supported. The default value is 0.12. |
julian_adjust |
Specifies whether to correct the Julian day in the data. This parameter is optional and is supported only by cluster versions 8.3.0 and later. Spark 2.X uses the Julian day, and Spark 3.0 and later versions use the Proleptic Gregorian calendar. As a result, some dates before parquet 1582 are different. Correct the dates based on the site requirements. |
The value can be true or false. Default value: true |
cache_policy |
Specifies the disk cache policy for foreign tables. This parameter is supported only in the storage-compute decoupling 3.0 version. |
Value range: For foreign tables, this parameter can be set to ALL or NONE. ALL indicates that the hot cache in the disk cache is used, and NONE indicates that the cold cache is used. Hot cache occupies more space than cold cache and uses more complex replacement algorithms. |
Parameter |
Description |
Value Range |
---|---|---|
fill_missing_fields |
Specifies whether to generate an error message when the last column in a row in the source file is lost during data loading. |
true/on or false/off. The default value is false or off.
NOTICE:
|
ignore_extra_data |
Specifies whether to ignore excessive columns when the number of data source files exceeds the number of foreign table columns. This parameter is available only during data importing. |
true/on or false/off. The default value is false or off.
NOTICE:
|
checkencoding |
Specifies whether to check the character encoding. |
Value range: no, low, and high The default value is low. In TEXT format, the rule of error tolerance for invalid characters imported is as follows:
In ORC format, the rule of error tolerance for invalid characters imported is as follows:
|
force_mapping |
Indicates the handling method used when no correct name-value pairs are matched for the foreign table columns in JSON format. |
The options include true and false. Default value: true
NOTICE:
There are no restrictions on JSON objects. While the definition of foreign table fields must comply with GaussDB(DWS) identifier specifications (such as length and character restrictions). Therefore, this import method may cause exceptions. For example, a JSON name cannot be correctly identified or a field is repeatedly defined. You are advised to use the fault tolerance option force_mapping or JSON operators (for details, see JSON/JSONB Functions and Operators). For JSON format, SELECT COUNT(*) does not parse specific fields. Therefore, no error is reported when a field is missing or the format is incorrect. |
Parameter |
Description |
Value Range |
---|---|---|
dli_project_id |
Specifies the project ID corresponding to DLI. You can obtain the project ID from the management console. This parameter is available only when the server type is DLI. This parameter is supported only in 8.1.1 or later. |
- |
dli_database_name |
Specifies the name of the database where the DLI multi-version table to be accessed is located. This parameter is available only when the server type is DLI. This parameter is supported only in 8.1.1 or later. |
- |
dli_table_name |
Specifies the name of the DLI multi-version table to be accessed. This parameter is available only when the server type is DLI. This parameter is supported only in 8.1.1 or later. |
- |
Parameter |
Description |
Value Range |
---|---|---|
auth_server |
Specifies a server, which is used for user identity authentication when KMS is used to encrypt data in the Hudi foreign table. This parameter can be specified only when format is set to Hudi. This parameter is supported only by clusters of version 8.3.0 or later. |
Value range: name of an available HDFS server. MRS provides the column-level encryption capability for Hudi data. RangerKMS is used to manage users and key IDs and control encryption and decryption permissions. You can specify the auth_server parameter when creating a foreign table to bind user RangerKMS to the table. In this way, you can be authorized and obtain keys to read encrypted data. |
kms_url |
Specifies the IP address and port number of the KMS service. This parameter can be specified only when format is set to hudi and auth_server is specified. This parameter is supported only by clusters of version 8.3.0 or later. |
Value range: a valid string in ip:port format. Generally, if auth_server is specified, the IP address and port number of the KMS service can be automatically parsed. If this parameter is specified, the address is forcibly used to access the KMS service. |
cow_improve |
Optimization parameter for the COPY_ON_WRITE table. When this parameter is enabled, DNs replace the dws-bigdata component to directly read data from OBS. This parameter can be specified only when format is set to Hudi and cannot be specified together with auth_server. This parameter is supported only by clusters of version 8.3.0 or later. |
Value range: true, on, false, and off. The default value is false or off.
NOTICE:
This parameter is invalid for MERGE_ON_READ tables. |
Parameter |
OBS |
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
- |
TEXT |
CSV |
JSON |
ORC |
CARBONDATA |
PARQUET |
HUDI |
||||
READ ONLY |
READ ONLY |
READ ONLY |
READ ONLY |
WRITE ONLY |
READ WRITE |
READ ONLY |
READ ONLY |
WRITE ONLY |
READ WRITE |
READ ONLY |
|
location |
√ |
√ |
√ |
√ |
× |
× |
√ |
√ |
× |
× |
× |
format |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
header |
× |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
× |
delimiter |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
× |
quote |
× |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
× |
escape |
× |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
× |
null |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
× |
noescaping |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
encoding |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
fill_missing_fields |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
× |
ignore_extra_data |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
× |
date_format |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
time_format |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
timestamp_format |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
smalldatetime_format |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
chunksize |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
filenames |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
foldername |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
dataencoding |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
filesize |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
compression |
√ |
√ |
√ |
× |
√ |
√ |
× |
× |
√ |
√ |
× |
version |
× |
× |
× |
× |
√ |
√ |
× |
× |
× |
× |
× |
checkencoding |
√ |
√ |
√ |
√ |
× |
√ |
√ |
√ |
√ |
√ |
√ |
totalrows |
√ |
√ |
√ |
√ |
× |
√ |
× |
× |
× |
× |
× |
force_mapping |
× |
× |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
auth_server |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
√ |
kms_url |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
√ |
cow_improve |
× |
× |
× |
× |
× |
× |
× |
× |
× |
× |
√ |
julian_adjust |
× |
× |
× |
× |
× |
× |
× |
√ |
× |
× |
√ |
Parameter |
HDFS |
||||||||
---|---|---|---|---|---|---|---|---|---|
- |
TEXT |
CSV |
JSON |
ORC |
PARQUET |
||||
READ ONLY |
READ ONLY |
READ ONLY |
READ ONLY |
WRITE ONLY |
READ WRITE |
READ ONLY |
WRITE ONLY |
READ WRITE |
|
location |
× |
× |
× |
× |
× |
× |
× |
× |
× |
format |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
header |
× |
√ |
× |
× |
× |
× |
× |
× |
× |
delimiter |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
quote |
× |
√ |
× |
× |
× |
× |
× |
× |
× |
escape |
× |
√ |
× |
× |
× |
× |
× |
× |
× |
null |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
noescaping |
√ |
× |
× |
× |
× |
× |
× |
× |
× |
encoding |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
fill_missing_fields |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
ignore_extra_data |
√ |
√ |
× |
× |
× |
× |
× |
× |
× |
date_format |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
time_format |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
timestamp_format |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
smalldatetime_format |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
chunksize |
√ |
√ |
√ |
× |
× |
× |
× |
× |
× |
filenames |
√ |
√ |
√ |
√ |
× |
× |
√ |
× |
× |
foldername |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
dataencoding |
× |
× |
× |
× |
√ |
√ |
× |
× |
× |
filesize |
× |
× |
× |
× |
√ |
√ |
× |
× |
× |
compression |
√ |
√ |
√ |
× |
√ |
√ |
× |
√ |
√ |
version |
× |
× |
× |
× |
√ |
√ |
× |
× |
× |
checkencoding |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
√ |
totalrows |
× |
× |
× |
× |
× |
× |
× |
× |
× |
force_mapping |
× |
× |
√ |
× |
× |
× |
× |
× |
× |
julian_adjust |
× |
× |
× |
× |
× |
× |
√ |
× |
× |

chunksize indicates the cache size of each OBS reading thread on a DN. The format can be text, CSV, or JSON. The default size is 4 MB.
Informational Constraints
In GaussDB(DWS), the use of data constraints depend on users. If users can make data sources strictly comply with certain constraints, the query on data with such constraints can be accelerated. Foreign tables do not support Index. Informational constraint is used for optimizing query plans.
The constraints of creating informational constraints for an HDFS foreign table are as follows:
- You can create an informational constraint only if the values in a NOT NULL column in your table are unique. Otherwise, the query result will be different from expected.
- Currently, the informational constraint of GaussDB(DWS) supports only PRIMARY KEY and UNIQUE constraints.
- The informational constraints of GaussDB(DWS) support the NOT ENFORCED attribute.
- UNIQUE informational constraints can be created for multiple columns in a table, but only one PRIMARY KEY constraint can be created in a table.
- Multiple informational constraints can be established in a column of a table (because the function that establishing a column or multiple constraints in a column is the same.) Therefore, you are not advised to set up multiple informational constraints in a column, and only one Primary Key type can be set up.
- Table-level COMMENT is not supported.
- Multi-column combination constraints are not supported.
- Different CNs in the same cluster cannot concurrently export data to the same write-only ORC foreign table.
- The catalog of a write-only foreign table in ORC format can only be used as the export catalog of a single foreign table of GaussDB(DWS). It cannot be used for multiple foreign tables, and other components cannot write other files to this catalog.
Example 1
In HDFS, import the TPC-H benchmark test tables part and region using Hive. The path of the part table is /user/hive/warehouse/partition.db/part_4, and that of the region table is /user/hive/warehouse/mppdb.db/region_orc11_64stripe/.
- Establish HDFS_Server, with HDFS_FDW or DFS_FDW as the foreign data wrapper.
1
CREATE SERVER hdfs_server FOREIGN DATA WRAPPER HDFS_FDW OPTIONS (address '10.10.0.100:25000,10.10.0.101:25000',hdfscfgpath '/opt/hadoop_client/HDFS/hadoop/etc/hadoop',type'HDFS');
- The IP addresses and port numbers of HDFS NameNodes are specified in OPTIONS. For details about the port number, search for dfs.namenode.rpc.port in the MRS-HDFS service configuration. In this example the port number is 25000.
- 10.10.0.100:25000,10.10.0.101:25000 indicates the IP addresses and port numbers of the primary and standby HDFS NameNodes. It is the recommended format. Two groups of parameters are separated by commas (,).
- Create an HDFS foreign table. The HDFS server associated with the table is hdfs_server, the corresponding file format of the ft_region table on the HDFS server is 'orc', and the file directory in the HDFS file system is '/user/hive/warehouse/mppdb. db/region_orc11_64stripe/'.
- Create an HDFS foreign table without partition keys.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
CREATE FOREIGN TABLE ft_region ( R_REGIONKEY INT4, R_NAME TEXT, R_COMMENT TEXT ) SERVER hdfs_server OPTIONS ( FORMAT 'orc', encoding 'utf8', FOLDERNAME '/user/hive/warehouse/mppdb.db/region_orc11_64stripe/' ) DISTRIBUTE BY roundrobin;
- Create an HDFS foreign table with partition keys.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
CREATE FOREIGN TABLE ft_part ( p_partkey int, p_name text, p_mfgr text, p_brand text, p_type text, p_size int, p_container text, p_retailprice float8, p_comment text ) SERVER hdfs_server OPTIONS ( FORMAT 'orc', encoding 'utf8', FOLDERNAME '/user/hive/warehouse/partition.db/part_4' ) DISTRIBUTE BY roundrobin PARTITION BY (p_mfgr) AUTOMAPPED;
GaussDB(DWS) allows you to specify files using the keyword filenames or foldername. The latter is recommended. The key word distribute specifies the storage distribution mode of the ft_region table.
- Create an HDFS foreign table without partition keys.
- View the created server and foreign table.
1 2 3 4 5 6 7 8 9 10 11
SELECT * FROM pg_foreign_table WHERE ftrelid='ft_region'::regclass; ftrelid | ftserver | ftwriteonly | ftoptions ---------+----------+-------------+------------------------------------------------------------------------------ 16510 | 16509 | f | {format=orc,foldername=/user/hive/warehouse/mppdb.db/region_orc11_64stripe/} (1 row) select * from pg_foreign_table where ftrelid='ft_part'::regclass; ftrelid | ftserver | ftwriteonly | ftoptions ---------+----------+-------------+------------------------------------------------------------------ 16513 | 16509 | f | {format=orc,foldername=/user/hive/warehouse/partition.db/part_4} (1 row)
Example 2
Export data from the TPC-H benchmark test table region to the /user/hive/warehouse/mppdb.db/region_orc/ directory of the HDFS file system through the HDFS write-only foreign table.
- Create an HDFS foreign table. The corresponding foreign data wrapper is HDFS_FDW or DFS_FDW, which is the same as that in Example 1.
- Create a write-only HDFS foreign table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
CREATE FOREIGN TABLE ft_wo_region ( R_REGIONKEY INT4, R_NAME TEXT, R_COMMENT TEXT ) SERVER hdfs_server OPTIONS ( FORMAT 'orc', encoding 'utf8', FOLDERNAME '/user/hive/warehouse/mppdb.db/region_orc/' ) WRITE ONLY;
- Writes data to the HDFS file system through a write-only foreign table.
1
INSERT INTO ft_wo_region SELECT * FROM region;
Example 3 (Only for the Storage-Compute Decoupling 3.0 Version)
Export data from the TPC-H benchmark test table region table to the /user/hive/warehouse/mppdb.db/region_orc/ directory of the HDFS file system through the write-only, multi-level partitioned HDFS foreign table.
- Create an HDFS foreign table. The corresponding foreign data wrapper is HDFS_FDW or DFS_FDW, which is the same as that in Example 1.
- Create a write-only foreign table for HDFS multi-level partitions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
CREATE FOREIGN TABLE ft_wo_region_partition ( R_REGIONKEY INT4, R_NAME TEXT, R_COMMENT TEXT ) SERVER hdfs_server OPTIONS ( FORMAT 'orc', encoding 'utf8', FOLDERNAME '/user/hive/warehouse/mppdb.db/region_orc/' ) WRITE ONLY PARTITION BY (R_REGIONKEY, R_NAME);
- Writes data to the HDFS file system through a write-only foreign table.
1
INSERT INTO ft_wo_region_partition SELECT * FROM region;
Example 4 (Only for the Storage-Compute Decoupling 3.0 Version)
Read data from the OBS file system through the foreign table partsupp.
- Create an OBS server.
- Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
- For details about the syntax for creating a server, see CREATE SERVER.
1 2 3 4 5 6
CREATE SERVER obs_srv FOREIGN DATA WRAPPER dfs_fdw OPTIONS ( address 'obs.ap-southeast-1.myhuaweicloud.com', type 'obs', access_key 'xxx', secret_access_key 'xxx' );
- Create an OBS foreign table partsupp and use the cold cache in the disk cache.
1 2 3 4 5 6 7 8 9 10 11 12 13
CREATE FOREIGN TABLE PARTSUPP ( PS_PARTKEY BIGINT NOT NULL, PS_SUPPKEY BIGINT NOT NULL, PS_AVAILQTY BIGINT NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) NOT NULL ) SERVER obs_srv options ( encoding 'utf-8', format 'parquet', foldername 'xxx', cache_policy 'NONE' ) READ ONLY DISTRIBUTE BY ROUNDROBIN;
- Read data from OBS through the foreign table partsupp.
1
SELECT * FROM PARTSUPP;
Example 5
Perform operations on an HDFS foreign table that includes informational constraints.
- Create an HDFS foreign table with informational constraints.
1 2 3 4 5 6 7 8 9 10
CREATE FOREIGN TABLE ft_region ( R_REGIONKEY int, R_NAME TEXT, R_COMMENT TEXT , primary key (R_REGIONKEY) not enforced) SERVER hdfs_server OPTIONS(format 'orc', encoding 'utf8', foldername '/user/hive/warehouse/mppdb.db/region_orc11_64stripe') DISTRIBUTE BY roundrobin;
- Check whether the region table has an informational constraint index:
1 2 3 4 5 6 7 8 9 10 11
SELECT relname,relhasindex FROM pg_class WHERE oid='ft_region'::regclass; relname | relhasindex ------------------------+------------- ft_region | f (1 row) SELECT conname, contype, consoft, conopt, conindid, conkey FROM pg_constraint WHERE conname ='ft_region_pkey'; conname | contype | consoft | conopt | conindid | conkey ----------------+---------+---------+--------+----------+-------- ft_region_pkey | p | t | t | 0 | {1} (1 row)
- Delete the informational constraint:
1 2 3 4 5 6
ALTER FOREIGN TABLE ft_region DROP CONSTRAINT ft_region_pkey RESTRICT; SELECT conname, contype, consoft, conindid, conkey FROM pg_constraint WHERE conname ='ft_region_pkey'; conname | contype | consoft | conindid | conkey ---------+---------+---------+----------+-------- (0 rows)
- Add a unique informational constraint for the foreign table:
1
ALTER FOREIGN TABLE ft_region ADD CONSTRAINT constr_unique UNIQUE(R_REGIONKEY) NOT ENFORCED;
Delete the unique informational constraint:1 2 3 4 5 6
ALTER FOREIGN TABLE ft_region DROP CONSTRAINT constr_unique RESTRICT; SELECT conname, contype, consoft, conindid, conkey FROM pg_constraint WHERE conname ='constr_unique'; conname | contype | consoft | conindid | conkey ---------+---------+---------+----------+-------- (0 rows)
- Add a unique informational constraint for the foreign table:
1 2 3 4 5 6 7
ALTER FOREIGN TABLE ft_region ADD CONSTRAINT constr_unique UNIQUE(R_REGIONKEY) NOT ENFORCED disable query optimization; SELECT relname,relhasindex FROM pg_class WHERE oid='ft_region'::regclass; relname | relhasindex ------------------------+------------- ft_region | f (1 row)
Delete the unique informational constraint:1
ALTER FOREIGN TABLE ft_region DROP CONSTRAINT constr_unique CASCADE;
Example 6
Read json data stored in OBS using a foreign table.
- The following JSON files are on OBS. The JSON objects contain nesting and arrays. Some objects have lost columns, and some object names are duplicate.
{"A" : "simple1", "B" : {"C" : "nesting1"}, "D" : ["array", 2, {"E" : "complicated"}]} {"A" : "simple2", "D" : ["array", 2, {"E" : "complicated"}]} {"A" : "simple3", "B" : {"C" : "nesting3"}, "D" : ["array", 2, {"E" : "complicated3"}]} {"B" : {"C" : "nesting4"},"A" : "simple4", "D" : ["array", 2, {"E" : "complicated4"}]} {"A" : "simple5", "B" : {"C" : "nesting5"}, "D" : ["array", 2, {"E" : "complicated5"}]}
- Create obs_server, with DFS_FDW as the foreign data wrapper.
1 2 3 4 5 6
CREATE SERVER obs_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS ( ADDRESS 'obs.ap-southeast-1.myhuaweicloud.com', ACCESS_KEY 'xxxxxxxxx', SECRET_ACCESS_KEY 'yyyyyyyyyyyyy', TYPE 'OBS' );
- ADDRESS is the endpoint of OBS. Replace it with the actual endpoint. You can find the domain name by searching for the value of regionCode in the region_map file.
- ACCESS_KEY and SECRET_ACCESS_KEY are access keys for the cloud account system. Replace the values as needed.
- TYPE indicates the server type. Retain the value OBS.
- Create the OBS foreign table json_f and define the column names. For example, d#2_e indicates that the column is object e nested in the 2nd element of array d. The OBS server associated with the table is obs_server. foldername indicates the data source directory of the foreign table, that is, the OBS directory.
// Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
1 2 3 4 5 6 7 8 9 10 11
CREATE FOREIGN TABLE json_f ( a VARCHAR(10), b_c TEXT, d#1 INTEGER, d#2_e VARCHAR(30) )SERVER obs_server OPTIONS ( foldername '/xxx/xxx/', format 'json', encoding 'utf8', force_mapping 'true' )distribute by roundrobin;
- Query the foreign table json_f. The fault tolerance parameter force_mapping is enabled by default. If a column is missing in a JSON object, NULL is filled in. If a JSON object name is duplicate, the last name prevails.
1 2 3 4 5 6 7 8 9
SELECT * FROM json_f; a | b_c | d#1 | d#2_e ---------+----------+-----+-------------- simple1 | nesting1 | 2 | complicated1 simple2 | | 2 | complicated2 simple3 | nesting3 | 2 | complicated3 simple4 | nesting4 | 2 | complicated4 repeat | nesting5 | 2 | complicated5 (5 rows)
Example 7
Read a DLI multi-version foreign table using a foreign table. Only DLI 8.1.1 and later support the multi-version foreign table example.
- Create dli_server, with DFS_FDW as the foreign data wrapper.
1 2 3 4 5 6 7 8 9
CREATE SERVER dli_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS ( ADDRESS 'obs.ap-southeast-1.myhuaweicloud.com', ACCESS_KEY 'xxxxxxxxx', SECRET_ACCESS_KEY 'yyyyyyyyyyyyy', TYPE 'DLI', DLI_ADDRESS 'dli.example.com', DLI_ACCESS_KEY 'xxxxxxxxx', DLI_SECRET_ACCESS_KEY 'yyyyyyyyyyyyy' );
- ADDRESS is the endpoint of OBS. DLI_ADDRESS is the endpoint of DLI. Replace it with the actual endpoint.
- ACCESS_KEY and SECRET_ACCESS_KEY are access keys for the cloud account system to access OBS. Use the actual value.
- DLI_ACCESS_KEY and DLI_SECRET_ACCESS_KEY are access keys for the cloud account system to access DLI. Use the actual value.
- TYPE indicates the server type. Retain the value DLI.
- Create the OBS foreign table customer_address for accessing DLI. The table does not contain partition columns, and the DLI server associated with the table is dli_server. Where, the project_id is xxxxxxxxxxxxxxx, the database_name on DLI is database123, and the table_name of the table to be accessed is table456. Replace them based on the actual requirements.
// Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
CREATE FOREIGN TABLE customer_address ( ca_address_sk integer not null, ca_address_id char(16) not null, ca_street_number char(10) , ca_street_name varchar(60) , ca_street_type char(15) , ca_suite_number char(10) , ca_city varchar(60) , ca_county varchar(30) , ca_state char(2) , ca_zip char(10) , ca_country varchar(20) , ca_gmt_offset decimal(36,33) , ca_location_type char(20) ) SERVER dli_server OPTIONS ( FORMAT 'ORC', ENCODING 'utf8', DLI_PROJECT_ID 'xxxxxxxxxxxxxxx', DLI_DATABASE_NAME 'database123', DLI_TABLE_NAME 'table456' ) DISTRIBUTE BY roundrobin;
- Query data in a DLI multi-version table using a foreign table.
1 2 3 4 5
SELECT COUNT(*) FROM customer_address; count ------- 20 (1 row)
Helpful Links
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot