Updated on 2024-12-19 GMT+08:00

CREATE FOREIGN TABLE (SQL on OBS or Hadoop)

Function

CREATE FOREIGN TABLE creates an HDFS or OBS foreign table in the current database to access structured data stored on HDFS or OBS. You can also export data in ORC format to HDFS or OBS.

Data stored in OBS: Data storage is decoupled from compute. The cluster storage cost is low, and storage capacity is not limited. Clusters can be deleted at any time. However, the computing performance depends on the OBS access performance and is lower than that of HDFS. OBS is recommended for applications that do not demand a lot of computation.

Data stored in HDFS: Data storage is not decoupled from compute. The cluster storage cost is high, and storage capacity is limited. The computing performance is high. You must export data before you delete clusters. HDFS is recommended for computing-intensive scenarios.

The hybrid data warehouse (standalone) does not support OBS and HDFS foreign table import and export.

Precautions

  • HDFS foreign tables and OBS foreign tables are classified into read-only and write-only foreign tables. Read-only foreign tables are used for query, and write-only foreign tables can be used to export data from GaussDB(DWS) to a distributed file system.
  • You can import and query data in various formats such as ORC, TEXT, CSV, CARBONDATA, PARQUET, and JSON. If you are using OBS foreign tables, you can export them in ORC, CSV, and TEXT formats. However, if you are using HDFS foreign tables, you can only export them in ORC format.
  • In this mode, you need to manually create a foreign server. For details, see CREATE SERVER.
  • If the foreign data wrapper is set to HDFS_FDW or DFS_FDW when you manually create a server, you need to specify the distribution mode in the DISTRIBUTE BY clause when creating a read-only foreign table.

Syntax

Create a foreign table,

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name 
( [ { column_name type_name 
    [ { [CONSTRAINT constraint_name] NULL |
    [CONSTRAINT constraint_name] NOT NULL |
      column_constraint [...]} ] |
      table_constraint [, ...]} [, ...] ] ) 
    SERVER server_name 
    OPTIONS ( { option_name ' value ' } [, ...] ) 
    [ {WRITE ONLY | READ ONLY}]
    DISTRIBUTE BY {ROUNDROBIN | REPLICATION}
   
    [ PARTITION BY ( column_name ) [ AUTOMAPPED ] ] ;
  • column_constraint is as follows:
    1
    2
    3
    [CONSTRAINT constraint_name]
    {PRIMARY KEY | UNIQUE}
    [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
    
  • table_constraint is as follows:
    1
    2
    3
    [CONSTRAINT constraint_name]
    {PRIMARY KEY | UNIQUE} (column_name)
    [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
    

Parameter Description

  • IF NOT EXISTS

    Does not throw an error if a table with the same name exists. A notice is issued in this case.

  • table_name

    Specifies the name of the foreign table to be created.

    Value range: a string. It must comply with the naming convention.

  • column_name

    Specifies the name of a column in the foreign table. Columns are separated by commas (,).

    Value range: a string. It must comply with the naming convention.

    A JSON object consists of nested or parallel name-value pairs, which are irrelevant to the sequence. When data in JSON format is imported, the mapping between fields and values is determined based on the automatic mapping between field names and names of name-value pairs. You need to define proper field names. Otherwise, you may not get the expected result. The rules for automatic mapping between field names and names of name-value pairs are as follows:

    • If there are no nesting or arrays, the field names must be the same as the names of name-value pairs, case insensitive.
    • Use underscores (_) to concatenate two names to identify the nesting relationship.
    • A field name uses the number sign (#) and a decimal non-negative integer (n) to identify the nth element (starting from 0) of an array.

    For example, to import each element of the {"A": "simple", "B": {"C": "nesting"}, "D": ["array", 2, {"E": "complicated"}]} object, the field names in the foreign table must be defined as a, b, b_c, d, d#0, d#1, d#2 and d#2_e. The sequence in which the fields are defined does not affect the import result.

  • type_name

    Specifies the data type of the column.

  • constraint_name

    Specifies the name of a constraint for the foreign table.

  • { NULL | NOT NULL }

    Specifies whether the column allows NULL.

    When you create a table, whether the data in HDFS is NULL or NOT NULL cannot be guaranteed. The consistency of data is guaranteed by users. Users must decide whether the column is NULL or NOT NULL. (The optimizer optimizes the NULL/NOT NULL and generates a better plan.)

  • SERVER server_name

    Specifies the server name of the foreign table. Users can customize its name.

    Value range: a string indicating an existing server. It must comply with the naming convention.

  • OPTIONS ( { option_name ' value ' } [, ...] )
    Specifies the following parameters for a foreign table:
    • header

      Specifies whether a data file contains a table header. header is available only for CSV files.

      If header is on, the first row of the data file will be identified as the header and ignored during export. If header is off, the first row will be identified as a data row.

      Value range: true, on, false, and off. The default value is false or off.

    • quote

      Specifies the quotation mark for the CSV format. The default value is a double quotation mark (").

      The quote value cannot be the same as the delimiter or null value.

      The quote value must be a single-byte character.

      Invisible characters are recommended as quote values, such as 0x07, 0x08, and 0x1b.

    • escape

      Specifies an escape character for a CSV file. The value must be a single-byte character.

      The default value is a double quotation mark ("). If the value is the same as the quote value, it will be replaced with \0.

    • location

      Specifies the file path on OBS. This is an OBS foreign table parameter. The data sources of multiple buckets are separated by vertical bars (|), for example, LOCATION 'obs://bucket1/folder/ | obs://bucket2/'. The database scans all objects in the specified folders.

      When accessing a DLI multi-version table, you do not need to specify the location parameter.

    • format: format of the data source file in the foreign table.
      • HDFS read-only foreign tables support ORC, TEXT, JSON, CSV, and Parquet file formats, while the write-only foreign tables support only the ORC file format.
      • OBS read-only foreign tables support ORC, TEXT, JSON, CSV, CarbonData, and Parquet file formats, while the write-only foreign tables support only the ORC file format.

      Only JSON objects (embraced in {}) can be imported. JSON arrays (embraced in []) cannot be imported. However, arrays inside a JSON object can be imported.

    • foldername: The directory of the data source file in the foreign table, that is, the corresponding file directory in HDFS or on OBS. This parameter is mandatory for the write-only foreign table and optional for the read-only foreign table.

      When accessing a DLI multi-version table, you do not need to specify the foldername parameter.

    • encoding: encoding of data source files in foreign tables. The default value is utf8. This parameter is optional.
    • totalrows: (Optional) estimated number of rows in a table. This parameter is used only for OBS foreign tables. Because OBS may store many files, it is slow to analyze data. This parameter allows you to set an estimated value so that the optimizer can estimate the table size according to the value. Generally, query efficiency is high when the estimated value is close to the actual value.
    • filenames: data source files specified in the foreign table. Multiple files are separated by commas (,).
      • You are advised to use the foldername parameter to specify the location of the data source. For a read-only foreign table, either filenames or foldername must be specified. For a write-only foreign table, only foldername can be specified.
      • If foldername is an absolute directory, it should be enclosed by slashes (/). Multiple paths are separated by commas (,).
      • When you query a partitioned table, data is pruned based on partition information, and data files that meet the requirement are queried. Pruning involves scanning HDFS directory contents many times. Therefore, do not use columns with low repetition as partition column.
      • An OBS read-only foreign table is not supported.
    • delimiter

      Specifies the column delimiter of data, and uses the default delimiter if it is not set. The default delimiter of TEXT is a tab.

      • A delimiter cannot be \r or \n.
      • A delimiter cannot be the same as the null parameter.
      • A separator cannot contain(\), (.), digits, or letters.
      • The data length of a single row should be less than 1 GB. A row that has many columns using long delimiters cannot contain much valid data.
      • You are advised to use a multi-character, such as the combination of the dollar sign ($), caret (^), ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b as the delimiter.
      • delimiter is available only for TEXT and CSV source data files.

      Valid value:

      The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.

    • eol

      Specifies the newline character style of the imported data file.

      Value range: multi-character newline characters within 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #.

      • The eol parameter applies only to TEXT files.
      • The value of the eol parameter cannot be the same as that of DELIMITER or NULL.
      • The value of the eol parameter cannot contain digits, letters, or periods (.).
    • null
      Specifies the string that represents a null value.
      • The null value cannot be \r or \n. The maximum length is 100 characters.
      • The null parameter cannot be the same as the delimiter.
      • null is available only for TEXT and CSV source data files.

      Valid value:

      The default value is \N for the TEXT format.

    • noescaping

      Specifies in TEXT format, whether to escape the backslash (\) and its following characters.

      noescaping is available only for TEXT source data files.

      Value range: true, on, false, and off. The default value is false or off.

    • fill_missing_fields

      Specifies whether to generate an error message when the last column in a row in the source file is lost during data loading.

      Value range: true, on, false, and off. The default value is false or off.

      • If this parameter is set to true or on and the last column of a data row in a data source file is lost, the column is replaced with NULL and no error message will be generated.
      • If this parameter is set to false or off and the last column is missing, the following error information will be displayed:
        missing data for column "tt"
      • Because SELECT COUNT(*) does not parse columns in TEXT format, it does not report missing columns.
      • fill_missing_fields is available only for TEXT and CSV source data files.
    • ignore_extra_data

      Specifies whether to ignore excessive columns when the number of data source files exceeds the number of foreign table columns. This parameter is available only during data importing.

      Value range: true, on, false, and off. The default value is false or off.

      • If this parameter is set to true or on and the number of data source files exceeds the number of foreign table columns, excessive columns will be ignored.
      • If this parameter is set to false or off and the number of data source files exceeds the number of foreign table columns, the following error information will be displayed:
        extra data after last expected column
      • If the newline character at the end of the row is lost, setting the parameter to true will ignore data in the next row.
      • Because SELECT COUNT(*) does not parse columns in TEXT format, it does not report missing columns.
      • ignore_extra_data is available only for TEXT and CSV source data files.
    • date_format

      Specifies the DATE format for data import. This syntax is available only for READ ONLY foreign tables.

      Value range: any valid DATE value. For details, see Date and Time Processing Functions and Operators.

      • If ORACLE is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below.
      • date_format is available only for TEXT and CSV source data files.
    • time_format

      Specifies the TIME format for data import. This syntax is available only for READ ONLY foreign tables.

      Value range: a valid TIME value. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.

      time_format is available only for TEXT and CSV source data files.

    • timestamp_format

      Specifies the TIMESTAMP format for data import. This syntax is available only for READ ONLY foreign tables.

      Value range: any valid TIMESTAMP value. Time zones are not supported. For details, see Date and Time Processing Functions and Operators.

      timestamp_format is available only for TEXT and CSV source data files.

    • smalldatetime_format

      Specifies the SMALLDATETIME format for data import. This syntax is available only for READ ONLY foreign tables.

      Value range: a valid SMALLDATETIME value. For details, see Date and Time Processing Functions and Operators.

      smalldatetime_format is available only for TEXT and CSV source data files.

    • dataencoding

      This parameter specifies the data code of the data table to be exported when the database code is different from the data code of the data table. For example, the database code is Latin-1, but the data in the exported data table is in UTF-8 format. This parameter is optional. If this parameter is not specified, the database encoding format is used by default. This syntax is valid only for the write-only HDFS foreign table.

      Value range: data code types supported by the database encoding

      The dataencoding parameter is valid only for the ORC-formatted write-only HDFS foreign table.

    • filesize

      Specifies the file size of a write-only foreign table. This parameter is optional. If this parameter is not specified, the file size in the distributed file system configuration is used by default. This syntax is available only for the write-only foreign table.

      Value range: an integer ranging from 1 to 1024

      The filesize parameter is valid only for the ORC-formatted write-only HDFS foreign table.

    • compression

      Specifies the compression mode of ORC files. This parameter is optional. This syntax is available only for the write-only foreign table.

      Value range: zlib, snappy, and lz4 The default value is snappy.

    • version

      Specifies the ORC version number. This parameter is optional. This syntax is available only for the write-only foreign table.

      Value range: Only 0.12 is supported. The default value is 0.12.

    • dli_project_id

      Specifies the project ID corresponding to DLI. You can obtain the project ID from the management console. This parameter is available only when the server type is DLI. This feature is supported only in 8.1.1 or later.

    • dli_database_name

      Specifies the name of the database where the DLI multi-version table to be accessed is located. This parameter is available only when the server type is DLI. This feature is supported only in 8.1.1 or later.

    • dli_table_name

      Specifies the name of the DLI multi-version table to be accessed. This parameter is available only when the server type is DLI. This feature is supported only in 8.1.1 or later.

    • checkencoding

      Specifies whether to check the character encoding.

      Value range: low, high The default value is low.

      In TEXT format, the rule of error tolerance for invalid characters imported is as follows:

      • \0 is converted to a space.
      • Other invalid characters are converted to question marks.
      • Setting checkencoding to low enables invalid characters toleration. If NULL and DELIMITER are set to spaces or question marks (?), errors like "illegal chars conversion may confuse null 0x20" will be displayed, prompting you to modify parameters that may cause confusion and preventing importing errors.

      In ORC format, the rule of error tolerance for invalid characters imported is as follows:

      • If checkencoding is low, an imported field containing invalid characters will be replaced with a quotation mark string of the same length.
      • If checkencoding is high, data import stops when an invalid character is detected.
    • force_mapping

      Indicates the handling method used when no correct name-value pairs are matched for the foreign table columns in JSON format.

      The value can be true or false. Default value: true

      • If force_mapping is true, null is entered in the corresponding column. The meaning of null is the same as that defined in JSON.
      • If force_mapping is false, an error is reported, indicating that the column does not exist.

      There are no restrictions on JSON objects. While the definition of foreign table fields must comply with GaussDB(DWS) identifier specifications (such as length and character restrictions). Therefore, this import method may cause exceptions. For example, a JSON name cannot be correctly identified or a field is repeatedly defined. You are advised to use the fault tolerance option force_mapping or JSON operators (for details, see JSON/JSONB Functions and Operators).

      For JSON format, SELECT COUNT(*) does not parse specific fields. Therefore, no error is reported when a field is missing or the format is incorrect.

    Table 1 OBS foreign table options supported by Text, CSV, JSON, ORC, CarbonData, and Parquet formats

    Parameter

    OBS

    -

    TEXT

    CSV

    JSON

    ORC

    CARBONDATA

    PARQUET

    READ ONLY

    READ ONLY

    READ ONLY

    READ ONLY

    WRITE ONLY

    READ ONLY

    READ ONLY

    location

    ×

    format

    header

    ×

    ×

    ×

    ×

    ×

    ×

    delimiter

    ×

    ×

    ×

    ×

    ×

    quote

    ×

    ×

    ×

    ×

    ×

    ×

    escape

    ×

    ×

    ×

    ×

    ×

    ×

    null

    ×

    ×

    ×

    ×

    ×

    noescaping

    ×

    ×

    ×

    ×

    ×

    ×

    encoding

    fill_missing_fields

    ×

    ×

    ×

    ×

    ×

    ignore_extra_data

    ×

    ×

    ×

    ×

    ×

    date_format

    ×

    ×

    ×

    ×

    time_format

    ×

    ×

    ×

    ×

    timestamp_format

    ×

    ×

    ×

    ×

    smalldatetime_format

    ×

    ×

    ×

    ×

    chunksize

    ×

    ×

    ×

    ×

    filenames

    ×

    ×

    ×

    ×

    ×

    ×

    ×

    foldername

    dataencoding

    ×

    ×

    ×

    ×

    ×

    ×

    ×

    filesize

    ×

    ×

    ×

    ×

    ×

    ×

    ×

    compression

    ×

    ×

    ×

    ×

    ×

    ×

    version

    ×

    ×

    ×

    ×

    ×

    ×

    checkencoding

    ×

    totalrows

    ×

    ×

    ×

    force_mapping

    ×

    ×

    ×

    ×

    ×

    ×

    Table 2 HDFS foreign table options supported by Text, CSV, JSON, ORC, and Parquet formats

    Parameter

    HDFS

    -

    TEXT

    CSV

    JSON

    ORC

    PARQUET

    READ ONLY

    READ ONLY

    READ ONLY

    READ ONLY

    WRITE ONLY

    READ ONLY

    location

    ×

    ×

    ×

    ×

    ×

    ×

    format

    header

    ×

    ×

    ×

    ×

    ×

    delimiter

    ×

    ×

    ×

    ×

    quote

    ×

    ×

    ×

    ×

    ×

    escape

    ×

    ×

    ×

    ×

    ×

    null

    ×

    ×

    ×

    ×

    noescaping

    ×

    ×

    ×

    ×

    ×

    encoding

    fill_missing_fields

    ×

    ×

    ×

    ×

    ignore_extra_data

    ×

    ×

    ×

    ×

    date_format

    ×

    ×

    ×

    time_format

    ×

    ×

    ×

    timestamp_format

    ×

    ×

    ×

    smalldatetime_format

    ×

    ×

    ×

    chunksize

    ×

    ×

    ×

    filenames

    ×

    foldername

    dataencoding

    ×

    ×

    ×

    ×

    ×

    filesize

    ×

    ×

    ×

    ×

    ×

    compression

    ×

    ×

    ×

    ×

    ×

    version

    ×

    ×

    ×

    ×

    ×

    checkencoding

    totalrows

    ×

    ×

    ×

    ×

    ×

    ×

    force_mapping

    ×

    ×

    ×

    ×

    ×

  • WRITE ONLY | READ ONLY

    WRITE ONLY creates a write-only HDFS/OBS foreign table.

    READ ONLY creates a read-only HDFS/OBS foreign table.

    If the foreign table type is not specified, a read-only foreign table is created by default.

  • DISTRIBUTE BY ROUNDROBIN

    Specifies ROUNDROBIN as the distribution mode for the HDFS/OBS foreign table.

  • DISTRIBUTE BY REPLICATION

    Specifies REPLICATION as the distribution mode for the HDFS foreign table.

  • PARTITION BY ( column_name ) AUTOMAPPED

    column_name specifies the partition column. AUTOMAPPED means the partition column specified by the HDFS partitioned foreign table is automatically mapped with the partition directory information in HDFS. The prerequisite is that the sequences of partition columns specified in the HDFS foreign table and in the directory are the same. This function is applicable only to read-only foreign tables.

    • HDFS read-only and write-only foreign tables support partitioned tables. However, write-only foreign tables support only primary partitions and do not support multi-level partitions.
    • Partitioned tables can be used as read-only foreign tables for OBS.
    • Columns of the floating point or Boolean type cannot be used as partition columns.
    • The maximum length of a partition field can be specified by the GUC parameter dfs_partition_directory_length.
    • A partition directory name is in the format Partition column name=Partition column value. Special characters in the name will be escaped. To ensure that the total length of the name after escaping does not exceed dfs_partition_directory_length, it is advisable to keep the name length before escaping less than or equal to (dfs_partition_directory_length + 1)/3.
    • Do not use a column containing too many Chinese characters as a partition column. You may encounter errors when calculating the final partition directory name's length due to the different space requirements of Chinese and English characters. This is particularly true when the partition directory name exceeds the dfs_partition_directory_length length limit.
  • CONSTRAINT constraint_name

    Specifies the name of informational constraint of the foreign table.

    Value range: a string. It must comply with the naming convention.

  • PRIMARY KEY

    The primary key constraint specifies that one or more columns of a table must contain unique (non-duplicate) and non-null values. Only one primary key can be specified for a table.

  • UNIQUE

    Specifies that a group of one or more columns of a table must contain unique values. For the purpose of a unique constraint, NULL is not considered equal.

  • NOT ENFORCED

    Specifies the constraint to be an informational constraint. This constraint is guaranteed by the user instead of the database.

  • ENFORCED

    The default value is ENFORCED. ENFORCED is a reserved parameter and is currently not supported.

  • PRIMARY KEY (column_name)

    Specifies the informational constraint on column_name.

    Value range: a string. It must comply with the naming convention, and the value of column_name must exist.

  • ENABLE QUERY OPTIMIZATION

    Optimizes an execution plan using an informational constraint.

  • DISABLE QUERY OPTIMIZATION

    Disables the optimization of an execution plan using an informational constraint.

Informational Constraint

In GaussDB(DWS), the use of data constraints depend on users. If users can make data sources strictly comply with certain constraints, the query on data with such constraints can be accelerated. Foreign tables do not support Index. Informational constraint is used for optimizing query plans.

The constraints of creating informational constraints for a foreign table are as follows:

  • You can create an informational constraint only if the values in a NOT NULL column in your table are unique. Otherwise, the query result will be different from expected.
  • Currently, the informational constraint of GaussDB(DWS) supports only PRIMARY KEY and UNIQUE constraints.
  • The informational constraints of GaussDB(DWS) support the NOT ENFORCED attribute.
  • UNIQUE informational constraints can be created for multiple columns in a table, but only one PRIMARY KEY constraint can be created in a table.
  • Multiple informational constraints can be established in a column of a table (because the function that establishing a column or multiple constraints in a column is the same.) Therefore, you are not advised to set up multiple informational constraints in a column, and only one Primary Key type can be set up.
  • Multi-column combination constraints are not supported.
  • Different CNs in the same cluster cannot concurrently export data to the same write-only ORC foreign table.
  • The catalog of a write-only foreign table in ORC format can only be used as the export catalog of a single foreign table of GaussDB(DWS). It cannot be used for multiple foreign tables, and other components cannot write other files to this catalog.

Example 1

Example 1: In HDFS, import the TPC-H benchmark test tables part and region using Hive. The path of the part table is /user/hive/warehouse/partition.db/part_4, and that of the region table is /user/hive/warehouse/gauss.db/region_orc11_64stripe/.

  1. Establish HDFS_Server, with HDFS_FDW or DFS_FDW as the foreign data wrapper.
    1
    CREATE SERVER hdfs_server FOREIGN DATA WRAPPER HDFS_FDW OPTIONS (address '10.10.0.100:25000,10.10.0.101:25000',hdfscfgpath '/opt/hadoop_client/HDFS/hadoop/etc/hadoop',type'HDFS');
    
    • The IP addresses and port numbers of HDFS NameNodes are specified in OPTIONS. For details about the port number, search for dfs.namenode.rpc.port in the MRS-HDFS service configuration. In this example the port number is 25000.
    • 10.10.0.100:25000,10.10.0.101:25000 indicates the IP addresses and port numbers of the primary and standby HDFS NameNodes. It is the recommended format. Two groups of parameters are separated by commas (,).
  2. Create an HDFS foreign table. The HDFS server associated with the table is hdfs_server, the corresponding file format of the ft_region table on the HDFS server is 'orc', and the file directory in the HDFS file system is '/user/hive/warehouse/gauss.db/region_orc11_64stripe/'.
    • Create an HDFS foreign table without partition keys.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      DROP FOREIGN TABLE IF EXISTS ft_region;
      CREATE FOREIGN TABLE ft_region
      (
          R_REGIONKEY INT4,
          R_NAME TEXT,
          R_COMMENT TEXT
      )
      SERVER
          hdfs_server
      OPTIONS
      (
          FORMAT 'orc',
          encoding 'utf8',
          FOLDERNAME '/user/hive/warehouse/gauss.db/region_orc11_64stripe/'
      )
      DISTRIBUTE BY 
           roundrobin;
      
    • Create an HDFS foreign table with partition keys.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      CREATE FOREIGN TABLE ft_part 
      (
           p_partkey int, 
           p_name text, 
           p_mfgr text, 
           p_brand text, 
           p_type text, 
           p_size int, 
           p_container text, 
           p_retailprice float8, 
           p_comment text
      )
      SERVER
           hdfs_server
      OPTIONS
      (
           FORMAT 'orc',
           encoding 'utf8',
           FOLDERNAME '/user/hive/warehouse/partition.db/part_4'
      )
      DISTRIBUTE BY 
           roundrobin
      PARTITION BY 
           (p_mfgr) AUTOMAPPED;
      

      GaussDB(DWS) allows you to specify files using the keyword filenames or foldername. The latter is recommended. The key word distribute specifies the storage distribution mode of the ft_region table.

  3. View the created server and foreign table.
    1
    2
    SELECT * FROM pg_foreign_table WHERE ftrelid='ft_region'::regclass;
    SELECT * FROM pg_foreign_table WHERE ftrelid='ft_part'::regclass;
    

Example 2

Export data from the TPC-H benchmark test table region to the /user/hive/warehouse/gauss.db/regin_orc/ directory of the HDFS file system through the HDFS write-only foreign table.

  1. Create an HDFS foreign table. The corresponding foreign data wrapper is HDFS_FDW or DFS_FDW, which is the same as that in Example 1.
  2. Create a write-only HDFS foreign table.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    CREATE FOREIGN TABLE ft_wo_region
    (
        R_REGIONKEY INT4,
        R_NAME TEXT,
        R_COMMENT TEXT
    )
    SERVER
        hdfs_server
    OPTIONS
    (
        FORMAT 'orc',
        encoding 'utf8',
        FOLDERNAME '/user/hive/warehouse/gauss.db/regin_orc/'
    )
    WRITE ONLY;
    
  3. Writes data to the HDFS file system through a write-only foreign table.
    1
    INSERT INTO ft_wo_region SELECT * FROM region;
    

Example 3

Perform operations on an HDFS foreign table that includes informational constraints.

  • Create an HDFS foreign table with informational constraints.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    CREATE FOREIGN TABLE ft_region  (
     R_REGIONKEY  int,
     R_NAME TEXT,
     R_COMMENT TEXT
      , primary key (R_REGIONKEY) not enforced)
    SERVER hdfs_server
    OPTIONS(format 'orc',
        encoding 'utf8',
     foldername '/user/hive/warehouse/gauss.db/region_orc11_64stripe')
    DISTRIBUTE BY roundrobin;
    
  • Check whether the region table has an informational constraint index:
    1
    SELECT relname,relhasindex FROM pg_class WHERE oid='ft_region'::regclass;
    
    Figure 1 Viewing relname
    1
    SELECT conname, contype, consoft, conopt, conindid, conkey FROM pg_constraint WHERE conname ='ft_region_pkey';
    
    Figure 2 Viewing informational constraint indexes
  • Delete the informational constraint:
    1
    2
    ALTER FOREIGN TABLE ft_region DROP CONSTRAINT ft_region_pkey RESTRICT;
    SELECT conname, contype, consoft, conindid, conkey FROM pg_constraint WHERE conname ='ft_region_pkey';
    
    Figure 3 Delete informational constraints
  • Add a unique informational constraint for the foreign table:
    1
    ALTER FOREIGN TABLE ft_region ADD CONSTRAINT constr_unique UNIQUE(R_REGIONKEY) NOT ENFORCED;
    
    Delete the unique informational constraint:
    1
    2
    ALTER FOREIGN TABLE ft_region DROP CONSTRAINT constr_unique RESTRICT;
    SELECT conname, contype, consoft, conindid, conkey FROM pg_constraint WHERE conname ='constr_unique';
    
  • Add a unique informational constraint for the foreign table:
    1
    2
    ALTER FOREIGN TABLE ft_region ADD CONSTRAINT constr_unique UNIQUE(R_REGIONKEY) NOT ENFORCED disable query optimization;
    SELECT relname,relhasindex FROM pg_class WHERE oid='ft_region'::regclass;
    
    Delete the unique informational constraint:
    1
    ALTER FOREIGN TABLE ft_region DROP CONSTRAINT constr_unique CASCADE;
    

Example 4

Read json data stored in OBS using a foreign table.

  1. The following JSON files are on OBS. The JSON objects contain nesting and arrays. Some objects have lost columns, and some object names are duplicate.
    {"A" : "simple1", "B" : {"C" : "nesting1"}, "D" : ["array", 2, {"E" : "complicated"}]}
    {"A" : "simple2", "D" : ["array", 2, {"E" : "complicated"}]}
    {"A" : "simple3", "B" : {"C" : "nesting3"}, "D" : ["array", 2, {"E" : "complicated3"}]}
    {"B" : {"C" : "nesting4"},"A" : "simple4",  "D" : ["array", 2, {"E" : "complicated4"}]}
    {"A" : "simple5", "B" : {"C" : "nesting5"}, "D" : ["array", 2, {"E" : "complicated5"}]}
  2. Create obs_server, with DFS_FDW as the foreign data wrapper.
    1
    2
    3
    4
    5
    6
    CREATE SERVER obs_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS ( 
      ADDRESS 'obs.example.com', 
      ACCESS_KEY 'xxxxxxxxx', 
      SECRET_ACCESS_KEY 'yyyyyyyyyyyyy', 
      TYPE 'OBS'
    );
    
    • ADDRESS is the endpoint of OBS. Replace it with the actual endpoint. You can find the domain name by searching for the value of regionCode in the region_map file.
    • ACCESS_KEY and SECRET_ACCESS_KEY are the access keys used for the cloud account system. Replace the values as needed.
    • Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
    • TYPE indicates the server type. Retain the value OBS.
  3. Create the OBS foreign table json_f and define the column names. For example, d#2_e indicates that the column is object e nested in the 2nd element of array d. The OBS server associated with the table is obs_server. foldername indicates the data source directory of the foreign table, that is, the OBS directory.

    // Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    CREATE FOREIGN TABLE json_f (
      a VARCHAR(10),
      b_c TEXT,
      d#1 INTEGER,
      d#2_e VARCHAR(30)
    )SERVER obs_server OPTIONS (
        foldername '/xxx/xxx/',
        format 'json',
        encoding 'utf8',
        force_mapping 'true'
    )distribute by roundrobin;
    
  4. Query the foreign table json_f. The fault tolerance parameter force_mapping is enabled by default. If a column is missing in a JSON object, NULL is filled in. If a JSON object name is duplicate, the last name prevails.
    1
    SELECT * FROM json_f;
    
    Figure 4 View the result of foreign table json_f

Example 5

Read a DLI multi-version foreign table using a foreign table. Only DLI 8.1.1 and later support the multi-version foreign table example.

  1. Create dli_server, with DFS_FDW as the foreign data wrapper.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    CREATE SERVER dli_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS ( 
      ADDRESS 'obs.example.com', 
      ACCESS_KEY 'xxxxxxxxx', 
      SECRET_ACCESS_KEY 'yyyyyyyyyyyyy', 
      TYPE 'DLI',
      DLI_ADDRESS 'dli.example.com',
      DLI_ACCESS_KEY 'xxxxxxxxx',
      DLI_SECRET_ACCESS_KEY 'yyyyyyyyyyyyy'
    );
    
    • ADDRESS is the endpoint of OBS. DLI_ADDRESS is the endpoint of DLI. Replace it with the actual endpoint.
    • ACCESS_KEY and SECRET_ACCESS_KEY are the access keys used by the cloud account system to access OBS. Use the actual value.
    • DLI_ACCESS_KEY and DLI_SECRET_ACCESS_KEY are access keys for the cloud account system to access DLI. Use the actual value.
    • Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
    • TYPE indicates the server type. Retain the value DLI.
  2. Create the OBS foreign table customer_address for accessing DLI. The table does not contain partition columns, and the DLI server associated with the table is dli_server. Where, the project_id is xxxxxxxxxxxxxxx, the database_name on DLI is database123, and the table_name of the table to be accessed is table456. Replace them based on the actual requirements.

    // Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    CREATE FOREIGN TABLE customer_address
    (
        ca_address_sk             integer               not null,
        ca_address_id             char(16)              not null,
        ca_street_number          char(10)                      ,   
        ca_street_name            varchar(60)                   ,   
        ca_street_type            char(15)                      ,   
        ca_suite_number           char(10)                      ,   
        ca_city                   varchar(60)                   ,   
        ca_county                 varchar(30)                   ,   
        ca_state                  char(2)                       ,   
        ca_zip                    char(10)                      ,   
        ca_country                varchar(20)                   ,   
        ca_gmt_offset             decimal(36,33)                  ,   
        ca_location_type          char(20)    
    ) 
    SERVER dli_server OPTIONS (
        FORMAT 'ORC',
        ENCODING 'utf8',
        DLI_PROJECT_ID 'xxxxxxxxxxxxxxx',
        DLI_DATABASE_NAME 'database123',
        DLI_TABLE_NAME 'table456'
    )
    DISTRIBUTE BY roundrobin;
    
  3. Query data in a DLI multi-version table using a foreign table.
    1
    SELECT COUNT(*) FROM customer_address;
    
    Figure 5 Query results