Help Center/ GaussDB/ Developer Guide(Distributed_2.x)/ SQL Reference/ SQL Syntax/ CREATE FOREIGN TABLE (for Import and Export)
Updated on 2024-10-14 GMT+08:00

CREATE FOREIGN TABLE (for Import and Export)

CREATE FOREIGN TABLE creates a GDS foreign table.

Function

CREATE FOREIGN TABLE creates a GDS foreign table in the current database for concurrent data import and export. The GDS foreign table can be read-only or write-only, used for concurrent data import and export, respectively. The OBS foreign table is read-only by default.

Precautions

  • The foreign table is owned by the user who runs the statement.
  • The distribution mode of a GDS foreign table does not need to be explicitly specified. The default mode is ROUNDROBIN.
  • All constraints (including column and row constraints) are invalid to the GDS foreign table.
  • When multi-layer quotation marks are used for sensitive columns (such as password and secret_access_key) in OPTIONS, the semantics is different from that in the scenario where quotation marks are not used. Therefore, sensitive columns are not identified for anonymization.

Syntax

1
2
3
4
5
6
7
8
9
CREATE FOREIGN TABLE [ IF NOT EXISTS  ] table_name 
    ( [ { column_name type_name POSITION ( offset, length ) | LIKE source_table } [, ...] ] ) 
    SERVER gsmpp_server 
    OPTIONS (  { option_name ' value '  }  [, ...] ) 
    [ { WRITE ONLY  |  READ ONLY  }] 
    [ WITH error_table_name | LOG INTO error_table_name] 
    [REMOTE LOG 'name'] 
    [PER NODE REJECT LIMIT 'value']
    [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ];

Parameter Overview

CREATE FOREIGN TABLE provides multiple parameters, which are classified as follows:

Parameter Description

  • IF NOT EXISTS

    Sends a notice, but does not throw an error, if a table with the same name exists.

  • table_name

    Specifies the name of a foreign table.

    Value range: a string. It must comply with the naming convention.

  • column_name

    Specifies the name of a column in the foreign table.

    Value range: a string. It must comply with the naming convention.

  • type_name

    Specifies the data type of the column.

  • POSITION(offset,length)

    Defines the place of each column in the data file in fixed length mode.

    offset is the start of the column in the source file, and length is the length of the column.

    Value range: offset must be greater than 0, and its unit is byte.

    The length of each record must be less than or equal to 1 GB. By default, columns not in the file are replaced with null.

  • SERVER gsmpp_server

    Specifies the server name of the foreign table. For the GDS foreign table, its server is created by the initial database, which is gsmpp_server.

  • OPTIONS ( { option_name ' value ' } [, ...] )

    Specifies all types of parameters of foreign table data.

    • location

      Specifies the data source location of the foreign table, which can be expressed through URLs or in local files. Separate URLs and local files with vertical bars (|).

      • For a read-only foreign table imported by GDS from a remote server in parallel, its URL must end with its corresponding schema or file name. (Read-only is the default file attribute.)

        For example: gsfs://192.168.0.90:5000/*, file:///data/data.txt, or gsfs:// 192.168.0.90:5000/* | gsfs:// 192.168.0.91:5000/*.

      • For a writable foreign table exported by GDS to a remote server in parallel, its URL does not need to contain a file name. If the data source is local, for example, file:///data/, only one data source in the foreign table can be specified, and the hosting directory must be created in advance on each node. If the data source location is a remote URL, for example, gsfs:// 192.168.0.90:5000/, multiple data sources can be specified. If the number of exported data file locations is less than or equal to the number of DNs, when you use the foreign table for export, data is evenly distributed to each data source location. If the number of exported data file locations is greater than the number of DNs, when you export data, the data is evenly distributed to data source locations corresponding to the DNs. Blank data files are created on the excess data source locations.
      • For a read-only foreign table imported by GDS from a remote server in parallel, the number of URLs must be less than the number of DNs, and URLs in the same location cannot be used.
      • If the URL begins with gsfss://, data is imported and exported in encryption mode, and DOP cannot exceed 10.
    • format

      Specifies the format of the data source file in a foreign table.

      Value range: CSV, TEXT, and FIXED. The default value is TEXT.

      • In CSV files, escape sequences are processed as common strings. Therefore, newline characters are processed as data.
      • In TEXT files, escape sequences are processed as they are. Therefore, newline characters are not processed as data.
      • In FIXED files, the column length of each record is the same and spaces are used for padding.
      • An escape sequence is a string starting with a backslash (\), including \b (backspace), \f (form feed), \n (new line), \r (carriage return), \t (horizontal tab), \v (vertical tab), \digit (octal number), and \xdigit (hexadecimal number). In TEXT files, strings are processed as they are. In files of other formats, strings are processed as data.
      • FIXED is defined as follows (POSITION must be specified for each column when FIXED is used.):
        1. The column length of each record is the same.
        2. Spaces are used for column padding. Columns of the numeric type are left-aligned and columns of the string type are right-aligned.
        3. No delimiters are used between columns.
    • header

      Specifies whether a file contains a header with the names of each column in the file. header is available only for CSV and FIXED files.

      When data is imported, if header is on, the first row of the data file will be identified as the header and ignored. If header is off, the first row will be identified as a data row.

      When data is exported, if header is on, fileheader must be specified. fileheader is used to specify the export header file format. If header is off, an exported file does not contain a header.

      Value range: true/on and false/off The default value is false/off.

    • fileheader

      Specifies a file that defines the content in the header for exported data. The file contains one row of data description of each column.

      For example, to add a header in a file containing product information, define the file as follows:

      The information of products.\n

      • This parameter is available only when header is on or true. The file must be prepared in advance.
      • In Remote mode, the definition file must be put to the working directory of the GDS (the -d directory specified when the GDS is started).
      • In Local mode, the definition file must be put to the same path of each node, and fileheader specifies the absolute path.
      • The file can contain only one row of header information, and end with a newline character. Excess rows will be discarded. (Header information cannot contain newline characters).
      • The length of the file including the newline character cannot exceed 1 MB.
    • out_filename_prefix

      Specifies the name prefix of the data file exported using GDS from a write-only foreign table.

      • The file name prefix must be valid and compliant with the restrictions of the file system in the physical environment where the GDS is deployed. Otherwise, the file will fail to be created.
        • The prefix of the specified export file name does not contain invalid characters, including but not limited to: '/', '?', '*', ':', '|', '\\', '<', '>', '@', '#', '$', '&', '(', ')', '+', '-'. The allowed characters are [a-z]*[A-Z]*[0-9]* and '_'.
        • The file name prefix cannot contain feature columns reserved for the Windows and Linux OS, including but not limited to:

          "con", "aux", "nul", "prn", "com0", "com1", "com2", "com3", "com4", "com5", "com6", "com7", "com8", "com9", "lpt0", "lpt1", "lpt2", "lpt3", "lpt4", "lpt5", "lpt6", "lpt7", "lpt8", and "lpt9".

        • The total length of the absolute path consisting of the exported file prefix, the path specified by gds –d, and .dat.x should be as required by the file system where GDS is deployed.
        • It is required that the prefix can be correctly parsed and identified by the receiver (including but not limited to the original database where it was exported) of the data file. Identify and modify the option that causes the file name resolution problem (if any).
      • To concurrently perform export jobs, do not use the same file name prefix for them. Otherwise, the exported files may overwrite each other or be lost in the OS or file system.
    • delimiter

      Specifies the column delimiter of data. Use the default delimiter if it is not set. The default delimiter in the TEXT format is a tab and that in the CSV format is a comma (,). No delimiter is used in the FIXED format.

      • The value of delimiter cannot be \r or \n.
      • A delimiter cannot be the same as the null value. The delimiter for the CSV format cannot be same as the quote value.
      • The delimiter of TEXT data cannot contain any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789.
      • The data length of a single row should be less than 1 GB. A row that has many columns using long delimiters cannot contain much valid data.
      • You are advised to use a multi-character, such as the combination of the dollar sign ($), caret (^), and ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b as the delimiter.

      Value range:

      The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.

    • quote

      Specifies which characters in a CSV source data file will be quoted. The default value is single quotation marks ('').

      • The value of quote cannot be the same as that of the delimiter or null parameter.
      • The value of quote must be a single-byte character.
      • Invisible characters are recommended, such as 0x07, 0x08, and 0x1b.
    • escape

      Specifies which characters in a CSV source data file are escape characters. Escape characters can only be single-byte characters.

      The default value is single quotation marks (''). If the value is the same as that of quote, it will be replaced by \0.

    • null
      Specifies the string that represents a null value.
      • A null value cannot be \r or \n. The maximum length is 100 characters.
      • A null value cannot be the same as the delimiter or quote value.

      Value range:

      • The default value is \n for the TEXT format.
      • The default value for the CSV format is an empty string without quotation marks.
    • noescaping

      Specifies, in the TEXT format, whether to escape the backslash (\) and its following characters.

      noescaping is available only for TEXT source data files.

      Value range: true/on and false/off The default value is false/off.

    • encoding

      Specifies the encoding of a data file, that is, the encoding used to parse, check, and generate a data file. Its default value is the default client_encoding value of the current database.

      Before you import foreign tables, it is recommended that you set client_encoding to the file encoding format, or a format matching the character set of the file. Otherwise, parsing and check errors may occur, leading to import errors, rollback, or even invalid data import. In a foreign table used for export, you are also advised to set this parameter because the export result using the default character set may not be what you expected.

      If this parameter is not specified when you create a foreign table, a warning message will be displayed on the client.

      Currently, GDS foreign tables cannot parse data files using multiple encoding formats during data import and cannot write such data files during data export.

    • fill_missing_fields

      Specifies how to handle the problem that the last column of a row in a source data file is lost during data import.

      Value range: true/on and false/off The default value is false/off.

      • true/on: If the last column in a row of a data source file is missing during data loading, the column is set to null and no import error message is reported.
      • false/off: If the last column in a row of a data source file is missing during data import, the following error message is reported:
        missing data for column "tt"
    • ignore_extra_data

      Specifies whether to ignore excessive columns when the number of data source files exceeds the number of foreign table columns. This parameter is used only during data import.

      Value range: true/on and false/off The default value is false/off.

      • true/on: If the number of columns in a data source file is greater than that defined by the foreign table, the extra columns at the end of a row are ignored.
      • false/off: If the number of columns in a data source file is greater than that defined by the foreign table, the following error message is reported:
        extra data after last expected column

      If a newline character at the end of a row is missing and the row and another row are integrated into one, data in another row is ignored after the parameter is set to true.

    • reject_limit

      Specifies the maximum number of data format errors allowed during a data import task. If the number of errors does not reach the maximum number, the data import task can still be executed.

      You are advised to replace this syntax with PER NODE REJECT LIMIT 'value'.

      Data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. Once a non-data-format error occurs, the whole data import process is stopped.

      Value range: a positive integer or unlimited

      The default value is 0, indicating that error information is returned immediately.

      Enclose positive integer values with single quotation marks ('').

    • mode

      Specifies the data import policy during a specific data import process.

      Value range:

      • Normal (default): supports all file types (CSV, TEXT, FIXED). Enable GDS to help data import.
      • Shared: supports the TEXT format. It does not need assistance from GDS, but requires that all the user data be mounted to the same path of the same nodes through NFS.
      • Private: User data has been stored in the same local directories of DNs.
    • eol

      Specifies the newline character style of the imported or exported data file.

      Value range: multi-character newline characters within 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #.

      • The eol parameter supports only the TEXT format for data import and export and does not support the CSV or FIXED format for data import. For forward compatibility, the EOL parameter can be set to 0x0D or 0x0D0A for data export in the CSV or FIXED format.
      • The value of EOL cannot be the same as that of the delimiter or null parameter.
      • The EOL parameter value cannot contain the following characters: .abcdefghijklmnopqrstuvwxyz0123456789.
    • fix

      Specifies the length of fixed-length data. The unit is byte. This syntax is available only for READ ONLY foreign tables.

      Value range: less than 1 GB, and greater than or equal to the total length specified by POSITION (The total length is the sum of offset and length in the last column of the table definition.)

    • out_fix_alignment

      Specifies how the columns of the types BYTEAOID, CHAROID, NAMEOID, TEXTOID, BPCHAROID, VARCHAROID, NVARCHAR2OID, and CSTRINGOID are aligned during fixed-length export.

      Value range: align_left and align_right

      Default value: align_right

      The bytea data type must be in hexadecimal format (for example, \XXXX) or octal format (for example, \XXX\XXX\XXX). The data to be imported must be left-aligned (that is, the column data starts with either of the two formats instead of spaces). Therefore, if the exported file needs to be imported using a GDS foreign table and the file data length is less than that specified by the foreign table formatter, the exported file must be left aligned. Otherwise, an error is reported during the import.

    • date_format

      Specifies the DATE format for data import. This syntax is available only for READ ONLY foreign tables.

      Value range: a valid DATE value For details, see Date and Time Processing Functions and Operators.

      If Oracle is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below.

    • time_format

      Specifies the TIME format for data import. This syntax is available only for READ ONLY foreign tables.

      Value range: a valid TIME value. Time zones are not supported. For details, see Date and Time Processing Functions and Operators.

    • timestamp_format

      Specifies the TIMESTAMP format for data import. This syntax is available only for READ ONLY foreign tables.

      Value range: a valid TIMESTAMP value. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.

    • smalldatetime_format

      Specifies the SMALLDATETIME format for data import. This syntax is available only for READ ONLY foreign tables.

      Value range: a valid SMALLDATETIME value For details, see Date and Time Processing Functions and Operators.

    • compatible_illegal_chars

      Specifies whether to tolerate invalid characters during data import. This syntax is available only for READ ONLY foreign tables.

      Value range: true/on and false/off The default value is false/off.

      • true/on: No error message is reported and data import is not interrupted when there are invalid characters. Invalid characters are converted into valid ones, and then imported to the database.
      • false/off: An error occurs when there are invalid characters, and the import stops.

      The rules for converting invalid characters are as follows:

      1. \0 is converted to a space.

      2. Other invalid characters are converted to question marks.

      3. When compatible_illegal_chars is set to true/on, after invalid characters such as NULL, DELIMITER, QUOTE, and ESCAPE are converted to spaces or question marks, an error message stating "illegal chars conversion may confuse COPY escape 0x20" will be displayed to remind you of possible parameter confusion caused by the conversion.

  • READ ONLY

    Specifies whether a foreign table is read-only. This parameter is available only for data import.

  • WRITE ONLY

    Specifies whether a foreign table is write-only. This parameter is available only for data import.

  • WITH error_table_name

    Specifies the table where data format errors generated during parallel data import are recorded. You can query the error information table after data is imported to obtain error details. This parameter is available only after reject_limit is set.

    To be compatible with PostgreSQL open source interfaces, you are advised to replace this syntax with LOG INTO.

    Value range: a string. It must comply with the naming convention.

  • LOG INTO error_table_name

    Specifies the table where data format errors generated during parallel data import are recorded. You can query the error information table after data is imported to obtain error details.

    This parameter is available only after PER NODE REJECT LIMIT is set.

    Value range: a string. It must comply with the naming convention.

  • REMOTE LOG 'name'

    Specifies that the data format error information is saved as files in GDS. name is the prefix of the error data file.

  • PER NODE REJECT LIMIT 'value'

    Specifies the maximum number of data format errors allowed on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import.

    This syntax specifies the error tolerance of a single node.

    Data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. Once a non-data-format error occurs, the whole data import process is stopped.

    Value range: an integer or unlimited. The default value is 0, indicating that error information is returned immediately.

  • TO { GROUP groupname | NODE ( nodename [, ... ] ) }

    Currently, TO GROUP is not supported. TO NODE is used for internal scale-out tools.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
-- Create a foreign table to import data from GDS servers 192.168.0.90 and 192.168.0.91 in text format. Record errors that occur during data import to the err_HR_staffS table.
openGauss=# CREATE FOREIGN TABLE foreign_HR_staffS
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID  NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/* | gsfs://192.168.0.91:5000/*', format 'TEXT', delimiter E'\x20',  null '') WITH err_HR_staffS;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
-- Create a foreign table to import data from GDS servers 192.168.0.90 and 192.168.0.91 in text format and record error messages in the import process to the err_HR_staffS table. A maximum of two data format errors are allowed during the import.
CREATE FOREIGN TABLE foreign_HR_staffS_ft3
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID  NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/* | gsfs://192.168.0.91:5000/*', format 'TEXT', delimiter E'\x20',  null '',reject_limit '2') WITH err_HR_staffS_ft3;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
-- Create a foreign table to import all files in the input_data directory in CSV format.
openGauss=# CREATE FOREIGN TABLE foreign_HR_staffS_ft1
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID     NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'file:///input_data/*', format 'csv', mode 'private', delimiter ',') WITH err_HR_staffS_ft1;

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
-- Create a foreign table to export data to the output_data directory in CSV format.
openGauss=# CREATE FOREIGN TABLE foreign_HR_staffS_ft2
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID  NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'file:///output_data/', format 'csv', delimiter '|', header 'on') WRITE ONLY;

1
2
3
4
5
-- Delete the foreign table.
openGauss=# DROP FOREIGN TABLE foreign_HR_staffS;
openGauss=# DROP FOREIGN TABLE foreign_HR_staffS_ft1;
openGauss=# DROP FOREIGN TABLE foreign_HR_staffS_ft2;
openGauss=# DROP FOREIGN TABLE foreign_HR_staffS_ft3;

Suggestions

  • delimiter
    • A delimiter cannot be \r or \n, or the same as the null value. The delimiter of CSV data cannot be same as the quote value.
    • The data length of a single row should be less than 1 GB. A row that has many columns using long delimiters cannot contain much valid data.
    • You are advised to use a multi-character, such as the combination of the dollar sign ($), caret (^), and ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b as the delimiter.
  • quote
    • The value of quote cannot be the same as that of the delimiter or null parameter. The value must be a single-byte character.
    • Invisible characters are recommended, such as 0x07, 0x08, and 0x1b.
  • mode Normal
    • Supports all file types (including CSV, TEXT, and FIXED). To import data, you need to enable GDS on the data server.
  • mode Shared
    • Supports the TEXT format. It does not need assistance from GDS, but requires that all the user data be mounted to the same path of the same nodes through NFS.
  • mode Private
    • This mode is used when user data has been stored in the same local directories of DNs.