CREATE FOREIGN TABLE (for GDS Import and Export)

Function

CREATE FOREIGN TABLE creates a GDS foreign table in the current database for concurrent data import and export. The GDS foreign table can be read-only or write-only, used for concurrent data import and export, respectively. The foreign table is read-only by default.

GDS is a command line tool running on the Linux OS. It works with the foreign table mechanism to quickly import and export data. The GDS tool package needs to be installed on the server where the data source file is located. This server is called the data server or the GDS server.

For details about how to install and configure GDS, see Installing, Configuring, and Starting GDS.

Precautions

Only the system administrator dbadmin or a common user who has been granted the USEFT permission can perform foreign table operations.
1

ALTER USER user_name USEFT;
The distribution mode of a GDS foreign table does not need to be explicitly specified. The default is ROUNDROBIN.
All constraints (including column and row constraints) are invalid to the GDS foreign table.
GDS supports the following file formats: TEXT, CSV, and FIXED.

Syntax

     CREATE FOREIGN TABLE [ IF NOT EXISTS  ] table_name 
    ( [  { column_name type_name POSITION(offset,length) | LIKE source_table } [, ...]  ] ) 
    SERVER gsmpp_server 
    OPTIONS (  { option_name ' value '  }  [, ...] ) 
    [  { WRITE ONLY  |  READ ONLY  }] 
    [ WITH error_table_name | LOG INTO error_table_name] 
    [REMOTE LOG 'name'] 
    [PER NODE REJECT LIMIT 'value']
    [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ];
 
 
  

Parameter Description

**Table 1** CREATE FOREIGN TABLE (for GDS import and export) parameters
Parameter	Description	Value Range
IF NOT EXISTS	Sends a notice, but does not throw an error, if a table with the same name exists.	-
table_name	Specifies the name of the foreign table.	A string, which must comply with the naming convention. For details, see Identifier Naming Conventions.
column_name	Specifies the name of a column in the foreign table.
type_name	Specifies the data type of the column.	-
POSITION(offset,length)	Defining the location of each column in the data file in fixed length mode. offset is the start of the column in the source file, and length is the length of the column.	offset must be greater than 0 bytes, and its unit is byte. The length of each record must be less than or equal to 1 GB. By default, columns not in the file are replaced with null.
SERVER gsmpp_server	Specifies the server name of the foreign table. For the GDS foreign table, its server is created by initial database, which is gsmpp_server.	-
OPTIONS	Specifies parameters of foreign table data.	Data format parameters. For details, see Table 2. Error-tolerance parameters. For details, see Table 3. Performance parameters. For details, see Table 4.
READ ONLY	Specifies whether a foreign table is read-only. This parameter is available only for data import.	-

**Table 2** OPTIONS parameters of the data format
Parameter	Description	Value Range
location	Specifies the data source location of the foreign table, which can be expressed through URLs. Separate URLs with vertical bars (\|). Currently, GDS can automatically create a directory defined by a foreign table during data export. If foreign table location specified as gsfs://192.168.0.91:5000/2019/09 for an export task, the 2019/09 subdirectory will be automatically created in the GDS data directory if it does not already exist. Manual creation of the directory specified in the foreign table is not necessary.	For a read-only foreign table imported by GDS from a remote server in parallel, its URL must end with its corresponding schema or file name. (Read-only is the default file attribute.) For example: gsfs://192.168.0.90:5000/* or file:///data/data.txt or gsfs:// 192.168.0.90:5000/* \| gsfs:// 192.168.0.91:5000/. For a writable foreign table used for GDS to export data to a remote server in parallel, file names are not required in URLs. If the data source location is a remote URL, for example, gsfs:// 192.168.0.90:5000/, multiple data sources can be specified. If the number of exported data file locations is less than or equal to the number of DNs, when you use the foreign table for export, data is evenly distributed to each data source location. If the number of exported data file locations is greater than the number of DNs, when you export data, the data is evenly distributed to data source locations corresponding to the DNs. Blank data files are created on the excess data source locations. For a foreign table used for GDS to import data from a remote server in parallel, the number of URLs must be less than the number of DNs, and URLs containing the same location cannot be used. If the URL begins with gsfss://, data is imported and exported in encryption mode, and DOP cannot exceed 10. During GDS export, the 2019/09* subdirectory in the gsfs://127.0.0.1:7789/2019/09/ directory specified by the location table is automatically created. If file_type is set to pipe, GDS determines whether the target file to be imported or exported is a pipe file or a directory based on whether the last character in the URL is a slash (/). Example: In gsfs://192.168.0.90:5000/a/b, GDS identifies b as a pipe file. In gsfs://192.168.0.90:5000/a/b/, GDS identifies b as a directory and creates a pipe file in the directory.
format	Specifies the format of the data source file in a foreign table.	Value range: CSV, TEXT, or FIXED. The default value is TEXT. In CSV files, escape sequences are processed as common strings. Therefore, linefeeds are processed as data. In TEXT files, escape sequences are processed as they are. Therefore, linefeeds are not processed as data. The FIXED file can process newline characters in data columns efficiently, but cannot process special characters very well. NOTE: An escape sequence is a string starting with a backslash (\), including \b (backspace), \f (formfeed page break), \n (new line), \r (carriage return), \t (horizontal tab), \v (vertical tab), \number (octal number), and \xnumber (hexadecimal number). In TEXT files, strings are processed as they are. In files of other formats, strings are processed as data. FIXED is defined as follows: (POSITION must be specified for each column when FIXED is used.) The column length of each record is the same. Spaces are used for column padding. Left padding is used for the numeric type and right padding is used for the char type. No delimiters are used between columns.
header	Specifies whether a data file contains a table header. header is available only for CSV and FIXED files. When data is imported, if header is on, the first row of the data file will be identified as title row and ignored. If header is off, the first row is identified as data. When data is exported, if header is on, fileheader must be specified. fileheader is used to specify the export header file format. If header is off, the exported file does not include a title row.	true/on or false/off. The default value is false or off.
fileheader	Specifies a file that defines the content in the header for exported data. The file contains one row of data description of each column. For example, to add a header in a file containing product information, define the file as follows: The information of products.\n	This parameter is available only when header is on or true. The file must be prepared in advance. In Remote mode, the definition file must be put to the working directory of GDS (the -d directory specified when starting the GDS). The definition file can contain only one row of title information, and end with a newline character. Excess rows will be discarded. (Title information cannot contain newline character). The length of the definition file including the newline character cannot exceed 1 MB.
out_filename_prefix	Specifies the name prefix of the exported data file exported using GDS from a write-only foreign table. If file_type is set to pipe, the pipe file dbName_schemaName_foreignTableName.pipe is generated. If both out_filename_prefix and location specify a pipe name, the pipe name specified in location is used.	The prefix of the specified file name must be valid and compliant with the restrictions of the file system in the physical environment where the GDS is deployed. Otherwise, the file will fail to be created. The file name prefix can contain only lowercase letters, uppercase letters, digits, and underscores (_). The prefix of the specified export file name cannot contain feature fields reserved for the Windows and Linux OS, including but not limited to: "con","aux","nul","prn","com0","com1","com2","com3","com4","com5","com6","com7","com8","com9","lpt0","lpt1","lpt2","lpt3","lpt4","lpt5","lpt6","lpt7","lpt8","lpt9" The total length of the absolute path consisting of the exported file prefix, the path specified by GDS –d, .dat, or .pipe should be as required by the file system where GDS is deployed. It is required that the prefix can be correctly parsed and identified by the receiver (including but not limited to the original database where it was exported) of the data file. Identify and modify the option that causes the file name resolution problem (if any). To concurrently perform export jobs, do not use the same file name prefix for them. Otherwise, the exported files may overwrite each other or be lost in the OS or file system.
delimiter	Specifies the column delimiter of data, and uses the default delimiter if it is not set. The default delimiter of TEXT is a tab and that of CSV is a comma (,). No delimiter is used in FIXED format.	The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes. A delimiter cannot be \r or \n. A delimiter cannot be the same as the null value. The delimiter of CSV cannot be same as the quote value. The delimiter for the TEXT format data cannot contain any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789. The data length of a single row should be less than 1 GB. If the delimiters are too long and there are too many rows, the length of valid data will be affected. You are advised to use a multi-character, such as the combination of the dollar sign ($), caret (^), the ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b as the delimiter. For a multi-character delimiter, do not use the same characters, for example, ---.
quote	Specifies which characters in a CSV source data file will be identified as quotation marks. The default value is a double quotation mark (").	The quote parameter cannot be the same as the delimiter or null parameter. The quote parameter must be a single-byte character. Invisible characters are recommended as quote values, such as 0x07, 0x08, and 0x1b.
escape	Specifies which characters in a CSV source data file are escape characters. Escape characters can only be single-byte characters. Default value: the same as the value of QUOTE	-
null	Specifies the string that represents a null value.	The default value is \N for the TEXT format. The default value for the CSV format is an empty string without quotation marks. The null value cannot be \r or \n. The maximum length is 100 characters. The null value cannot be the same as the delimiter or quote parameter.
blank_number_str_to_nul	Converts a variable number of spaces of the numeric type (INT1OID, INT2OID, INT4OID, INT8OID, FLOAT4OID, or FLOAT8OID) in a data file to NULL. This parameter is supported only by clusters of version 8.1.3, 8.2.1.236, 9.1.0, or later. For example, if this option is set to true, NULL is returned when null '', null ' ' (one space), or null ' ' (two spaces) is defined.	The value can be true or false. The default value is false, which means no conversion is done.
noescaping	Specifies in TEXT format, whether to escape the backslash (\) and its following characters.	noescaping is available only for the TEXT format. true/on or false/off. The default value is false or off.
encoding	Specifies the encoding of a data file, that is, the encoding used to parse, check, and generate a data file. Its default value is the default client_encoding value of the current database. Before you import foreign tables, it is recommended that you set client_encoding to the file encoding format, or a format matching the character set of the file. Otherwise, unnecessary parsing and check errors may occur, leading to import errors, rollback, or even invalid data import. Before you import foreign tables, you are also advised to specify this parameter, because the export result using the default character set may not be what you expected. If this parameter is not specified when you create a foreign table, a warning message will be displayed on the client.	Currently, GDS cannot parse or write in a file using multiple encoding formats during foreign table import or export. Common encoding formats include UTF8, GBK, and GB18030. GB18030 has two versions: GB18030 and GB18030_2022. GB18030_2022 is the latest national standard in China prepared to support Chinese characters.
dataencoding	Specifies the actual encoding of data in tables in the latin1 database. Converts data format during GDS data import and export. This parameter is supported by version 8.2.0 or later clusters.	GBK, GB18030, GB18030_2022, and UTF8. Typical scenarios: To import the GB-encoded text data to the latin1 database and convert the data to UTF8. Set dataencoding to UTF8 and encoding to GB. To export GB-encoded data from the latin1 database as a UTF8 data file, set dataencoding to GB and encoding to UTF8. CAUTION: This parameter can be used only in latin1 databases. When this parameter is used, encoding must be GBK, GB18030, GB18030_2022, or UTF8.
mode	Specifies the data import policy during a specific data import process. DWS supports only the Normal mode.	Valid value: Normal (default): supports all file types (CSV, TEXT, FIXED). Enabling Gauss data service to help data import.
eol	Specifies the newline character style of the imported or exported data file.	Value range: multi-character newline characters within 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #. The eol parameter supports only the TEXT format for data import and export and does not support the CSV or FIXED format for data import. For forward compatibility, the eol parameter can be set to 0x0D or 0x0D0A for data export in the CSV and FIXED formats. If you want to export data in FIXED format as a single row, use the eol parameter and set it to an empty string ''. However, it is important to note that this method of exporting data cannot be directly imported into the GaussDB(DWS) database. The value of the eol parameter cannot be the same as that of DELIMITER or NULL. The value of the eol parameter cannot contain digits, letters, or periods (.).
conflict_delimiter	This parameter is generally used with the compatible_illegal_chars parameter. If a data file contains a truncated Chinese character, the truncated character and a delimiter will be encoded into another Chinese character due to inconsistent encoding between the foreign table and the database. As a result, the delimiter is masked and an error will be reported, indicating that there are missing fields. This parameter is used to avoid encoding a truncated character and a delimiter into another character.	Value range: true, on, false, and off. The default value is false or off. If the parameter is set to true or on, encoding a truncated character and a delimiter into another character is allowed. If the parameter is set to false or off, encoding a truncated character and a delimiter into another character is not allowed. This parameter is disabled by default. It is recommended that you disable this parameter, because encoding a truncated character and a delimiter into another character is rarely required. If the parameter is enabled, the scenario may be incorrectly identified and thereby causing incorrect information imported to the table.
file_type	Specifies the type of the file to be imported or exported.	Value options: normal, pipe, and interconn. normal is the default value. If this parameter is set to normal, the file to be imported or exported is a common file. If this parameter is set to pipe, the file to be imported or exported is a named pipe. If the parameter is interconn, the current GDS foreign table is used for interconnected and compressed import and export. This parameter must be used together with the gds_compress parameter and is supported only by cluster 8.2.0 or later. CAUTION: If the value of file_type is set to interconn, GDS must be upgraded to 8.2.0 or later. Otherwise, the error message "ERROR: un-support format" will be displayed.
auto_create_pipe	Specifies whether the GDS process automatically creates a named pipe. When configuring auto_create_pipe, you need to set file_type to pipe or interconn. Otherwise, the foreign table cannot be created.	Value options: true, on, false, and off. The default value is true/on. If this parameter is set to true or on, the GDS process is allowed to automatically create a named pipe. If this parameter is set to false or off, you need to manually create a named pipe. If auto_create_pipe is set to false and no pipe is specified during data import and export, the database name_schema name_foreign table name.pipe file will be opened. If a pipe has been specified, the specified pipe in the location will be opened. If the named pipe is not written by other programs or is not opened in write mode within the period specified by the pipe-timeout parameter, an error message is displayed indicating that the import or export task times out. If the file is not a pipe, an error is reported when the import or export task is executed. If auto_create_pipe is set to true and no pipe file is specified during data import and export, the database name_schema name_foreign table name.pipe file will be opened. If the file is a common file, an error is reported when the file is imported or exported. If the file is a pipe, the system automatically deletes the file and re-creates the named pipe. You can use the location parameter to specify the pipe during data export, for example, location'gsfs://127.0.0.1:7789/aa.pipe. When auto_create_pipe is set to true, GDS automatically creates the aa.pipe file in the data directory.
del_pipe	Specifies whether to automatically delete the pipe file after the import or export task is complete. When configuring del_pipe, you need to set file_type to pipe or interconn. Otherwise, the foreign table cannot be created.	Value options: true or on; false or off. The default value is true or on. If this parameter is set to true or on, the GDS process will automatically delete a named pipe file. If this parameter is set to false or off, the GDS process will not delete a named pipe file.
gds_compress	This parameter is used to reduce the network bandwidth for cluster interconnection when users use GDS pipe files. Currently, and only the Snappy algorithm is supported. This parameter is supported by version 8.2.0 or later clusters.	Value range: snappy. Currently, only the Snappy algorithm is supported, which means that data imported and exported through GDS pipes is compressed and decompressed using the Snappy algorithm. When setting this parameter, file_type must be set to interconn and format to CSV. If this parameter is not set, by default, files are not compressed. Currently, this parameter can only be set to snappy. When using this function, ensure that the versions of the local cluster, remote cluster, and GDS are of the same version. If the value of file_type is interconn, GDS must be upgraded to 8.2.0 or later. Otherwise, the error message "ERROR: un-support format" is displayed. This parameter reduces the network bandwidth between clusters and increases the cluster CPU load. You can determine whether to enable this parameter based on the site requirements. When a GDS foreign table is used for interconnected data migration, to allow data to be compressed during migration, the GDS foreign table must support this parameter. You use interconnected foreign tables to compress and migrate data, or manually use a GDS foreign table to compress and migrate data at the network layer. Many parameters need to be set during manual procedure, you are advised to use interconnected foreign tables. This parameter must be specified for foreign tables used for both export and import.
preserve_blanks	Specifies whether to retain the blank characters (including spaces, \t, \v, and \f) at the end of each column during fixed-length import. This parameter is supported by version 8.2.0.100 or later clusters.	Value range: true, on, false, and off. The default value is false or off. If this parameter is set to true or on, the trailing \t is retained and pruning is not performed for column data of the BYTEAOID, CHAROID, NAMEOID, TEXTOID, BPCHAROID, VARCHAROID, NVARCHAR2OID or CSTRINGOID type. If this parameter is set to false or off, blank characters (including spaces, \t, \v, and \f) at the end of all data types are pruned.
fix	Specifies the length of fixed format data. The unit is byte. This syntax is available only for READ ONLY foreign tables.	Value range: Less than 1 GB, and greater than or equal to the total length specified by POSITION (The total length is the sum of offset and length in the last column of the table definition.)
out_fix_alignment	Specifies how the columns of types BYTEAOID, CHAROID, NAMEOID, TEXTOID, BPCHAROID, VARCHAROID, NVARCHAR2OID, and CSTRINGOID are aligned during fixed-length export.	Value range: align_left, align_right Default value: align_right CAUTION: The bytea data type must be in hexadecimal format (for example, \XXXX) or octal format (for example, \XXX\XXX\XXX). The data to be imported must be left-aligned (that is, the column data starts with either of the two formats instead of spaces). Therefore, if the exported file needs to be imported using a GDS foreign table and the file data length is less than that specified by the foreign table formatter, the exported file must be left aligned. Otherwise, an error is reported during the import.
out_fix_num_alignment	Specifies how the columns of numeric types TINYINT, SMALLINT, INTEGER, BINARY_INTEGER, BIGINT, NUMERIC[(p[,s])], DECIMAL[(p[,s])], NUMBER[(p[,s])], REAL, FLOAT4, DOUBLE PRECISION, FLOAT8, FLOAT[(p)], BINARY_DOUBLE, DEC[(p[,s])], INTEGER[(p[,s])], SMALLSERIAL, SERIAL and BIGSERIAL are aligned during fixed-length export. This is supported only by clusters of version 9.1.0 or later.	Value range: align_left, align_right Default value: align_left
date_format	Imports data of the DATE type. This syntax is available only for READ ONLY foreign tables.	Value range: any valid DATE value. For details, see Date and Time Processing Functions and Operators. CAUTION: If ORACLE is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below.
time_format	Imports data of the TIME type. This syntax is available only for READ ONLY foreign tables.	Value range: any valid TIME value. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.
timestamp_format	Imports data of the TIMESTAMP type. This syntax is available only for READ ONLY foreign tables.	Value range: any valid TIMESTAMP value. Time zones are not supported. For details, see Date and Time Processing Functions and Operators.
smalldatetime_format	Imports data of the SMALLDATETIME type. This syntax is available only for READ ONLY foreign tables.	Value range: any valid SMALLDATETIME value. For details, see Date and Time Processing Functions and Operators.

**Table 3** OPTIONS fault tolerance parameters
Parameter	Description	Value Range
fill_missing_fields	Specifies whether to generate an error message when the last column in a row in the source file is lost during data import.	Value range: true, on, false, and off. The default value is false or off. If this parameter is set to true or on and the last column of a data row in a data source file is lost, the column will be replaced with NULL and no error message will be generated. If this parameter is set to false or off and the last column is missing, the following error information will be displayed: missing data for column "tt"
ignore_extra_data	Specifies whether to ignore excessive columns when the number of data source files exceeds the number of foreign table columns. This parameter is available during data import.	Value range: true, on, false, and off. The default value is false or off. If this parameter is set to true or on and the number of data source files exceeds the number of foreign table columns, excessive columns will be ignored. If this parameter is set to false or off and the number of data source files exceeds the number of foreign table columns, the following error information will be displayed: extra data after last expected column CAUTION: If the newline character at the end of the row is lost, setting the parameter to true will ignore data in the next row.
reject_limit	Specifies the maximum number of data format errors allowed during a data import task. If the number of errors does not reach the maximum number, the data import task can still be executed. NOTE: You are advised to replace this syntax with PER NODE REJECT LIMIT 'value'. Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. Once a non-data format error occurs, the whole data import process is stopped.	Value range: a positive integer or unlimited Enclose positive integer values with single quotation marks (''). If this parameter is not specified, an error message is returned immediately.
compatible_illegal_chars	Specifies whether to tolerate invalid characters during data import and export. This syntax is valid for READ ONLY and WRITE ONLY foreign tables. Only clusters of version 8.1.3.331 or later support export fault tolerance.	Value range: true, on, false, and off. The default value is false or off. If the parameter is true or on, the database will accept invalid characters after converting them during import or export. If this parameter is set to false or off, the import process will be stopped if there are invalid characters and an error occurs. The rule of error tolerance when you import invalid characters is as follows: \0 is converted to a space. Other illegal characters are converted to question marks. If compatible_illegal_chars is set to true or on, the database will convert and accept the invalid characters. If NULL, DELIMITER, QUOTE, and ESCAPE are set to a spaces or question marks. Errors like "illegal chars conversion may confuse COPY escape 0x20" will be displayed to prompt user to modify parameter values that cause confusion, preventing import errors. Enabling error tolerance for foreign table export will result in invalid characters being exported as question marks (?), which can lead to inconsistencies between the exported and original data when imported back into the GaussDB(DWS) database.
replace_illegal_chars	Error tolerance parameter for importing invalid characters. This parameter is used to replace '\0' with a specified string. This syntax is available only for READ ONLY foreign tables.	The restrictions on the string used to replace '\0' are as follows: compatible_illegal_chars must be set to true or on. format must be set to TEXT or CSV. A maximum of 10 characters is allowed. The string cannot be the same as the delimiter, or contain the delimiter, or contain the following characters: digits, letters, backslashes (\), and periods (.).
WITH error_table_name	Data format errors during import are recorded in the table specified by error_table_name. You can query this table after the import to obtain error details. This parameter is available only after reject_limit is set. NOTE: To be compatible with PostgreSQL open source interfaces, you are advised to replace this syntax with LOG INTO.	A string, which must comply with the naming convention.
LOG INTO error_table_name	Data format errors during import are recorded in the table specified by error_table_name. You can query this table after the import to obtain error details. NOTE: This parameter is available only after PER NODE REJECT LIMIT is set.	A string, which must comply with the naming convention.
REMOTE LOG 'name'	The data format error information is saved as files in GDS. name is the prefix of the error data file.	-
PER NODE REJECT LIMIT 'value'	This parameter specifies the allowed number of data format errors on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import. CAUTION: This syntax specifies the error tolerance of a single node. Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. When a non-data format error occurs, the whole data scanning process is stopped.	An unlimited integer. If this parameter is not specified, an error message is returned immediately.

**Table 4** OPTIONS performance parameters
Parameter	Description	Value Range
file_sequence	Concurrently imports data in parallel through GDS foreign tables, to improve single-file import performance. This parameter is only used for data import. The parameter format is file_sequence'total number of shards-current shard'. Example: file_sequence '3-1' indicates that the imported file is logically split into three shards and the data currently imported by the foreign table is the data on the first shard. file_sequence '3-2' indicates that the imported file is logically split into three shards and the data currently imported by the foreign table is the data on the second shard. file_sequence '3-3' indicates that the imported file is logically split into three shards and the data currently imported by the foreign table is the data on the third shard.	This parameter has the following constraints: A file can be split to a maximum of 8 shards. The number of currently imported shard should be less than or equal to the total number of split shards. Only CSV and TXT files can be imported. When data is imported in parallel in CSV format, some shards fail to be imported in some scenarios because the CSV rules conflict with the GDS splitting logic. For details, see Exception Description When the file_sequence Parameter Is Used.

Examples

Create a foreign table customer_ft to import data from GDS server 10.10.123.234 in TEXT format:

     CREATE FOREIGN TABLE customer_ft
(
    c_customer_sk             integer               ,
    c_customer_id             char(16)              ,
    c_current_cdemo_sk        integer               ,
    c_current_hdemo_sk        integer               ,
    c_current_addr_sk         integer               ,
    c_first_shipto_date_sk    integer               ,
    c_first_sales_date_sk     integer               ,
    c_salutation              char(10)              ,
    c_first_name              char(20)              ,
    c_last_name               char(30)              ,
    c_preferred_cust_flag     char(1)               ,
    c_birth_day               integer               ,
    c_birth_month             integer               ,
    c_birth_year              integer                       ,
    c_birth_country           varchar(20)                   ,
    c_login                   char(13)                      ,
    c_email_address           char(50)                      ,
    c_last_review_date        char(10)
)
    SERVER gsmpp_server
    OPTIONS
(
    location 'gsfs://10.10.123.234:5000/customer1*.dat',
    FORMAT 'TEXT' ,
    DELIMITER '|',
    encoding 'utf8',
    mode 'Normal')
READ ONLY;
 
 
  

Create a foreign table to import data from GDS servers 192.168.0.90 and 192.168.0.91 in TEXT format. Record errors that occur during data import in foreign_HR_staffS_ft. A maximum of two data format errors are allowed during the data import.

     CREATE FOREIGN TABLE foreign_HR_staffS_ft
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID  NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/* | gsfs://192.168.0.91:5000/*', format 'TEXT', delimiter E'\x08',  null '',reject_limit '2') WITH err_HR_staffS_ft;
 
 
  

Create a foreign table to import all files in the input_data directory in CSV format.

     CREATE FOREIGN TABLE foreign_HR_staffS_ft1
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID     NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'file:///input_data/*', format 'csv', quote E'\x08', mode 'private', delimiter ',') WITH err_HR_staffS_ft1;
 
 
  

Create a foreign table to export data to the output_data directory in CSV format.

     CREATE FOREIGN TABLE foreign_HR_staffS_ft2
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID  NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'file:///output_data/', format 'csv', quote E'\x08', delimiter '|', header 'on') WRITE ONLY;
 
 
  

Exception Description When the file_sequence Parameter Is Used

When data is imported in parallel in CSV format, some shards fail to be imported in the following scenario because the CSV rules conflict with the GDS splitting logic:

Scenario: A CSV file contains a newline character that is not escaped, the newline character is contained in the character specified by quote, and the data of this line is in the first row of the logical shard.

Example: If you import the big.csv file in parallel, the following information is displayed:

--id, username, address
10001,"customer1 name","Rose District"
10002,"customer2 name","
23 Road Rose 
District NewCity"
10003,"customer3 name","NewCity"

After the file is split into two shards, the content of the first shard is as follows:

10001,"customer1 name","Rose District"
10002,"customer2 name","
23

The content of the second shard is as follows:

Road Rose 
District NewCity"
10003,"customer3 name","NewCity"

The newline character after the first line of the second shard is contained between double quotation marks. As a result, GDS cannot determine whether the newline character is a newline character in the field or a separator in the line. Therefore, two data records on the first shard are successfully imported, but the second shard fails to be imported.

Helpful Links

ALTER FOREIGN TABLE (GDS Import and Export), DROP FOREIGN TABLE

Installing, Configuring, and Starting GDS

Best Practices: Using GDS to Import Table Data from a Remote Server to a GaussDB(DWS) Cluster

Troubleshooting: Data Import and Export Faults with GDS

Parent Topic: DDL Syntax

Previous topic: CREATE EXTERNAL SCHEMA

Next topic: CREATE FOREIGN TABLE (SQL on OBS or Hadoop)

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.

The system is busy. Please try again later.