Updated on 2023-10-23 GMT+08:00

Creating a GDS Foreign Table

Procedure

  1. Set the location parameter for the foreign table based on the path planned in Planning Data Export.

    • Remote mode
      Set the location parameter to the URL of the directory that stores the data files.
      • You do not need to specify a file name in the URL.
      • If multiple URLs are configured, only the first URL is effective.

      Example:

      The IP address of the GDS data server is 192.168.0.90. The listening port number set during GDS startup is 5000. The directory for storing data files is /output_data.

      In this case, set the location parameter to gsfs://192.168.0.90:5000/.

    • Local mode

      Set the location parameter to the directory for storing the data files. You do not need to specify the file name.

      Example:

      To store data files in the /output_data/ directory, set the location parameter to file:///output_data/ when you create a foreign table.

  2. Set data format parameters in the foreign table based on the planned data file formats. For details on format parameters, see data format parameters.
  3. Create a GDS foreign table based on the parameter settings in the preceding steps. For details, see CREATE FOREIGN TABLE (for Import and Export).

Examples

  • Example 1: Create the GDS foreign table foreign_tpcds_reasons to receive data from a data server. Data will be exported in CSV format.

    Data export settings are as follows:

    The data server resides on the same intranet as the cluster. The IP address of the data server is 192.168.0.90. Data is to be exported as CSV files. The Remote mode is selected for parallel data export.

    If the directory for storing data files is /output_data/ and the GDS listening port is 5000 when GDS is started, set the location parameter to gsfs://192.168.0.90:5000/.

    Data format parameter settings are as follows:

    • format is set to CSV.
    • encoding is set to UTF-8.
    • delimiter is set to E'\x20'.
    • quote is set to 0x1b.
    • null is set to an empty string without quotation marks.
    • escape is set to a double quotation mark.
    • header is set to false, indicating that the first row is identified as a data row in an exported file.
    • EOL is set to 0X0A.
    The statement for creating the foreign table is as follows:
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    openGauss=# CREATE FOREIGN TABLE foreign_tpcds_reasons
    (
      r_reason_sk    integer        not null,
      r_reason_id    char(16)       not null,
      r_reason_desc  char(100)
    ) 
    SERVER gsmpp_server 
    OPTIONS (LOCATION 'gsfs://192.168.0.90:5000/', 
    FORMAT 'CSV',
    DELIMITER E'\x20',
    QUOTE E'\x1b', 
    NULL '', 
    EOL '0x0a'
    )
    WRITE ONLY;
    
  • Example 2: The GDS foreign table foreign_tpcds_reasons is created for receiving data from a data server. Data will be exported as FIXED files.

    Data export settings are as follows:

    The data server resides on the same intranet as the cluster. The IP address of the data server is 192.168.0.90. Data is to be exported as CSV files. The Remote mode is selected for parallel data export.

    If the directory for storing data files is /output_data/ and the GDS listening port is 5000 when GDS is started, set the location parameter to gsfs://192.168.0.90:5000/.

    Data format parameter settings are as follows:

    • format is set to FIXED.
    • encoding is set to UTF-8.
    • header is set to false, indicating that when the file is exported, the first row is regarded as a data row.
    • POSITION(offset,length) is set to specify the position of a column in a source data file. offset indicates the starting point of a column in the data file. length indicates the length (in bytes) of the column.

      In the r_reason_sk column, the data type is integer and the maximum number of bytes is 2 (as shown in the following statement output). Therefore, offset is set to 1 and length is set to 2.

      In the r_reason_id column, the data type is character varying(16) and the maximum number of bytes is 16 (as shown in the following statement output). Therefore, offset is set to 3 (sum of offset and length of r_reason_sk) and length is set to 16.

      In the r_reason_desc column, the data type is character varying(100) and the maximum number of bytes is 100 (as shown in the following statement output). Therefore, offset is set to 19 (sum of offset and length of r_reason_id) and length is set to 100.

      1
      2
      3
      4
      5
      openGauss=# SELECT max(lengthb(r_reason_sk)),max(lengthb(r_reason_id)),max(lengthb(r_reason_desc)) FROM reasons;
       max | max | max 
      -----+-----+-----
         2 |  16 | 100
      (1 row)
      
    • EOL is set to 0X0A.
    The statement for creating the foreign table is as follows:
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    openGauss=# CREATE FOREIGN TABLE foreign_tpcds_reasons
    ( r_reason_sk    integer        position(1,2),
      r_reason_id    char(16)       position(3,16),
      r_reason_desc  char(100)      position(19,100)
    ) 
    SERVER gsmpp_server 
    OPTIONS (LOCATION 'gsfs://192.168.0.90:5000/', 
    FORMAT 'FIXED', 
    ENCODING 'utf8',
    EOL '0x0a'
    )
    WRITE ONLY;