Step 2: Creating a Foreign Table
- Connect to the GaussDB(DWS) database.
- Create a foreign table to store the three CSV data files that have been uploaded to OBS in Step 1: Uploading Data to OBS. This table is used to transfer data to ordinary tables in GaussDB(DWS).
The following foreign table and its parameter settings are provided as an example.
Parameters defined in a foreign table are used to identify data formats and set the error tolerance for data import. In this tutorial, only key parameter settings are provided. For more information, see CREATE FOREIGN TABLE (for OBS Import and Export).1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
DROP FOREIGN TABLE IF EXISTS product_info_ext; CREATE FOREIGN TABLE product_info_ext ( product_price integer not null, product_id char(30) not null, product_time date , product_level char(10) , product_name varchar(200) , product_type1 varchar(20) , product_type2 char(10) , product_monthly_sales_cnt integer , product_comment_time date , product_comment_num integer , product_comment_content varchar(200) ) SERVER gsmpp_server OPTIONS( LOCATION 'obs://mybucket/input_data/product_info | obs://mybucket02/input_data/product_info', FORMAT 'CSV' , DELIMITER ',', ENCODING 'utf8', HEADER 'false', ACCESS_KEY 'access_key_value_to_be_replaced', SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced', FILL_MISSING_FIELDS 'true', IGNORE_EXTRA_DATA 'true' ) READ ONLY LOG INTO product_info_err PER NODE REJECT LIMIT 'unlimited';
If the following information is displayed, the foreign table has been created:1
CREATE FOREIGN TABLE
Parameter settings in the foreign table are as follows:
- SERVER: It is always set to gsmpp_server. You do not need to change it.
- LOCATION: The OBS path consists of obs://, a bucket name, and a file path. Example: obs://<bucket_name>/<file_path>.
Ensure no Chinese characters are contained in paths used for importing data to or exporting data from OBS.
Multiple paths (if any) are separated by vertical bars (|). For example:LOCATION 'obs://mybucket/input_data/product_info | obs://mybucket02/input_data/product_info',
The following two files having the specified prefix in the mybucket folder will be imported:mybucket/input_data/product_info0.csv mybucket/input_data/product_info1.csv
- FORMAT
It is set to CSV because data to be imported in this tutorial is in CSV format. If the data is in other formats, set this parameter accordingly.
- DELIMITER
It is set to , because the data in source data files is separated by commas (,).
- ENCODING
- HEADER
This parameter specifies whether a data file contains a header. It is valid only for data files in CSV format. The first line of data files in Preparing Data Files is not a header. Therefore, it is set to false.
- ACCESS_KEY and SECRET_ACCESS_KEY
These parameters specify the AK and SK used to access OBS by a user. Replace them with the actual AK and SK. To obtain access keys, log in to the management console, move the cursor over the username in the upper right corner, and click My Credentials. Then choose Access Keys in the navigation tree on the left. On the Access Keys page, you can view the existing AKs or click Create Access Key to create an AK/SK pair.
- FILL_MISSING_FIELDS
This parameter specifies how to handle the problem that the last column of a row in a source data file is lost during data import. The default value is false or off. This parameter is set to true in this tutorial.
- true/on: The last column is set to NULL. No error is reported.
- false/off: Error "missing data for column "tt"" is reported. For example, the last column product_comment_content of the second record in the product_info2.csv source data file is lost. If FILL_MISSING_FIELDS is set to false or off, information similar to the following will be displayed in the error table during data import:
missing data for column "product_comment_content"
- IGNORE_EXTRA_DATA
This parameter specifies whether to ignore excessive columns when the number of columns in a source data file exceeds that defined in the foreign table. The default value is false or off. This parameter is set to true in this tutorial.
- true/on: The excessive columns of a row are ignored. No error is reported.
- false/off: Error "extra data after last expected column" is reported.
For example, the number of columns in the third record in the product_info2.csv source data file is greater than that defined for the foreign table. If IGNORE_EXTRA_DATA is set to false or off, information similar to the following will be displayed in the error table during data import:
extra data after last expected column
- READ ONLY
Syntax defined in a foreign table can be used for both importing data to the GaussDB(DWS) cluster and for exporting data from the cluster. To import data to the cluster, use READ ONLY for the foreign table. To export data, use WRITE ONLY.
- LOG INTO
This parameter specifies the error table that records data format error information during import. You only need to specify the table name, and do not need to create it in advance. GaussDB(DWS) automatically creates the table when a foreign table is created and automatically deletes it when the foreign table is deleted.
- PER NODE REJECT LIMIT
This parameter specifies the allowed number of data format errors on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import.
It is set to unlimited in this tutorial, indicating that all data format errors during import can be recorded.
Last Article: Step 1: Uploading Data to OBS
Next Article: Step 3: Importing Data
Did this article solve your problem?
Thank you for your score!Your feedback would help us improve the website.