Tutorial: Importing Remote GaussDB(DWS) Data Sources
In the era of big data convergent analysis, GaussDB(DWS) clusters in the same region can communicate with each other. This practice demonstrates how to import data from a remote GaussDB(DWS) cluster to the local GaussDB(DWS) cluster using foreign tables.
The demonstration procedure is as follows: Install the gsql database client on an ECS, connect to GaussDB(DWS) using gsql, and import data from the remote GaussDB(DWS) using a foreign table.
General Procedure
This practice takes about 40 minutes. The basic process is as follows:
Preparations
You have registered a Huawei account and enabled Huawei Cloud. The account cannot be in arrears or frozen.
Creating an ECS
For details about how to purchase a Linux ECS, see section "Purchasing an ECS" in the Elastic Cloud Server Getting Started. After the purchase, log in to the ECS by referring to section "Logging In to a Linux ECS".
When creating the ECS, ensure that the ECS and the GaussDB(DWS) clusters to be created are in the same region, AZ, and VPC subnet. The ECS OS must match the OS supported by the gsql client and GDS tool package (CentOS 7.6 is used as an example), and password login must be enabled.
Creating a Cluster and Downloading the Tool Package
- Log in to the Huawei Cloud management console.
- Choose Service List > Analytics > Data Warehouse Service. On the page that is displayed, click Create Cluster in the upper right corner.
- Configure the parameters according to Table 1.
Table 1 Software configuration

| Parameter | Configuration |
| --- | --- |
| Region | EU-Dublin. NOTE: EU-Dublin is used as an example. You can select other regions as required, but ensure that all operations are performed in the same region and that GaussDB(DWS) and the ECS are in the same region, AZ, and VPC subnet. |
| AZ | AZ2 |
| Resource | Standard data warehouse |
| Compute Resource | ECS |
| Storage Type | Cloud SSD |
| CPU Architecture | x86 |
| Node Flavor | dws2.m6.4xlarge.8 (16 vCPUs, 128 GB memory, 2000 GB SSD). NOTE: If this flavor is sold out, select another AZ or flavor. |
| Hot Storage | 100 GB/node |
| Nodes | 3 |
| Cluster Name | dws-demo01 |
| Administrator Account | dbadmin |
| Administrator Password | User-defined password |
| Confirm Password | Enter the user-defined password again |
| Database Port | 8000 |
| VPC | vpc-default |
| Subnet | subnet-default (192.168.0.0/24). NOTICE: Ensure that the cluster and the ECS are in the same VPC subnet. |
| Security Group | Automatic creation |
| EIP | Buy now |
| Bandwidth | 1 Mbit/s |
| Advanced Settings | Default |
- Confirm the information, click Next, and then click Submit.
- Wait for about 10 minutes. After the cluster is created, click the cluster name to go to the Basic Information page. Choose Network, click the security group name, and check the inbound rules. In this example, the gsql client runs on the ECS whose private network IP address is 192.168.0.90, so make sure there is a rule that allows access from 192.168.0.0/24 on port 8000; add one if it does not exist.
- Return to the Basic Information tab of the cluster and record the value of Private Network IP Address.
- Return to the homepage of the GaussDB(DWS) console. Choose Connections in the navigation pane on the left, select the ECS OS (for example, select Redhat x86_64 for CentOS 7.6), and click Download to save the tool package to the local host. The tool package contains the gsql client and GDS.
- Repeat 1 to 6 to create a second GaussDB(DWS) cluster and set its name to dws-demo02.
Preparing Source Data
- Create the following three CSV files in a directory on your local PC:
- Data file product_info0.csv
100,XHDK-A,2017-09-01,A,2017 Shirt Women,red,M,328,2017-09-04,715,good!
205,KDKE-B,2017-09-01,A,2017 T-shirt Women,pink,L,584,2017-09-05,40,very good!
300,JODL-X,2017-09-01,A,2017 T-shirt men,red,XL,15,2017-09-03,502,Bad.
310,QQPX-R,2017-09-02,B,2017 jacket women,red,L,411,2017-09-05,436,It's nice.
150,ABEF-C,2017-09-03,B,2017 Jeans Women,blue,M,123,2017-09-06,120,good.
- Data file product_info1.csv
200,BCQP-E,2017-09-04,B,2017 casual pants men,black,L,997,2017-09-10,301,good quality.
250,EABE-D,2017-09-10,A,2017 dress women,black,S,841,2017-09-15,299,This dress fits well.
108,CDXK-F,2017-09-11,A,2017 dress women,red,M,85,2017-09-14,22,It's really amazing to buy.
450,MMCE-H,2017-09-11,A,2017 jacket women,white,M,114,2017-09-14,22,very good.
260,OCDA-G,2017-09-12,B,2017 woolen coat women,red,L,2004,2017-09-15,826,Very comfortable.
- Data file product_info2.csv
980,"ZKDS-J",2017-09-13,"B","2017 Women's Cotton Clothing","red","M",112,,,
98,"FKQB-I",2017-09-15,"B","2017 new shoes men","red","M",4345,2017-09-18,5473
50,"DMQY-K",2017-09-21,"A","2017 pants men","red","37",28,2017-09-25,58,"good","good","good"
80,"GKLW-l",2017-09-22,"A","2017 Jeans Men","red","39",58,2017-09-25,72,"Very comfortable."
30,"HWEC-L",2017-09-23,"A","2017 shoes women","red","M",403,2017-09-26,607,"good!"
40,"IQPD-M",2017-09-24,"B","2017 new pants Women","red","M",35,2017-09-27,52,"very good."
50,"LPEC-N",2017-09-25,"B","2017 dress Women","red","M",29,2017-09-28,47,"not good at all."
60,"NQAB-O",2017-09-26,"B","2017 jacket women","red","S",69,2017-09-29,70,"It's beautiful."
70,"HWNB-P",2017-09-27,"B","2017 jacket women","red","L",30,2017-09-30,55,"I like it so much"
80,"JKHU-Q",2017-09-29,"C","2017 T-shirt","red","M",90,2017-10-02,82,"very good."
- Log in to the created ECS as user root and run the following command to create a data source file directory:
mkdir -p /input_data
- Use a file transfer tool to upload the preceding data files to the /input_data directory of the ECS.
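If you transfer the files from a local Linux or macOS host, scp is one option. The local path and ECS address below are placeholders; replace them with your actual file path and the reachable IP address of the ECS.
scp /path/to/product_info*.csv root@<ecs-ip>:/input_data/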
Importing Data Sources Using GDS
- Log in to the ECS as user root and use a file transfer tool to upload the tool package downloaded in 7 to the /opt directory.
- Decompress the tool package in the /opt directory.
cd /opt
unzip dws_client_8.1.x_redhat_x64.zip
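After decompression, the /opt directory should contain the gds subdirectory and the gsql environment script used in later steps. You can take a quick look at the extracted contents (the exact listing varies with the package version):
ls /opt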
- Create a GDS user and change the owners of the data source and GDS directories.
groupadd gdsgrp
useradd -g gdsgrp gds_user
chown -R gds_user:gdsgrp /opt/gds
chown -R gds_user:gdsgrp /input_data
- Switch to user gds_user.
su - gds_user
- Import the GDS environment variables.
This step is required only for 8.1.x or later. For earlier versions, skip this step.
cd /opt/gds/bin
source gds_env
- Start GDS.
/opt/gds/bin/gds -d /input_data/ -p 192.168.0.90:5000 -H 192.168.0.0/24 -l /opt/gds/gds_log.txt -D
- -d dir: directory for storing data files that contain data to be imported. This practice uses /input_data/ as an example.
- -p ip:port: listening IP address and port for GDS. Set this parameter to the private network IP address of the ECS where GDS is installed so that GDS can communicate with GaussDB(DWS). In this example, 192.168.0.90:5000 is used.
- -H address_string: hosts that are allowed to connect to and use GDS. The value must be in CIDR format. In this example, the network segment of the GaussDB(DWS) private network IP address is used.
- -l log_file: GDS log directory and log file name. In this example, /opt/gds/gds_log.txt is used.
- -D: GDS in daemon mode.
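To confirm that GDS started in daemon mode, you can check the process and the log file specified with -l, for example:
ps -ef | grep gds
tail /opt/gds/gds_log.txt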
- Connect to the first GaussDB(DWS) cluster using gsql.
- Run the exit command to switch to user root, go to the /opt directory of the ECS, and import the environment variables of gsql.
exit
cd /opt
source gsql_env.sh
- Go to the /opt/bin directory and connect to the first GaussDB(DWS) cluster using gsql.
cd /opt/bin
gsql -d gaussdb -h 192.168.0.8 -p 8000 -U dbadmin -W password -r
- -d: name of the connected database. In this example, the default database gaussdb is used.
- -h: private network IP address of the connected GaussDB(DWS) database queried in 6. In this example, 192.168.0.8 is used.
- -p: GaussDB(DWS) port. The value is 8000.
- -U: database administrator username, which is set during cluster creation. In this example, dbadmin is used.
- -W: administrator password, which is set during cluster creation in 3. In this example, replace password with your actual password.
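Once connected, the gsql prompt for the gaussdb database is displayed. To confirm the session works before creating any objects, you can run a simple query, for example:
SELECT version();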
- Create a common user leo and grant the user the permission for creating foreign tables.
CREATE USER leo WITH PASSWORD 'password';
ALTER USER leo USEFT;
- Switch to user leo and create a GDS foreign table.
Set LOCATION to the GDS listening IP address and port number obtained in 6, for example, gsfs://192.168.0.90:5000/*.
SET ROLE leo PASSWORD 'password';
DROP FOREIGN TABLE IF EXISTS product_info_ext;
CREATE FOREIGN TABLE product_info_ext
(
    product_price                integer        not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)
)
SERVER gsmpp_server
OPTIONS(
    LOCATION 'gsfs://192.168.0.90:5000/*',
    FORMAT 'CSV',
    DELIMITER ',',
    ENCODING 'utf8',
    HEADER 'false',
    FILL_MISSING_FIELDS 'true',
    IGNORE_EXTRA_DATA 'true'
)
READ ONLY
LOG INTO product_info_err PER NODE REJECT LIMIT 'unlimited';
- Create a local table.
DROP TABLE IF EXISTS product_info;
CREATE TABLE product_info
(
    product_price                integer        not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)
)
WITH (
    orientation = column,
    compression = middle
)
DISTRIBUTE BY hash (product_id);
- Import data from the GDS foreign table and check whether the data is successfully imported.
INSERT INTO product_info SELECT * FROM product_info_ext;
SELECT count(*) FROM product_info;
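The three CSV files contain 20 rows in total, so the count should be 20 if every row was imported. If the count is lower, some rows were rejected; because the foreign table logs errors into product_info_err, you can inspect the rejected records, for example:
SELECT * FROM product_info_err;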
Importing Remote GaussDB(DWS) Data Using a Foreign Table
- Connect to the second cluster on the ECS by referring to 7. Change the connection address to the address of the second cluster. In this example, 192.168.0.86 is used.
- Create a common user jim and grant the user the permission for creating foreign tables and servers. The value of FOREIGN DATA WRAPPER is gc_fdw.
CREATE USER jim WITH PASSWORD 'password';
ALTER USER jim USEFT;
GRANT ALL ON FOREIGN DATA WRAPPER gc_fdw TO jim;
- Switch to user jim and create a server.
SET ROLE jim PASSWORD 'password';
CREATE SERVER server_remote FOREIGN DATA WRAPPER gc_fdw
OPTIONS (
    address '192.168.0.8:8000,192.168.0.158:8000',
    dbname 'gaussdb',
    username 'leo',
    password 'password'
);
- address: private network IP addresses and port number of the first cluster obtained in 6. In this example, 192.168.0.8:8000 and 192.168.0.158:8000 are used.
- dbname: database name of the first connected cluster. In this example, gaussdb is used.
- username: username of the first connected cluster. In this example, leo is used.
- password: user password
- Create a foreign table.
The columns and constraints of the foreign table must be consistent with those of the table to be accessed.
CREATE FOREIGN TABLE region
(
    product_price                integer        ,
    product_id                   char(30)       ,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)
)
SERVER server_remote
OPTIONS (
    schema_name 'leo',
    table_name 'product_info',
    encoding 'utf8'
);
- SERVER: name of the server created in the previous step. In this example, server_remote is used.
- schema_name: schema name of the first cluster to be accessed. In this example, leo is used.
- table_name: name of the table to be accessed in the first cluster, obtained in 10. In this example, product_info is used.
- encoding: The value must be the same as the encoding of the first cluster, obtained in 9. In this example, utf8 is used.
- View the created server and foreign table.
\des+ server_remote
\d+ region
- Create a local table.
The columns and constraints of the table must be consistent with those of the table to be accessed.
CREATE TABLE local_region
(
    product_price                integer        not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)
)
WITH (
    orientation = column,
    compression = middle
)
DISTRIBUTE BY hash (product_id);
- Import data to the local table using the foreign table.
INSERT INTO local_region SELECT * FROM region;
SELECT * FROM local_region;
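To verify that the import is complete, you can compare the row counts of the foreign table and the local table; the two queries below should return the same value:
SELECT count(*) FROM region;
SELECT count(*) FROM local_region;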
- Query the foreign table without importing data.
SELECT * FROM region;
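If you want to see how such a query is dispatched to the remote cluster, EXPLAIN can be run on the foreign table. The exact plan output depends on the GaussDB(DWS) version, so treat this only as a quick check:
EXPLAIN SELECT count(*) FROM region;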