Exporting ORC Data from a GaussDB(DWS) Cluster to an MRS Cluster
GaussDB(DWS) allows you to export ORC data to MRS using an HDFS foreign table. You can specify the export mode and export data format in the foreign table. Data is exported from GaussDB(DWS) in parallel using multiple DNs and stored in HDFS. In this way, the overall export performance is improved.
Preparing the Environment
Create a GaussDB(DWS) cluster. Ensure that the MRS and GaussDB(DWS) clusters are in the same region, AZ, and VPC subnet and that the clusters can communicate with each other.
Creating an MRS Cluster
- Log in to the HUAWEI CLOUD console, choose Analytics > MapReduce Service and click Buy Cluster. Click the Custom Config tab, configure software parameters, and click Next.
Table 1 Software configuration Parameter
Example Value
Region
CN-Hong Kong
Cluster Name
mrs_01
Cluster Version
MRS 1.9.2 (recommended)
NOTE:- For clusters of version 8.1.1.300 and later, MRS clusters support versions 1.6.*, 1.7.*, 1.8.*, 1.9.*, 2.0.*, 3.0.*, 3.1.*, and later (* indicates a number).
- For clusters earlier than version 8.1.1.300, MRS clusters support versions 1.6.*, 1.7.*, 1.8.*, 1.9.*, and 2.0.* (* indicates a number).
Cluster Type
Analysis Cluster
- Configure hardware parameters and click Next.
Table 2 Hardware configuration Parameter
Example Value
Billing Mode
Pay-per-use
AZ
AZ2
VPC
vpc-01
Subnet
subnet-01
Security Group
Auto create
EIP
10.x.x.x
Enterprise Project
default
Master
2
Analysis Core
3
Analysis Task
0
- Configure the advanced settings based on the following table, click Buy Now, and wait for about 15 minutes for the cluster creation to complete.
Table 3 Advanced settings Parameter
Example Value
Tag
test01
Hostname Prefix
(Optional) Prefix for the name of an ECS or BMS in the cluster.
Auto Scaling
Retain the default value.
Bootstrap Action
Retain the default value. MRS 3.x does not support this parameter.
Agency
Retain the default value.
Data Disk Encryption
This function is disabled by default. Retain the default value.
Alarm
Retain the default value.
Rule Name
Retain the default value.
Topic Name
Select a topic.
Kerberos Authentication
This parameter is enabled by default.
User Name
admin
Password
This password is used to log in to the cluster management page.
Confirm Password
Enter the password of user admin again.
Login Mode
Password
User Name
root
Password
This password is used to remotely log in to the ECS.
Confirm Password
Enter the password of user root again.
Secure Communications
Select Enable.
Creating an MRS Cluster Connection
- Log in to the GaussDB(DWS) console and click the created data warehouse cluster. Ensure that the GaussDB(DWS) and MRS clusters are in the same region, AZ, and VPC subnet.
- Click the MRS Data Source tab and click Create MRS Cluster Connection.
- Select data source mrs_01 created in the previous step, enter the MRS account name admin and its password, and click OK.
Creating a Foreign Server
- Use Data Studio to connect to the created GaussDB(DWS) cluster.
- Create a user dbuser that has the permission for creating databases.
1
CREATE USER dbuser WITH CREATEDB PASSWORD 'password';
- Switch to user dbuser.
1
SET ROLE dbuser PASSWORD 'password';
- Create a database mydatabase.
1
CREATE DATABASE mydatabase;
- Perform the following steps to switch to database mydatabase:
- In the Object Browser window of the Data Studio client, right-click the database connection and choose Refresh from the shortcut menu. Then, the new database is displayed.
- Right-click the database name mydatabase and select Connect to DB from the shortcut menu.
- Right-click the database name mydatabase and select Open Terminal from the shortcut menu. The SQL command window for connecting to a database is displayed. Perform the following steps in the window.
- Grant the permission to create external servers to user dbuser. In 8.1.1 and later versions, you also need to grant the permission to use the public mode.
1 2
GRANT ALL ON FOREIGN DATA WRAPPER hdfs_fdw TO dbuser; In GRANT ALL ON SCHEMA public TO dbuser; //8.1.1 and later versions, common users do not have permission on the public mode and need to grant permission. In versions earlier than 8.1.1, you do not need to perform this operation.
The name of FOREIGN DATA WRAPPER must be hdfs_fdw. dbuser indicates the username of CREATE SERVER.
- Grant user dbuser the permission for using foreign tables.
1
ALTER USER dbuser USEFT;
- Switch to the Postgres database and query the foreign server automatically created by the system after the MRS data source is created.
1
SELECT * FROM pg_foreign_server;
Information similar to the following is displayed:
1 2 3 4 5 6
srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions --------------------------------------------------+----------+--------+---------+------------+--------+--------------------------------------------------------------------------------------------------------------------- gsmpp_server | 10 | 13673 | | | | gsmpp_errorinfo_server | 10 | 13678 | | | | hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca | 16476 | 13685 | | | | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs} (3 rows)
- Switch to database mydatabase and switch to user dbuser.
1
SET ROLE dbuser PASSWORD 'password';
- Create a foreign server.
The server name, address, and configuration path must be the same as those in 8.
1 2 3 4 5 6 7
CREATE SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca FOREIGN DATA WRAPPER HDFS_FDW OPTIONS ( address '192.168.1.245:9820,192.168.1.218:9820', //The intranet IP addresses of the active and standby master nodes on the MRS management plane, which can be used to communicate with GaussDB(DWS). hdfscfgpath '/MRS/8f79ada0-d998-4026-9020-80d6de2692ca', type 'hdfs' );
- View the foreign server.
1
SELECT * FROM pg_foreign_server WHERE srvname='hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca';
The server is successfully created if information similar to the following is displayed:
1 2 3 4
srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions --------------------------------------------------+----------+--------+---------+------------+--------+--------------------------------------------------------------------------------------------------------------------- hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca | 16476 | 13685 | | | | {"address=192.168.1.245:9820,192.168.1.218:9820",hdfscfgpath=/MRS/8f79ada0-d998-4026-9020-80d6de2692ca,type=hdfs} (1 row)
Creating a Foreign Table
Create an OBS foreign table that does not contain partition columns. The foreign server associated with the table is hdfs_server, the format of the file on HDFS corresponding to the table is ORC, and the data storage path on OBS is /user/hive/warehouse/product_info_orc/.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
DROP FOREIGN TABLE IF EXISTS product_info_output_ext; CREATE FOREIGN TABLE product_info_output_ext ( product_price integer , product_id char(30) , product_time date , product_level char(10) , product_name varchar(200) , product_type1 varchar(20) , product_type2 char(10) , product_monthly_sales_cnt integer , product_comment_time date , product_comment_num integer , product_comment_content varchar(200) ) SERVER hdfs_server_8f79ada0_d998_4026_9020_80d6de2692ca OPTIONS ( format 'orc', foldername '/user/hive/warehouse/product_info_orc/', compression 'snappy', version '0.12' ) Write Only; |
Exporting Data
Create an ordinary table product_info_output.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
DROP TABLE product_info_output; CREATE TABLE product_info_output ( product_price int , product_id char(30) , product_time date , product_level char(10) , product_name varchar(200) , product_type1 varchar(20) , product_type2 char(10) , product_monthly_sales_cnt int , product_comment_time date , product_comment_num int , product_comment_content varchar(200) ) with (orientation = column,compression=middle) distribute by hash (product_name); |
1
|
INSERT INTO product_info_output_ext SELECT * FROM product_info_output; |
If the following information is displayed, the data is successfully exported:
1
|
INSERT 0 10 |
Viewing the Export Result
- Go to the MRS cluster list. Click a cluster name to go to the cluster details page.
- Click the Files tab and click HDFS File List. Check the exported ORC file in the user/hive/warehouse/product_info_orc directory.
ORC data exported from GaussDB(DWS) complies with the following rules:
- Data exported to MRS (HDFS): When data is exported from a DN, the data is stored in HDFS in the segment format. The file is named in the format of mpp_DatabaseName_SchemaName_TableName_NodeName_n.orc.
- You are advised to export data from different clusters or databases to different paths. The maximum size of an ORC file is 128 MB, and the maximum size of a stripe is 64 MB.
- After the export is complete, the _SUCCESS file is generated.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot