Preparing Data in an MRS Cluster
Before importing data from MRS to a GaussDB(DWS) cluster, you must have:
- Created an MRS cluster.
- Created the Hive/Spark ORC table in the MRS cluster and stored the table data to the HDFS path corresponding to the table.
If you have completed the preparations, skip this section.
In this tutorial, the Hive ORC table will be created in the MRS cluster as an example to complete the preparation work. The process for creating the Spark ORC table in the MRS cluster and the SQL syntax are similar to those of Hive.
The sample data of the product_info.txt data file is as follows:
100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good 205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good! 300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad. 310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice 150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite 200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality. 250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time. 108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy 450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor 260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes 980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small 98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter. 150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective 200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear 300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good 100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good. 350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good 110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good 210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good. 230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good
Creating a Hive ORC Table in an MRS Cluster
- Create an MRS cluster.
For details, see Creating an MRS Data Source Connection in the Data Warehouse Service Management Guide.
- Log in to the Hive client of the MRS cluster.
- Log in to a Master node.
For details, see Logging In to an ECS in the MapReduce Service User Guide.
- Run the following command to switch the user:
sudo su - omm
- Run the following command to go to the client directory:
- Run the following command to configure the environment variables:
- If Kerberos authentication has been enabled for the current cluster, run the following command to authenticate the current user, which requires that the current user has the permission to create Hive tables. Next, configure a role with the corresponding permissions. For details, see Creating a Role in the MapReduce Service User Guide. Then, bind a role to a user. For details, see Creating a User in the MapReduce Service User Guide. If the Kerberos authentication is disabled for the current cluster, skip this step.
kinit MRS cluster user
Example: kinit hiveuser
- Run the following command to start the Hive client:
- Log in to a Master node.
- Create a database demo on Hive.
CREATE DATABASE demo;
- Create table product_info of the Hive TEXTFILE type in the database demo and import the Data File (product_info.txt) to the HDFS path corresponding to the table.
Run the following command to switch to the database demo:
Run the following command to create table product_info and define the table fields based on data in the Data File.
DROP TABLE product_info; CREATE TABLE product_info ( product_price int not null, product_id char(30) not null, product_time date , product_level char(10) , product_name varchar(200) , product_type1 varchar(20) , product_type2 char(10) , product_monthly_sales_cnt int , product_comment_time date , product_comment_num int , product_comment_content varchar(200) ) row format delimited fields terminated by ',' stored as TEXTFILE;
For details about how to import data to an MRS cluster, see Managing Active Clusters > Managing Data Files in the MapReduce Service User Guide.
- Create a Hive ORC table named product_info_orc in the database demo.
Run the following command to create the Hive ORC table product_info_orc. The table fields are the same as those of the product_info table created in the previous step.
DROP TABLE product_info_orc; CREATE TABLE product_info_orc ( product_price int not null, product_id char(30) not null, product_time date , product_level char(10) , product_name varchar(200) , product_type1 varchar(20) , product_type2 char(10) , product_monthly_sales_cnt int , product_comment_time date , product_comment_num int , product_comment_content varchar(200) ) row format delimited fields terminated by ',' stored as orc;
- Insert data in the product_info table to the Hive ORC table product_info_orc.
insert into product_info_orc select * from product_info;
Query table product_info_orc.
select * from product_info_orc;
If data displayed in the Data File can be queried, the data has been successfully inserted to the ORC table.