Updated: 2022-07-29 GMT+08:00
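Preparing Data on an MRS Cluster
This section assumes that, before you import data from MRS into a GaussDB(DWS) cluster, you have completed the following preparations:
- An MRS cluster has been created.
- A Hive or Spark ORC table has been created on the MRS cluster, and the table data has been stored in the HDFS path that corresponds to the table.
If you have already completed these preparations, skip this section.
For convenience, this section uses the creation of a Hive ORC table on an MRS cluster as the example for completing these preparations. The general procedure and SQL syntax for creating a Spark ORC table on an MRS cluster are similar to those for Hive and are not described here.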
Data File
Assume that a data file named product_info.txt contains the following sample data:
100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite
200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good.
350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good
110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good
210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good
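The product_info table used below splits each line on commas with no quoting, so a record with a missing field, an extra comma inside a comment, or a malformed number or date can load as NULL or shifted values instead of failing outright. The following is a minimal pre-check sketch in Python, not part of the MRS or GaussDB(DWS) tooling. It assumes the file is UTF-8 with one record per line, and the names `SCHEMA`, `parse_row`, and `check_file` are hypothetical; the column list mirrors the product_info CREATE TABLE statement in the next section.

```python
from datetime import date

# Hypothetical helper: product_info columns in CREATE TABLE order,
# as (name, kind, max length for string columns).
SCHEMA = [
    ("product_price", "int", None),
    ("product_id", "str", 30),
    ("product_time", "date", None),
    ("product_level", "str", 10),
    ("product_name", "str", 200),
    ("product_type1", "str", 20),
    ("product_type2", "str", 10),
    ("product_monthly_sales_cnt", "int", None),
    ("product_comment_time", "date", None),
    ("product_comment_num", "int", None),
    ("product_comment_content", "str", 200),
]


def parse_row(line):
    """Split one record on ',' (no quoting, like the TEXTFILE table) and convert each field.

    Raises ValueError if the field count, an int, a date, or a string length does not fit.
    """
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    row = {}
    for (name, kind, max_len), raw in zip(SCHEMA, fields):
        if kind == "int":
            row[name] = int(raw)
        elif kind == "date":
            row[name] = date.fromisoformat(raw)  # expects YYYY-MM-DD
        else:
            if len(raw) > max_len:
                raise ValueError(f"{name} is longer than {max_len} characters")
            row[name] = raw
    return row


def check_file(path):
    """Return (line number, message) for every line in path that fails parse_row."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                parse_row(line)
            except ValueError as exc:
                errors.append((lineno, str(exc)))
    return errors
```

For example, `check_file("product_info.txt")` returns an empty list when every record fits the 11-column layout; an empty line is reported as a field-count error because Hive would otherwise read it as a row.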
Creating a Hive ORC Table on the MRS Cluster
- Create an MRS cluster.
- Download the client.
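- Log in to the Hive client of the MRS cluster.
  - Log in to a Master node.
  - Run the following command to switch to the omm user:
    sudo su - omm
  - Run the following command to go to the client directory:
    cd /opt/client
  - Run the following command to configure the environment variables:
    source bigdata_env
  - If Kerberos authentication is enabled for the cluster, run the following command to authenticate the current user. The user must have permission to create Hive tables: configure a role that has this permission (see the "Creating a Role" section in the MapReduce Service User Guide) and bind the role to the user. If Kerberos authentication is not enabled, skip this command.
    kinit <MRS cluster user>
    For example: kinit hiveuser
  - Run the following command to start the Hive client:
    beeline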
- Create a database named demo in Hive:
CREATE DATABASE demo;
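- In the demo database, create a Hive TEXTFILE table named product_info and import the data file (product_info.txt) into the HDFS path that corresponds to the table.
Run the following command to switch to the demo database: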
USE demo;
Run the following statements to create the product_info table, with columns defined to match the fields in the data file:
DROP TABLE IF EXISTS product_info;
CREATE TABLE product_info
(
    product_price                int,
    product_id                   char(30),
    product_time                 date,
    product_level                char(10),
    product_name                 varchar(200),
    product_type1                varchar(20),
    product_type2                char(10),
    product_monthly_sales_cnt    int,
    product_comment_time         date,
    product_comment_num          int,
    product_comment_content      varchar(200)
)
row format delimited fields terminated by ','
stored as TEXTFILE;
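For details about importing data into an MRS cluster, see "Cluster Operation Guide > Managing an Existing Cluster > Managing Data Files" in the MapReduce Service User Guide.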
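- In the demo database, create a Hive ORC table named product_info_orc.
Run the following statements to create product_info_orc. Its columns are identical to those of the product_info table created in the previous step; only the storage format differs: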
DROP TABLE IF EXISTS product_info_orc;
CREATE TABLE product_info_orc
(
    product_price                int,
    product_id                   char(30),
    product_time                 date,
    product_level                char(10),
    product_name                 varchar(200),
    product_type1                varchar(20),
    product_type2                char(10),
    product_monthly_sales_cnt    int,
    product_comment_time         date,
    product_comment_num          int,
    product_comment_content      varchar(200)
)
row format delimited fields terminated by ','
stored as orc;
- Insert the data from the product_info table into the Hive ORC table product_info_orc:
insert into product_info_orc select * from product_info;
Query the product_info_orc table:
select * from product_info_orc;
If the query returns the data shown in the data file, the data has been inserted into the ORC table successfully.
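Checking the query result by eye is error-prone: `select *` without ORDER BY does not guarantee row order, and CHAR columns such as product_id and product_level may be returned right-padded with spaces. The following Python sketch compares the data file with the query output as multisets of rows. It assumes the query result was saved as comma-separated text whose first line holds the column names, for example with `beeline --outputformat=csv2 -e "select * from demo.product_info_orc;" > orc_rows.csv`; that command, the file name orc_rows.csv, and the helpers `normalize` and `compare_rows` are illustrative assumptions, so adjust them if your beeline version quotes values or adds extra output lines.

```python
from collections import Counter


def normalize(line):
    """Split one comma-separated record and strip trailing spaces from each field.

    CHAR(n) columns may come back right-padded with spaces, so padded and
    unpadded values compare equal after this step.
    """
    return tuple(field.rstrip() for field in line.rstrip("\r\n").split(","))


def compare_rows(source_lines, query_lines, has_header=True):
    """Compare the data file with exported query output, ignoring row order.

    Returns (missing, extra) as Counters: rows in the file but not in the query
    output, and rows in the query output but not in the file. Both empty means a match.
    """
    expected = Counter(normalize(line) for line in source_lines if line.strip())
    rows = [line for line in query_lines if line.strip()]
    if has_header and rows:
        rows = rows[1:]  # drop the column-name line
    actual = Counter(normalize(line) for line in rows)
    # Counter subtraction keeps only positive counts, so duplicate rows are accounted for.
    return expected - actual, actual - expected
```

Usage: open product_info.txt and orc_rows.csv with `encoding="utf-8"` and pass the two file objects to `compare_rows`; two empty Counters mean the ORC table holds exactly the records in the data file. Comparing multisets rather than sorted lists keeps the check correct even if the file contains duplicate records.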