Preparing Data on an MRS Cluster

Last updated: 2019/07/22 15:44

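Before importing data from MRS into a DWS cluster, it is assumed that you have completed the following preparations:

  1. An MRS cluster has been created.
  2. A Hive/Spark ORC table has been created on the MRS cluster, and the table data is stored in the HDFS path of that table.

If you have completed these preparations, skip this section.

For convenience, this section uses a Hive ORC table created on the MRS cluster as the example for completing these preparations. The general procedure and SQL syntax for creating a Spark ORC table on an MRS cluster are similar to those for Hive and are not described here.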

Data File

Assume that there is a data file named product_info.txt with the following sample data:

100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite
200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good.
350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good
110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good 
210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good
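
Each sample line above carries 11 comma-separated fields, and a line with a stray comma would shift columns once the file is read as a comma-delimited table. As a minimal sketch (the function name check_field_count is hypothetical and not part of any MRS or Hive tool), the field count can be checked locally before the file is uploaded:

```shell
# Hypothetical helper: report lines whose comma-separated field count
# differs from the expected value.
# $1: path to the data file; $2: expected number of fields per line.
# Returns 0 if every line matches, 1 otherwise.
check_field_count() {
    awk -F',' -v n="$2" '
        NF != n { printf "line %d has %d fields\n", NR, NF; bad = 1 }
        END     { exit bad }
    ' "$1"
}

# Example call for the sample file in this section:
# check_field_count product_info.txt 11
```

The check is purely textual: it does not validate data types or date formats, only that each line splits into the expected number of columns.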

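Creating a Hive ORC Table on the MRS Cluster

  1. Create an MRS cluster.

    For details, see "Creating an MRS Data Source Connection" in the Data Warehouse Service Management Guide.

  2. Log in to the Hive client of the MRS cluster.
    1. Log in to the Master node.

      For details, see "Logging In to a Cluster Node" in the MapReduce Service User Guide.

    2. Run the following command to switch the user: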
      sudo su - omm
    3. Run the following command to switch to the client directory:
      cd /opt/client
    4. Run the following command to configure the environment variables:
      source bigdata_env
    5. Run the following command to start the Hive client:
      beeline
  3. Create database demo in Hive.

    Run the following command to create the database:

    CREATE DATABASE demo;
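  4. In database demo, create a Hive table product_info of the TEXTFILE type, and import the data file (product_info.txt) into the HDFS path of that table.

    Run the following command to switch to database demo:

    USE demo;

    Run the following commands to create table product_info. The table columns are defined according to the data in the data file: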

    DROP TABLE IF EXISTS product_info;
    
    CREATE TABLE product_info 
    (    
        product_price                int            not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    int            ,
        product_comment_time         date           ,
        product_comment_num          int            ,
        product_comment_content      varchar(200)                   
    ) 
    row format delimited fields terminated by ',' 
    stored as TEXTFILE;

    For details about importing data into an MRS cluster, see "Managing Data Files" in the MapReduce Service User Guide.

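  5. In database demo, create a Hive ORC table product_info_orc.

    Run the following commands to create the Hive ORC table product_info_orc. Its columns are identical to those of table product_info created in the previous step: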

    DROP TABLE IF EXISTS product_info_orc;
    
    CREATE TABLE product_info_orc
    (    
        product_price                int            not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    int            ,
        product_comment_time         date           ,
        product_comment_num          int            ,
        product_comment_content      varchar(200)                   
    ) 
    row format delimited fields terminated by ',' 
    stored as orc;
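  6. Insert the data from table product_info into the Hive ORC table product_info_orc.
    insert into product_info_orc select * from product_info;

    Query table product_info_orc:

    select * from product_info_orc;

    If the query returns the same rows as the sample data in the Data File section, the data has been inserted into the ORC table successfully.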
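
Step 4 defers the actual upload of product_info.txt to the MapReduce Service User Guide. As a rough sketch only, assuming the file is available on a node where the MRS client environment has been loaded (source bigdata_env), the standard hdfs dfs commands could copy it into the table's HDFS directory. The function name upload_to_table_dir and the warehouse path in the example call are assumptions, not values taken from this guide; the real directory is the Location value that DESCRIBE FORMATTED product_info prints in beeline.

```shell
# Hypothetical helper: copy a local data file into the HDFS directory of a
# Hive table, then list the directory to confirm that the file arrived.
# $1: local data file
# $2: HDFS directory of the table (assumption: look it up in the "Location"
#     row of DESCRIBE FORMATTED <table> in beeline)
upload_to_table_dir() {
    local_file="$1"
    table_dir="$2"
    # -f overwrites a file of the same name left over from an earlier attempt
    hdfs dfs -put -f "$local_file" "$table_dir"/ && hdfs dfs -ls "$table_dir"
}

# Example call (the path below is an assumed default warehouse location):
# upload_to_table_dir product_info.txt /user/hive/warehouse/demo.db/product_info
```

Run the upload after the TEXTFILE table has been created and before the insert in step 6, so that select * from product_info returns rows to copy into the ORC table.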
