Updated: 2022-08-16 GMT+08:00

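Uploading Data to OBS

Scenario

Before importing data from OBS to a cluster, prepare the source data files and upload them to OBS. If your data files are already on OBS, you only need to complete steps 2 and 3 in "Uploading Data to OBS".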

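Preparing Data Files

Prepare the source data files to be uploaded to OBS. GaussDB(DWS) supports only source data files in CSV, TEXT, ORC, and CARBONDATA formats.

If your data cannot be saved in CSV format, you can save it as a text file with any other file name extension.

Because of how data import works, when a source data file is large, split it as evenly as possible into multiple files before storing it on OBS. Import performance is better when the number of files is an integer multiple of the number of DataNodes.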
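The splitting advice above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a GaussDB(DWS) tool: the function names `split_evenly` and `split_file`, the `files_per_node` parameter, and the `<name>.<index>` output naming are hypothetical, and the sketch assumes one record per line (no line breaks inside quoted fields).

```python
from pathlib import Path


def split_evenly(lines, num_parts):
    """Return num_parts consecutive chunks whose lengths differ by at most one."""
    if num_parts <= 0:
        raise ValueError("num_parts must be a positive integer")
    base, extra = divmod(len(lines), num_parts)
    chunks = []
    start = 0
    for index in range(num_parts):
        # The first `extra` chunks take one additional line each.
        size = base + (1 if index < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks


def split_file(source, out_dir, num_datanodes, files_per_node=1):
    """Write source as <name>.0, <name>.1, ... in out_dir and return the new paths.

    Assumption: each record occupies exactly one line.
    """
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = source.read_text(encoding="utf-8").splitlines(keepends=True)
    # Keep the file count an integer multiple of the DataNode count.
    parts = split_evenly(lines, num_datanodes * files_per_node)
    paths = []
    for index, chunk in enumerate(parts):
        target = out_dir / f"{source.name}.{index}"
        target.write_text("".join(chunk), encoding="utf-8")
        paths.append(target)
    return paths
```

For a cluster assumed to have 3 DataNodes, `split_file("product_info", "out", num_datanodes=3)` would write `product_info.0`, `product_info.1`, and `product_info.2` into `out`.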

Assume that you have stored three CSV data files on OBS, with the following raw data:

  • Data file "product_info.0"

    Sample data:

    100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good!
    205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
    300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
    310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice.
    150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite.
    
  • Data file "product_info.1"

    Sample data:

    200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
    250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
    108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy.
    450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor.
    260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes.
    
  • Data file "product_info.2"

    Sample data:

    980,"ZKDS-J",2017-09-13,"B","2017 Women's Cotton Clothing","red","M",112,,,
    98,"FKQB-I",2017-09-15,"B","2017 new shoes men","red","M",4345,2017-09-18,5473
    50,"DMQY-K",2017-09-21,"A","2017 pants men","red","37",28,2017-09-25,58,"good","good","good"
    80,"GKLW-l",2017-09-22,"A","2017 Jeans Men","red","39",58,2017-09-25,72,"Very comfortable."
    30,"HWEC-L",2017-09-23,"A","2017 shoes women","red","M",403,2017-09-26,607,"good!"
    40,"IQPD-M",2017-09-24,"B","2017 new pants Women","red","M",35,2017-09-27,52,"very good."
    50,"LPEC-N",2017-09-25,"B","2017 dress Women","red","M",29,2017-09-28,47,"not good at all."
    60,"NQAB-O",2017-09-26,"B","2017 jacket women","red","S",69,2017-09-29,70,"It's beautiful."
    70,"HWNB-P",2017-09-27,"B","2017 jacket women","red","L",30,2017-09-30,55,"I like it so much"
    80,"JKHU-Q",2017-09-29,"C","2017 T-shirt","red","M",90,2017-10-02,82,"very good."
    

Uploading Data to OBS

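  1. Upload the data to OBS.

    Store the source data files to be imported in an OBS bucket.

    1. Log in to the OBS management console.

      Click "Service List" and choose "Object Storage Service" to open the OBS management console.

    2. Create a bucket.

      For details about how to create an OBS bucket, see "Console Guide > Managing Buckets > Creating a Bucket" in the Object Storage Service User Guide.

      For example, create two buckets named "mybucket" and "mybucket02".

    3. Create a folder.

      For details, see "Console Guide > Managing Objects > Creating a Folder" in the Object Storage Service User Guide.

      For example:

      • Create a folder named "input_data" in the OBS bucket "mybucket".
      • Create a folder named "input_data" in the OBS bucket "mybucket02".
    4. Upload the files.

      For details, see "Console Guide > Managing Objects > Uploading a File" in the Object Storage Service User Guide.

      For example:

      • Upload the following data files to the "input_data" folder in the OBS bucket "mybucket".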
        product_info.0
        product_info.1
        
      • Upload the following data file to the "input_data" folder in the OBS bucket "mybucket02".
        product_info.2
        

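  2. Obtain the OBS paths of the source data files.

    After a source data file is uploaded to an OBS bucket, a globally unique access path is generated for it. This OBS path is used to set the location parameter when you create a foreign table.

    In the location parameter, the OBS file path consists of "obs://", the bucket name, and the file path, that is: obs://<bucket_name>/<file_path>/

    In this example, the OBS paths of the data files in the location parameter are:
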
    obs://mybucket/input_data/product_info.0
    obs://mybucket/input_data/product_info.1
    obs://mybucket02/input_data/product_info.2
    

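  3. Grant the importing user read permission on the OBS buckets.

    When data is imported from OBS to a cluster, the user who performs the import must have read permission on the OBS bucket that stores the source data files. You can grant read permission to a specific user account by configuring the bucket ACL.

    For details, see "Console Guide > Permission Control > Configuring a Bucket ACL" in the Object Storage Service User Guide.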
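The OBS paths described in step 2 can be composed mechanically from the bucket name and the file path. The following Python sketch only illustrates that string format. The helper name `obs_location` and the `uploads` mapping are hypothetical, and the sketch follows the example paths in step 2, which carry no trailing slash.

```python
def obs_location(bucket_name, file_path):
    """Build obs://<bucket_name>/<file_path> from a bucket name and an object path."""
    bucket = bucket_name.strip("/")
    path = file_path.lstrip("/")
    if not bucket or not path:
        raise ValueError("bucket_name and file_path must both be non-empty")
    return f"obs://{bucket}/{path}"


# Buckets and files from this example; all files sit in the "input_data" folder.
uploads = {
    "mybucket": ["product_info.0", "product_info.1"],
    "mybucket02": ["product_info.2"],
}

locations = [
    obs_location(bucket, f"input_data/{name}")
    for bucket, names in uploads.items()
    for name in names
]

for location in locations:
    print(location)
```

Running the sketch prints the three paths listed in step 2, in the same order.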