Updated on 2023-04-19 GMT+08:00

Preparing Data on OBS

Scenarios

Before you use the SQL on OBS feature to query OBS data, make sure that the following prerequisites are met:

  1. You have stored the ORC data on OBS.

    For example, an ORC table has been created using the Hive or Spark component, and its ORC data has been stored on OBS.

    Assume that there are two ORC data files, product_info.0 and product_info.1, stored in the demo.db/product_info_orc/ directory of the mybucket OBS bucket. For their contents, see Original Data.

  2. If your data files are already on OBS, perform the steps in Obtaining the OBS Path of Original Data and Setting Read Permission.

    This section uses the ORC format as an example to describe how to import data. The method for importing PARQUET, CARBONDATA, and JSON data is similar.

    This method also supports TEXT and CSV files, but it does not support error tables. For TEXT and CSV data, Importing CSV and TEXT Data from OBS (Method 1) is therefore recommended.
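The format guidance above amounts to a small decision rule. The following Python sketch encodes it for illustration only: the function name choose_import_method and the returned strings are hypothetical and are not part of any SQL on OBS or OBS API.

```python
# Hypothetical helper that encodes the format guidance in this section.
# It does not call any OBS or database API; names and messages are illustrative.

# Formats imported the same way as the ORC example in this section.
FOREIGN_TABLE_FORMATS = {"ORC", "PARQUET", "CARBONDATA", "JSON"}

# Formats this method can read, but without error-table support.
TEXT_LIKE_FORMATS = {"TEXT", "CSV"}


def choose_import_method(data_format: str) -> str:
    """Return the recommended way to import data of the given format."""
    fmt = data_format.strip().upper()
    if fmt in FOREIGN_TABLE_FORMATS:
        return "this section's method"
    if fmt in TEXT_LIKE_FORMATS:
        # Supported by this method, but error tables are not, so Method 1 is preferred.
        return "Importing CSV and TEXT Data from OBS (Method 1)"
    raise ValueError(f"format not covered by this section: {data_format!r}")


if __name__ == "__main__":
    for fmt in ["orc", "Parquet", "CSV", "text"]:
        print(f"{fmt}: {choose_import_method(fmt)}")
```

The rule is case-insensitive and rejects formats that this section does not mention rather than guessing a method for them.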

Original Data

Assume that you have stored the two ORC data files on OBS and their original data is as follows:

  • Data file product_info.0
  • Data file product_info.1

Obtaining the OBS Path of Original Data and Setting Read Permission

  1. Log in to the OBS management console.

    Click Service List and choose Object Storage Service to open the OBS management console.

  2. Obtain the OBS path for storing source data files.

    After the source data files are uploaded to an OBS bucket, each file is given a globally unique access path. You need to specify the OBS paths of the source data files when creating a foreign table.

    For details about how to view an OBS path, see Accessing an Object Using Its Object URL in the Object Storage Service Console Operation Guide.

    For example, the OBS paths are as follows:

    https://obs.ap-southeast-1.myhuaweicloud.com/mybucket/demo.db/product_info_orc/product_info.0
    https://obs.ap-southeast-1.myhuaweicloud.com/mybucket/demo.db/product_info_orc/product_info.1

  3. Grant the user read permission on the OBS bucket.

    The user who uses the SQL on OBS feature must have read permission on the OBS bucket where the source data files are located. You can configure the ACL of the OBS bucket to grant read permission to a specific user.

    For details, see Configuring a Bucket ACL in the Object Storage Service Console Operation Guide.
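Steps 2 and 3 above can be checked together before you create a foreign table. The following Python sketch is a hypothetical preflight helper, not an OBS tool. It assumes the path-style URL layout shown in the example paths (endpoint, then bucket name, then object key) and models a bucket ACL as a plain mapping that uses S3-style permission names (READ, FULL_CONTROL). It does not call the OBS API; the user names dbuser and otheruser and all helper names are placeholders. Check the real ACL in the OBS console as described in Configuring a Bucket ACL.

```python
"""Hypothetical preflight check: which buckets holding source files can a user not read?

Assumes path-style object URLs such as
https://obs.ap-southeast-1.myhuaweicloud.com/mybucket/demo.db/product_info_orc/product_info.0
where the first path segment is the bucket name. The ACL model is a simplified,
S3-style illustration and does not call the OBS API.
"""
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ObsObjectPath:
    endpoint: str  # e.g. obs.ap-southeast-1.myhuaweicloud.com
    bucket: str    # e.g. mybucket
    key: str       # e.g. demo.db/product_info_orc/product_info.0


def parse_object_url(url: str) -> ObsObjectPath:
    """Split a path-style object URL into endpoint, bucket, and object key."""
    parsed = urlparse(url)
    bucket, sep, key = parsed.path.lstrip("/").partition("/")
    if parsed.scheme != "https" or not bucket or not sep or not key:
        raise ValueError(f"not a path-style object URL: {url!r}")
    return ObsObjectPath(parsed.netloc, bucket, key)


# Hypothetical bucket ACLs: bucket -> {user: set of granted permissions}.
# Assumption: as in S3-style ACLs, FULL_CONTROL includes READ.
BucketAcls = dict[str, dict[str, set[str]]]


def can_read(acls: BucketAcls, user: str, bucket: str) -> bool:
    """Return True if the modeled ACL grants the user READ or FULL_CONTROL."""
    granted = acls.get(bucket, {}).get(user, set())
    return bool(granted & {"READ", "FULL_CONTROL"})


def unreadable_buckets(urls: list[str], user: str, acls: BucketAcls) -> set[str]:
    """Return the buckets holding source files that the user cannot read."""
    buckets = {parse_object_url(u).bucket for u in urls}
    return {b for b in buckets if not can_read(acls, user, b)}


if __name__ == "__main__":
    urls = [
        "https://obs.ap-southeast-1.myhuaweicloud.com/mybucket/demo.db/product_info_orc/product_info.0",
        "https://obs.ap-southeast-1.myhuaweicloud.com/mybucket/demo.db/product_info_orc/product_info.1",
    ]
    print(parse_object_url(urls[0]))
    acls: BucketAcls = {"mybucket": {"dbuser": {"READ"}}}  # placeholder grant
    print(unreadable_buckets(urls, "dbuser", acls))     # expected: set()
    print(unreadable_buckets(urls, "otheruser", acls))  # expected: {'mybucket'}
```

Collecting buckets into a set first means each bucket is checked once, however many source files it holds.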