Updated on 2022-09-14 GMT+08:00

Uploading Data

On the Files page, you can create and delete HDFS directories, as well as import, export, and delete files in an analysis cluster.

For clusters with Kerberos authentication enabled, synchronize IAM users before performing operations on the Files page. On the cluster details page, click Dashboard and click Synchronize on the right of IAM User Sync to synchronize IAM users.

Background

MRS clusters generally process data from OBS or HDFS. OBS provides massive, secure, reliable, and cost-effective data storage, and MRS can process data in OBS directly. You can browse, manage, and use data on the management console or on the OBS Client. If you need to import OBS data into the HDFS of the cluster for processing, perform the steps in this section.

Importing Data

Currently, MRS can import data from OBS to HDFS. The upload rate decreases as the file size increases, so this mode is best suited to scenarios where the data volume is small.

You can perform the following steps to import files and directories:

  1. Log in to the MRS console.
  2. Choose Clusters > Active Clusters, and click the name of the target cluster to enter the cluster details page.
  3. Click Files to go to the file management page.
  4. Select HDFS File List.

  5. Go to the data storage directory, for example, bd_app1.

    The bd_app1 directory is only an example. You can use any directory on the page or create a new one.

    The requirements for creating a folder are as follows:

    • The folder name contains a maximum of 255 characters.
    • The folder name cannot be empty.
    • The folder name cannot contain the following special characters: /:*?"<>|\;&,'`!{}[]$%+
    • The folder name cannot start or end with a period (.).
    • Leading and trailing spaces are ignored.
  6. Click Import Data and configure the HDFS and OBS paths correctly. When configuring the OBS or HDFS path, click Browse, select a file directory, and click Yes.
    Figure 1 Importing data
    • OBS path
      • The path must start with obs://.
      • Files or programs encrypted by KMS cannot be imported.
      • An empty folder cannot be imported.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The OBS full path contains a maximum of 255 characters.
    • HDFS path
      • The path starts with /user by default.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\:
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The HDFS full path contains a maximum of 255 characters.
  7. Click OK.

    You can view the file upload progress on the File Operation Records page. MRS processes the data import operation as a DistCp job. You can also check whether the DistCp job is successfully executed on the Jobs page.
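
The naming rules listed in the import steps can be checked before you open the Import Data dialog box. The following is a minimal sketch, not an MRS API: the function names folder_name_errors and path_errors are hypothetical, and the sketch assumes that any character outside the listed forbidden sets is accepted, which the rules above do not state explicitly.

```python
# Hypothetical helpers for illustration only; they are not part of any MRS SDK.
# Forbidden character sets are copied from the rules listed in the steps above.
FOLDER_FORBIDDEN = set('/:*?"<>|\\;&,\'`!{}[]$%+')
OBS_FORBIDDEN = set(";|&>,<'$*?\\")
HDFS_FORBIDDEN = OBS_FORBIDDEN | {":"}
MAX_LEN = 255


def folder_name_errors(name):
    """Return the rule violations for a new HDFS folder name (empty list if valid)."""
    name = name.strip(" ")  # leading and trailing spaces are ignored
    errors = []
    if not name:
        errors.append("name is empty")
    if len(name) > MAX_LEN:
        errors.append("name exceeds 255 characters")
    if any(ch in FOLDER_FORBIDDEN for ch in name):
        errors.append("name contains a forbidden special character")
    if name.startswith(".") or name.endswith("."):
        errors.append("name starts or ends with a period")
    return errors


def path_errors(path, kind):
    """Return the rule violations for an OBS or HDFS path; kind is 'obs' or 'hdfs'."""
    errors = []
    if len(path) > MAX_LEN:
        errors.append("full path exceeds 255 characters")
    if kind == "obs":
        forbidden = OBS_FORBIDDEN
        if path.startswith("obs://"):
            rest = path[len("obs://"):]
        else:
            errors.append("OBS path must start with obs://")
            rest = path
    elif kind == "hdfs":
        forbidden = HDFS_FORBIDDEN
        rest = path
    else:
        raise ValueError("kind must be 'obs' or 'hdfs'")
    # Check every directory or file name in the path separately.
    for segment in rest.split("/"):
        if any(ch in forbidden for ch in segment):
            errors.append(repr(segment) + " contains a forbidden special character")
        if segment != segment.strip(" "):
            errors.append(repr(segment) + " starts or ends with a space")
    return errors


if __name__ == "__main__":
    print(folder_name_errors("bd_app1"))
    print(path_errors("obs://example-bucket/input", "obs"))  # bucket name is made up
    print(path_errors("/user/bd:app1", "hdfs"))
```

Both functions return a list of messages rather than raising an exception, so that all problems with a name or path can be reported at once.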

Exporting Data

After data analysis and computing are complete, you can store the data in HDFS or export it to OBS.

You can perform the following steps to export files and directories:

  1. Log in to the MRS console.
  2. Choose Clusters > Active Clusters, and click the name of the target cluster to enter the cluster details page.
  3. Click Files to go to the file management page.
  4. Select HDFS File List.
  5. Go to the data storage directory, for example, bd_app1.
  6. Click Export Data and configure the OBS and HDFS paths. When configuring the OBS or HDFS path, click Browse, select a file directory, and click Yes.
    Figure 2 Exporting data
    • OBS path
      • The path must start with obs://.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The OBS full path contains a maximum of 255 characters.
    • HDFS path
      • The path starts with /user by default.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\:
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The HDFS full path contains a maximum of 255 characters.

    When a folder is exported to OBS, a marker file named in the format folder name_$folder$ (for example, bd_app1_$folder$ for the bd_app1 folder) is added to the OBS path. Ensure that the exported folder is not empty. If the exported folder is empty, OBS cannot display the folder, and only the folder name_$folder$ file is generated.

  7. Click OK.

    You can view the file export progress on the File Operation Records page. MRS processes the data export operation as a DistCp job. You can also check whether the DistCp job is successfully executed on the Jobs page.
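
The folder name_$folder$ marker described in the export steps can be recognized when you inspect the export target. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, object_names stands for a list of object names you have already obtained from the OBS path by some other means, and the assumption that exported files appear under a "folder name/" prefix is for illustration only.

```python
# Hypothetical helpers for illustration only; they do not call OBS or MRS.
MARKER_SUFFIX = "_$folder$"


def folder_marker_name(folder_name):
    """Return the marker file name that accompanies an exported folder."""
    return folder_name + MARKER_SUFFIX


def looks_empty_on_obs(folder_name, object_names):
    """Return True if only the marker exists, that is, the exported folder was empty.

    Assumption: files exported from the folder are listed as 'folder_name/...'.
    """
    has_marker = folder_marker_name(folder_name) in object_names
    has_children = any(name.startswith(folder_name + "/") for name in object_names)
    return has_marker and not has_children


if __name__ == "__main__":
    print(folder_marker_name("bd_app1"))
    print(looks_empty_on_obs("bd_app1", ["bd_app1_$folder$"]))
    print(looks_empty_on_obs("bd_app1", ["bd_app1_$folder$", "bd_app1/part-00000"]))
```

A True result from looks_empty_on_obs corresponds to the case above in which OBS cannot display the folder because nothing except the marker file was written.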