Updated on 2024-09-23 GMT+08:00

Uploading Application Data to an MRS Cluster

MRS clusters generally process data stored in OBS or in the HDFS file system of the cluster. OBS provides massive, secure, reliable, and cost-effective data storage.

You can access, manage, and use OBS data on the MRS console or an OBS client. You can also import OBS data to the HDFS file system of a cluster for processing. Note that the file upload rate may decrease as the file size increases, so this method is more suitable for scenarios with smaller amounts of data.

Importing OBS Data to HDFS

  1. Log in to the MRS console.
  2. On the Active Clusters page displayed by default, click the name of the target cluster to enter the cluster details page.

    For MRS clusters with Kerberos authentication enabled, complete IAM user synchronization first. (On the Dashboard page of the cluster details page, click Synchronize on the right side of IAM User Sync to synchronize IAM users.)

  3. Click Files to go to the file management page.
  4. Select HDFS File List.

    Figure 1 HDFS file list

  5. Go to the directory where the data to be imported is stored.

    Click Create to create a folder, or select an existing folder in HDFS.

  6. Click Import Data and configure the HDFS and OBS paths correctly.

    When configuring the OBS or HDFS path, click Browse, select a file directory, and click OK.
    Figure 2 Importing data
    • OBS path description:
      • The path must start with obs://.
      • Files or programs encrypted by KMS cannot be imported.
      • An empty folder cannot be imported.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The OBS full path can contain a maximum of 255 characters.
    • HDFS path description:
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\:
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The HDFS full path can contain a maximum of 255 characters.

  7. Click OK.

    You can view the file upload progress on the File Operation Records page. The system generates a DistCp job for processing the data import operation. You can also view the job execution status on the Job Management page.
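The OBS and HDFS path rules listed in step 6 can be checked before a path is entered on the console. The following is a minimal Python sketch, not part of MRS: the function name `check_path` and the constants are hypothetical. It enforces only the rules stated above (the obs:// prefix, the forbidden characters, leading and trailing spaces, and the 255-character limit). It assumes that the colon at the end of the HDFS character list is itself a forbidden character, and it does not check KMS encryption or empty folders.

```python
# Characters the section lists as not allowed in OBS directory and file names.
OBS_FORBIDDEN = set(";|&>,<'$*?\\")
# Assumption: the trailing colon in the HDFS list is itself a forbidden character.
HDFS_FORBIDDEN = OBS_FORBIDDEN | {":"}
MAX_PATH_LENGTH = 255


def check_path(path, kind):
    """Return a list of rule violations; an empty list means none were found."""
    if kind not in ("obs", "hdfs"):
        raise ValueError("kind must be 'obs' or 'hdfs'")

    problems = []
    body = path
    if kind == "obs":
        if path.startswith("obs://"):
            # Strip the scheme so its colon and slashes are not checked as names.
            body = path[len("obs://"):]
        else:
            problems.append("OBS path must start with obs://")
    forbidden = OBS_FORBIDDEN if kind == "obs" else HDFS_FORBIDDEN

    if len(path) > MAX_PATH_LENGTH:
        problems.append("full path exceeds 255 characters")

    bad = sorted(set(body) & forbidden)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))

    # Each directory or file name may contain spaces, but not at either end.
    for name in body.split("/"):
        if name != name.strip(" "):
            problems.append("name starts or ends with a space: " + repr(name))
    return problems


# Hypothetical bucket and directory names, for illustration only.
print(check_path("obs://my-bucket/input/data_01.csv", "obs"))
print(check_path("/user/demo/bad:name", "hdfs"))
```

An empty list means the path passed these checks only. The console may apply additional validation that this sketch does not cover.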

Exporting HDFS Data to OBS

  1. Log in to the MRS console.
  2. On the Active Clusters page displayed by default, click the name of the target cluster to enter the cluster details page.
  3. Click the Files tab to go to the file management page.
  4. Select HDFS File List.
  5. Go to the data storage directory.
  6. Click Export Data and configure the OBS and HDFS paths. When configuring the OBS or HDFS path, click Browse, select a file directory, and click OK.

    Figure 3 Exporting data

    When a folder is exported to OBS, a label file named folder name_$folder$ is added to the OBS path. Ensure that the exported folder is not empty. If the exported folder is empty, OBS cannot display the folder and only generates a file named folder name_$folder$.

  7. Click OK.

    You can view the file export progress on the File Operation Records page. The system generates a DistCp job for processing the data export operation. You can also view the job execution status on the Job Management page.
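As a rough illustration of the export behavior described above, the following Python sketch derives the name of the label file that accompanies an exported folder and builds the basic `hadoop distcp <source> <target>` argument list. The helper names, bucket, and paths are hypothetical. The exact arguments of the DistCp job that the console generates are not documented here, so the command shows only the general form, and it assumes a cluster node where the Hadoop client can reach OBS.

```python
def folder_marker_name(folder_name):
    """Name of the label file added to the OBS path for an exported folder."""
    return folder_name + "_$folder$"


def distcp_command(source, target):
    """Basic DistCp invocation as an argument list (not executed here)."""
    return ["hadoop", "distcp", source, target]


# Hypothetical HDFS directory and OBS bucket, for illustration only.
hdfs_dir = "/user/demo/output"
obs_dir = "obs://my-bucket/export"

print(folder_marker_name("output"))
print(" ".join(distcp_command(hdfs_dir, obs_dir)))
```

Returning the command as a list rather than a single string keeps paths that contain spaces intact if the list is later passed to `subprocess.run`.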