Updated on 2024-11-29 GMT+08:00

Importing and Exporting Data

On the Files tab page, you can create and delete folders and import, export, and delete files in an analysis cluster. Currently, file creation is not supported. Streaming clusters do not support file management on the MRS GUI. In a cluster with Kerberos authentication enabled, to read or write folders in the root directory, first add a role that has the required permissions on those folders by referring to Managing Roles. Then add the new role to the user group of the user who submits the job by referring to Creating a User.

Background

MRS processes data from OBS or HDFS. OBS is an object-based storage service that provides massive, secure, reliable, and cost-effective data storage. MRS can process data in OBS directly. You can view, manage, and use data through the web page of the management control platform or through the OBS client. In addition, you can call the REST APIs directly or integrate them into service applications to manage and access data.

Before creating jobs, upload the local data to OBS so that MRS can compute and analyze it. MRS can import data from OBS to HDFS for computing and analysis. After the data analysis and computing are completed, you can store the data in HDFS or export it to OBS. HDFS and OBS can also store data compressed in bz2 or gz format.
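
Because both stores accept bz2 and gz data, a local file can be compressed before it is uploaded to OBS. The following is a minimal sketch that uses only the Python standard library. The helper name compress_for_upload and the file names are hypothetical, and the upload itself is not shown.

```python
import bz2
import gzip
import shutil
from pathlib import Path


def compress_for_upload(src: str, fmt: str = "gz") -> Path:
    """Write a gz- or bz2-compressed copy of src next to it and return its path."""
    openers = {"gz": gzip.open, "bz2": bz2.open}
    if fmt not in openers:
        raise ValueError(f"unsupported format: {fmt!r} (expected 'gz' or 'bz2')")
    source = Path(src)
    target = source.with_name(f"{source.name}.{fmt}")
    # Stream the bytes so that large files are not loaded into memory at once.
    with open(source, "rb") as fin, openers[fmt](target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return target
```

For example, compress_for_upload("input.txt", "bz2") would write input.txt.bz2 in the same directory, ready to be uploaded to OBS.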

Importing Data

Currently, MRS can only import data from OBS to HDFS. The file upload rate decreases as the file size increases, so this mode is suited to scenarios where the data volume is small.

You can perform the following steps to import files and directories:

  1. Log in to the MRS console.
  2. Choose Clusters > Active Clusters and click the name of the cluster to be queried to enter the page displaying the cluster's information.
  3. Click the Files tab, and go to the file management page.
  4. Select HDFS File List.
  5. Go to the data storage directory, for example, bd_app1.

    The bd_app1 directory is only an example. You can use any directory on the page or create a new one.

    The requirements for creating a folder are as follows:

    • The folder name contains a maximum of 255 characters.
    • The folder name cannot be empty.
    • The folder name cannot contain the following special characters: /:*?"<>|\;&,'`!{}[]$%+
    • The folder name cannot start or end with a period (.).
    • The spaces at the beginning and end are ignored.
  6. Click Import Data and configure the HDFS and OBS paths correctly. When configuring the OBS or HDFS path, click Browse, select a file directory, and click Yes.
    • OBS path
      • The path must start with obs://.
      • Files or programs encrypted by KMS cannot be imported.
      • An empty folder cannot be imported.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The OBS full path contains a maximum of 255 characters.
    • HDFS path
      • The path starts with /user by default.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\:
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The HDFS full path contains a maximum of 255 characters.
  7. Click OK.

    You can view the file upload progress on the File Operation Records tab page. MRS processes the data import operation as a DistCp job. You can also check whether the DistCp job is successfully executed on the Jobs tab page.
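
The naming rules listed in the steps above can be checked before a path is entered on the console. The following is a minimal sketch, not the validation that MRS itself performs: the function names are hypothetical, only the forbidden-character lists are enforced (not the list of allowed characters), and the /user default for HDFS paths is not enforced because it is only a default.

```python
from typing import List

# Forbidden characters copied from the rules above.
FOLDER_FORBIDDEN = set('/:*?"<>|\\;&,\'`!{}[]$%+')
OBS_FORBIDDEN = set(";|&>,<'$*?\\")
HDFS_FORBIDDEN = OBS_FORBIDDEN | {":"}
MAX_LEN = 255


def check_folder_name(name: str) -> List[str]:
    """Return the rule violations for a new folder name (empty list if valid)."""
    name = name.strip(" ")  # spaces at the beginning and end are ignored
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_LEN:
        problems.append("name is longer than 255 characters")
    if FOLDER_FORBIDDEN & set(name):
        problems.append("name contains a forbidden special character")
    if name.startswith(".") or name.endswith("."):
        problems.append("name starts or ends with a period")
    return problems


def check_path(path: str, kind: str) -> List[str]:
    """Return the rule violations for a path; kind is 'obs' or 'hdfs'."""
    if kind not in ("obs", "hdfs"):
        raise ValueError("kind must be 'obs' or 'hdfs'")
    problems = []
    if kind == "obs" and not path.startswith("obs://"):
        problems.append("OBS path must start with obs://")
    forbidden = OBS_FORBIDDEN if kind == "obs" else HDFS_FORBIDDEN
    if len(path) > MAX_LEN:
        problems.append("full path is longer than 255 characters")
    if forbidden & set(path):
        problems.append("path contains a forbidden special character")
    # Each directory or file name may contain spaces, but not at either end.
    if any(part != part.strip(" ") for part in path.split("/") if part):
        problems.append("a directory or file name starts or ends with a space")
    return problems
```

For example, check_folder_name("  bd_app1  ") returns an empty list because the surrounding spaces are ignored, whereas check_path("mybucket/input", "obs") reports the missing obs:// prefix.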

Exporting Data

After the data analysis and computing are completed, you can store the data in HDFS or export it to OBS.

You can perform the following steps to export files and directories:

  1. Log in to the MRS console.
  2. Choose Clusters > Active Clusters and click the name of the cluster to be queried to enter the page displaying the cluster's basic information.
  3. Click the Files tab, and the file management page is displayed.
  4. Select HDFS File List.
  5. Go to the data storage directory, for example, bd_app1.
  6. Click Export Data and configure the OBS and HDFS paths. When configuring the OBS or HDFS path, click Browse, select a file directory, and click Yes.
    • OBS path
      • The path must start with obs://.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The OBS full path contains a maximum of 255 characters.
    • HDFS path
      • The path starts with /user by default.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\:
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The HDFS full path contains a maximum of 255 characters.

    When a folder is exported to OBS, a label file named folder name_$folder$ is added to the OBS path. Ensure that the exported folder is not empty. If the exported folder is empty, OBS cannot display the folder and only generates a file named folder name_$folder$.

  7. Click OK.

    You can view the file export progress on the File Operation Records tab page. MRS processes the data export operation as a DistCp job. You can also check whether the DistCp job is successfully executed on the Jobs tab page.
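
Because the export runs as a DistCp job, it can help to picture the operation as a basic hadoop distcp invocation from an HDFS source to an OBS target. The following sketch only builds such a command and the name of the folder label file described above. The exact arguments that MRS passes to DistCp are not documented here, so the command shape, the function names, and the sample paths are assumptions for illustration.

```python
import shlex
from typing import List


def distcp_command(source: str, target: str) -> List[str]:
    """Build the argument list for a basic `hadoop distcp <source> <target>` run."""
    # Assumption: no extra DistCp options; MRS may add its own.
    return ["hadoop", "distcp", source, target]


def folder_marker_name(folder_name: str) -> str:
    """Name of the label file that appears in the OBS path for an exported folder."""
    return f"{folder_name}_$folder$"


# Hypothetical export of /user/bd_app1/output to a bucket named mybucket.
cmd = distcp_command("/user/bd_app1/output", "obs://mybucket/results")
print(shlex.join(cmd))
print(folder_marker_name("output"))
```

The command is only printed, not executed, so the sketch can be run anywhere to preview what an equivalent manual copy might look like.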

Viewing Operation Logs

When importing or exporting data on the MRS management console, you can choose Files > File Operation Records to view the data import or export progress.

Table 1 describes the parameters of the file operation record.

Table 1 File operation record parameters

| Parameter | Description |
| --- | --- |
| Created | Time when the data import or export task is created. |
| Source Path | Source path of the data: the OBS path during data import, or the HDFS path during data export. |
| Target Path | Target path of the data: the HDFS path during data import, or the OBS path during data export. |
| Status | Status of the data import or export: Submitted, Accepted, Running, Completed, Terminated, or Abnormal. |
| Duration (min) | Time taken by the data import or export, in minutes. |
| Result | Result of the data import or export: Successful, Failed, Killed, or Undefined. |
| Operation | View Log: allows you to view file operation logs. |
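
If file operation records are copied out of the console for review, the parameters in Table 1 map naturally onto a small data model. The following is a sketch under stated assumptions: the class and field names are hypothetical, MRS does not expose this structure, and treating Failed and Killed as the results that need follow-up is an illustrative choice. The direction is inferred from the source path, which is an OBS path for imports and an HDFS path for exports.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    SUBMITTED = "Submitted"
    ACCEPTED = "Accepted"
    RUNNING = "Running"
    COMPLETED = "Completed"
    TERMINATED = "Terminated"
    ABNORMAL = "Abnormal"


class Result(Enum):
    SUCCESSFUL = "Successful"
    FAILED = "Failed"
    KILLED = "Killed"
    UNDEFINED = "Undefined"


@dataclass
class FileOperationRecord:
    created: str          # time when the task was created
    source_path: str
    target_path: str
    status: Status
    duration_min: float   # duration in minutes
    result: Result

    @property
    def direction(self) -> str:
        """'import' when the source is an OBS path, otherwise 'export'."""
        return "import" if self.source_path.startswith("obs://") else "export"


def needs_attention(records: List[FileOperationRecord]) -> List[FileOperationRecord]:
    """Return the records whose result is Failed or Killed."""
    return [r for r in records if r.result in (Result.FAILED, Result.KILLED)]
```

For records flagged by needs_attention, View Log in the Operation column is the place to look for the cause.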