
Managing Data Files

Updated at: Mar 25, 2021 GMT+08:00

On the Files tab page, you can import, export, and delete files in an analysis cluster. File creation is currently not supported. Streaming clusters do not support the file management function on the MRS GUI. In a cluster with Kerberos authentication enabled, permissions on folders in the root directory are restricted. To read and write these folders, add a role that has permissions on the folders by referring to Creating a Role. Then, change the user group of the user who submits the job and add the new role to that user group by referring to Related Tasks.

Background

The data that MRS processes comes from OBS or HDFS. OBS is an object-based storage service that provides massive, secure, reliable, and cost-effective data storage capabilities. MRS can process data in OBS directly. You can view, manage, and use data through the web page of the management control platform or through the OBS client. You can also call REST APIs directly or integrate them into service applications to manage and access data.

Before creating jobs, upload local data to OBS so that MRS can compute and analyze it. MRS can import data from OBS to HDFS for computing and analysis. After data analysis and computing are complete, you can store the data in HDFS or export it to OBS. HDFS and OBS can also store data compressed in bz2 or gz format.
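
As an illustration of the bz2 and gz formats mentioned above, the following minimal sketch compresses a local file before it is uploaded to OBS. It uses only the Python standard library. The function name compress_file and the output naming (original file name plus .gz or .bz2) are assumptions made for this example and are not part of MRS.

```python
import bz2
import gzip
import shutil
from pathlib import Path


def compress_file(src: Path, fmt: str = "gz") -> Path:
    """Compress src into <name>.gz or <name>.bz2 next to it and return the new path."""
    openers = {"gz": gzip.open, "bz2": bz2.open}
    if fmt not in openers:
        raise ValueError(f"unsupported format: {fmt!r} (expected 'gz' or 'bz2')")
    dst = src.with_name(f"{src.name}.{fmt}")
    # Stream the bytes so that large files are not loaded into memory at once.
    with src.open("rb") as fin, openers[fmt](dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, compress_file(Path("input.csv"), "bz2") writes input.csv.bz2 in the same directory. The upload to OBS itself is outside the scope of this sketch.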

Importing Data

Currently, MRS can import data only from OBS to HDFS. The file upload rate decreases as the file size increases, so this mode is suited to scenarios where the data volume is small.

You can perform the following steps to import files and directories:

  1. Log in to the MRS management console.
  2. Choose Clusters > Active Clusters and click the name of the target cluster to open the page that displays the cluster's information.
  3. Click the Files tab to go to the file management page.
  4. Select HDFS File List.
  5. Go to the data storage directory, for example, bd_app1.

    The bd_app1 directory is only an example. You can use any directory on the page or create a new one.

    The requirements for creating a folder are as follows:

    • The folder name contains a maximum of 255 characters.
    • The folder name cannot be empty.
    • The folder name cannot contain the following special characters: /:*?"<>|\;&,'`!{}[]$%+
    • The folder name cannot start or end with a period (.).
    • Spaces at the beginning and end of the folder name are ignored.
  6. Click Import Data and configure the HDFS and OBS paths correctly. When configuring the OBS or HDFS path, click Browse, select a file directory, and click Yes.
    Figure 1 Importing data
    • OBS path
      • The path must start with obs://. For clusters of versions earlier than MRS 1.8.10, the path must start with s3a://.
      • Files or programs encrypted by KMS cannot be imported.
      • An empty folder cannot be imported.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The OBS full path contains a maximum of 255 characters.
    • HDFS path
      • The path starts with /user by default.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The HDFS full path contains a maximum of 255 characters.
  7. Click OK.

    You can view the file upload progress on the File Operation Records tab page. MRS processes the data import operation as a DistCp job. You can also check whether the DistCp job is successfully executed on the Jobs tab page.
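
The folder name requirements listed in step 5 can be checked before you type a name into the console. The following is a minimal sketch of those rules only. The function name check_folder_name and the wording of the messages are hypothetical, and the console may apply additional checks that are not documented here.

```python
# Special characters that a folder name cannot contain (from the list in step 5).
FORBIDDEN_CHARS = set('/:*?"<>|\\;&,\'`!{}[]$%+')
MAX_NAME_LENGTH = 255


def check_folder_name(raw: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    # Spaces at the beginning and end are ignored, so strip them first.
    name = raw.strip(" ")
    if not name:
        return ["name is empty"]
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    if name.startswith(".") or name.endswith("."):
        problems.append("starts or ends with a period")
    return problems
```

Calling check_folder_name("bd_app1") returns an empty list, while a name such as ".hidden" or "a/b" returns at least one message describing the violated rule.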

Exporting Data

After data analysis and computing are complete, you can store the data in HDFS or export it to OBS.

You can perform the following steps to export files and directories:

  1. Log in to the MRS management console.
  2. Choose Clusters > Active Clusters and click the name of the target cluster to open the page that displays the cluster's basic information.
  3. Click the Files tab to go to the file management page.
  4. Select HDFS File List.
  5. Go to the data storage directory, for example, bd_app1.
  6. Click Export Data and configure the OBS and HDFS paths. When configuring the OBS or HDFS path, click Browse, select a file directory, and click Yes.
    Figure 2 Exporting data
    • OBS path
      • The path must start with obs://. For clusters of versions earlier than MRS 1.8.10, the path must start with s3a://.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The OBS full path contains a maximum of 255 characters.
    • HDFS path
      • The path starts with /user by default.
      • The directory and file name can contain letters, digits, hyphens (-), and underscores (_), but cannot contain the following special characters: ;|&>,<'$*?\
      • The directory and file name cannot start or end with a space, but can contain spaces between them.
      • The HDFS full path contains a maximum of 255 characters.

    When a folder is exported to OBS, a label file named folder name_$folder$ is added to the OBS path. Ensure that the exported folder is not empty. If the exported folder is empty, OBS cannot display the folder and only generates a file named folder name_$folder$.

  7. Click OK.

    You can view the file export progress on the File Operation Records tab page. MRS processes the data export operation as a DistCp job. You can also check whether the DistCp job is successfully executed on the Jobs tab page.
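
The OBS and HDFS path requirements in step 6 apply to both import and export, and both operations run as DistCp jobs. The following minimal sketch checks a path against the documented rules and builds the argument list of a standard hadoop distcp invocation. The function names are hypothetical. Two things are assumptions: that the 255-character limit counts the obs:// or s3a:// prefix, and that the job MRS submits is equivalent to a plain hadoop distcp <source> <target> call, which this page does not confirm.

```python
# Special characters that OBS and HDFS paths cannot contain (from step 6).
FORBIDDEN_CHARS = set(";|&>,<'$*?\\")
MAX_PATH_LENGTH = 255


def check_path(path: str, kind: str, legacy_s3a: bool = False) -> list[str]:
    """Return rule violations for an OBS or HDFS path; an empty list means it passes."""
    if kind not in ("obs", "hdfs"):
        raise ValueError("kind must be 'obs' or 'hdfs'")
    problems = []
    body = path
    if kind == "obs":
        # Clusters earlier than MRS 1.8.10 use s3a:// instead of obs://.
        scheme = "s3a://" if legacy_s3a else "obs://"
        if path.startswith(scheme):
            body = path[len(scheme):]
        else:
            problems.append(f"OBS path must start with {scheme}")
    # Assumption: the limit applies to the full path, prefix included.
    if len(path) > MAX_PATH_LENGTH:
        problems.append(f"path is longer than {MAX_PATH_LENGTH} characters")
    bad = sorted(set(body) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    # Each directory or file name may contain spaces, but not at either end.
    if any(part != part.strip(" ") for part in body.split("/")):
        problems.append("a directory or file name starts or ends with a space")
    return problems


def distcp_command(source: str, target: str) -> list[str]:
    """Build the argument list for a plain DistCp copy (not executed here)."""
    return ["hadoop", "distcp", source, target]
```

For an export, the source is the HDFS path and the target is the OBS path; for an import, the order is reversed. The example bucket and directory names you pass in are your own; none are defined by this sketch.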

Viewing Operation Logs

When importing and exporting data on the MRS management console, you can choose Files > File Operation Records to view the data import and export progress.

Table 1 describes the parameters of the file operation record.

Table 1 File operation record parameters

Parameter        Description
Submitted        Start time of the data import or export.
Source Path      Source path of the data: the OBS path during data import, or the HDFS path during data export.
Target Path      Target path of the data: the HDFS path during data import, or the OBS path during data export.
Status           Status of the data import or export: Running, Completed, Terminated, or Abnormal.
Duration (min)   Time taken by the data import or export, in minutes.
Result           Result of the data import or export: Successful or Failed.
Operation        View Log: allows you to view file operation logs.
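
If you copy file operation records out of the console for your own tracking, the parameters in Table 1 map naturally onto a small data structure. The following is a minimal sketch under that assumption. The class and field names are hypothetical, MRS does not document a record format on this page, and the direction helper simply infers import or export from the source path prefix.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Status(Enum):
    RUNNING = "Running"
    COMPLETED = "Completed"
    TERMINATED = "Terminated"
    ABNORMAL = "Abnormal"


class Result(Enum):
    SUCCESSFUL = "Successful"
    FAILED = "Failed"


@dataclass
class FileOperationRecord:
    submitted: datetime   # start time of the import or export
    source_path: str      # OBS path for import, HDFS path for export
    target_path: str      # HDFS path for import, OBS path for export
    status: Status
    duration_min: float   # duration in minutes
    result: Result

    def direction(self) -> str:
        """Infer the operation type: an OBS source means import, otherwise export."""
        if self.source_path.startswith(("obs://", "s3a://")):
            return "import"
        return "export"
```

A record whose source path starts with obs:// or s3a:// is treated as an import, and any other source path is treated as an export.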
