Updated on 2024-07-19 GMT+08:00

Using Loader from Scratch

You can use Loader to import data from the SFTP server to HDFS.

This section applies to MRS clusters earlier than 3.x.

Prerequisites

  • You have prepared service data.
  • You have created an analysis cluster.

Procedure

  1. Access the Loader page.

    1. Go to the cluster details page and choose Services.
    2. Choose Hue. On the Hue Summary page, click Hue (Active) next to Hue Web UI. The Hue web UI is displayed.
    3. Choose Data Browsers > Sqoop.

      The job management tab page is displayed by default on the Loader page.

  2. On the Loader page, click Manage links.
  3. Click New link and create a link using sftp-connector. For details, see File Server Link.
  4. Click New link again, enter a link name, select hdfs-connector, and create the HDFS link.
  5. On the Loader page, click Manage jobs.
  6. Click New Job.
  7. In Connection, set parameters.

    1. In Name, enter a job name.
    2. Select the source link created in step 3 and the target link created in step 4.

  8. In From, configure the source link parameters of the job.

    For details, see ftp-connector or sftp-connector.

  9. In To, configure the target link parameters of the job.

    For details, see hdfs-connector.

  10. In Task Config, set job running parameters.

    Table 1 Loader job running properties

      • Extractors: Number of Map tasks.
      • Loaders: Number of Reduce tasks. This parameter is displayed only when the destination is HBase or Hive.
      • Max. Error Records in a Single Shard: Error record threshold. If the number of error records of a single Map task exceeds the threshold, the task automatically stops and the data obtained so far is not returned.

        NOTE: By default, data is read and written in batches for MySQL and MPPDB over generic-jdbc-connector, so at most one error is recorded for each batch of data.

      • Dirty Data Directory: Directory for saving dirty data. If you leave this parameter blank, dirty data is not saved.

  11. Click Save.
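The Max. Error Records in a Single Shard behavior described in Table 1 can be sketched as follows. This is a minimal illustration of the documented stop condition, not Loader's actual implementation; the function name and the use of None to mark an error record are assumptions made for the example.

```python
def run_shard(records, max_errors):
    """Illustrative sketch: process one Map task's shard of records.

    max_errors corresponds to "Max. Error Records in a Single Shard".
    A record of None stands in for an error record (an assumption
    for this sketch). If the error count exceeds the threshold, the
    task stops and the data obtained so far is not returned.
    """
    errors = 0
    written = []
    for rec in records:
        if rec is None:
            errors += 1
            if errors > max_errors:
                return None  # task stops; obtained data is discarded
        else:
            written.append(rec)
    return written
```

For example, with a threshold of 1, a shard containing one error record still completes, while a shard containing two error records stops and returns nothing.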