Using Hadoop from Scratch

Updated at: Sep 08, 2020 GMT+08:00

This section describes how to submit a wordcount job with Hadoop. Wordcount is a typical Hadoop job that counts how many times each word occurs in the input text.
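
To make the job's purpose concrete, the following is a minimal local sketch of the computation that wordcount performs: a map phase that emits one (word, 1) pair per whitespace-separated token, and a reduce phase that sums the pairs per word. It is an illustration only, not the Hadoop implementation; the function names are hypothetical and the sample lines are made up for the example.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce phase: sum the emitted counts for each word."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


def wordcount(lines: Iterable[str]) -> Counter:
    """Run the map and reduce phases over an iterable of text lines."""
    return reduce_counts(pair for line in lines for pair in map_words(line))


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the data file in this guide.
    sample = ["hello hadoop", "hello obs"]
    for word, total in sorted(wordcount(sample).items()):
        print(f"{word}\t{total}")
```

On a cluster, Hadoop distributes the map and reduce phases across nodes; the sketch runs both in a single process.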

Procedure

  1. Prepare the wordcount program.

    The wordcount program is included in the open source Hadoop example package, which you can download from https://dist.apache.org/repos/dist/release/hadoop/common/.

    For example, to use Hadoop 2.7.x, download hadoop-2.7.x.tar.gz, decompress it, and obtain hadoop-mapreduce-examples-2.7.x.jar from the hadoop-2.7.x\share\hadoop\mapreduce directory. This JAR file contains the wordcount program.

    In these names, 2.7.x stands for the Hadoop version that you selected.
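
If you prefer to locate the example JAR without decompressing the whole package, the lookup can be sketched with Python's standard tarfile module. This is a minimal sketch, assuming the archive is a gzip-compressed tarball laid out as described above; the function name find_examples_jar is hypothetical.

```python
import posixpath
import tarfile
from typing import Optional


def find_examples_jar(archive_path: str) -> Optional[str]:
    """Return the archive member name of the MapReduce examples JAR, or None.

    Only files located directly in a .../share/hadoop/mapreduce directory are
    considered, so JAR files in subdirectories are ignored.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        for name in archive.getnames():
            directory, base = posixpath.split(name)
            if (
                directory.endswith("share/hadoop/mapreduce")
                and base.startswith("hadoop-mapreduce-examples-")
                and base.endswith(".jar")
            ):
                return name
    return None
```

Pass the path of the downloaded hadoop-2.7.x.tar.gz file to the function. The returned member name can then be handed to tarfile's extract method to unpack only that file.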

  2. Prepare data files.

    There is no format requirement for data files. Prepare one or more TXT files. The following is an example of a TXT file:

    qwsdfhoedfrffrofhuncckgktpmhutopmma
    jjpsffjfjorgjgtyiuyjmhombmbogohoyhm
    jhheyeombdhuaqqiquyebchdhmamdhdemmj
    doeyhjwedcrfvtgbmojiyhhqssddddddfkf
    kjhhjkehdeiyrudjhfhfhffooqweopuyyyy

  3. Upload data to OBS.

    1. Log in to the OBS console.
    2. Click Create Bucket and enter a bucket name. The name must be unique; otherwise, the bucket cannot be created. This section uses the bucket name wordcount01 as an example.
    3. In the OBS bucket list, click wordcount01 and choose Objects > Create Folder to create the program and input folders.
      Figure 1 Folders in the wordcount01 bucket
      • program: stores user programs.
      • input: stores user data files.
    4. Go to the program folder, choose Upload Object > Add File, select the program package downloaded in step 1, and click Upload, as shown in Figure 2.
      Figure 2 Program list
    5. Go to the input folder and upload the data file prepared in step 2, as shown in Figure 3.
      Figure 3 Input file list

  4. Log in to the MRS management console. In the navigation tree on the left, choose Clusters > Active Clusters and click the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Custom Purchase of a Cluster.
  5. Submit a wordcount job.

    On the MRS management console, click the Jobs tab and click Create. The Create Job page is displayed, as shown in Running a MapReduce Job.

    Figure 4 Wordcount job
    • In Type, select MapReduce.
    • In Name, enter mr_01.
    • In Program Path, select an OBS path, for example, obs://wordcount01/program/hadoop-mapreduce-examples-2.7.x.jar.
    • In Parameters, enter the following parameters: wordcount obs://wordcount01/input/ obs://wordcount01/output/
      • Replace the bucket name wordcount01 in both obs://wordcount01/input/ and obs://wordcount01/output/ with the name of the bucket that you created.
      • For the output path, enter a directory that does not exist yet.
    • You do not need to set Service Parameter.

    Jobs can be submitted only when the mrs_20160907 cluster is in the running state.

    A job is executed immediately after it is created successfully.
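
Because the bucket name appears in three places in step 5, it is easy to mistype one of them. The following minimal sketch assembles the Program Path and Parameters values from a single bucket name, following the folder layout used in this section. The WordcountJob class and the build_wordcount_job function are hypothetical helpers for illustration; they only build strings and do not call any MRS or OBS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordcountJob:
    """Values to enter on the Create Job page."""
    name: str
    program_path: str
    parameters: str


def build_wordcount_job(
    bucket: str,
    jar_name: str,
    name: str = "mr_01",
    output_dir: str = "output",
) -> WordcountJob:
    """Assemble the job fields for the bucket layout used in this section."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without slashes")
    base = f"obs://{bucket}"
    return WordcountJob(
        name=name,
        program_path=f"{base}/program/{jar_name}",
        # The output directory must not exist before the job runs.
        parameters=f"wordcount {base}/input/ {base}/{output_dir}/",
    )


if __name__ == "__main__":
    job = build_wordcount_job("wordcount01", "hadoop-mapreduce-examples-2.7.x.jar")
    print(job.program_path)
    print(job.parameters)
```

To rerun the job, pass a different output_dir value, because the output directory of the previous run already exists.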

  6. View the job execution results.

    1. Go to the Jobs tab page and check whether the job is complete.

      Job execution takes a while. After the job is complete, refresh the job list.

      Figure 5 Job list

      You cannot re-execute a successful or failed job, but you can add or copy it. After setting the job parameters, you can submit the job again.

    2. Log in to the OBS console and open the output directory to view the job output.

      In the wordcount01 > output directory of OBS, you can view and download the job output files.

      Figure 6 Output file list
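
After downloading an output file, you can inspect it locally. The following is a minimal sketch, assuming the job wrote Hadoop's default text output, in which each line holds a word and its count separated by a tab character. The function names are hypothetical, and the file name in the usage note is only the typical reducer output name, not one confirmed by this guide.

```python
from typing import Dict, Iterable, List, Tuple


def parse_wordcount_output(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word<TAB>count' lines into a dictionary of totals.

    Counts for a word that appears in several files are added together,
    so lines from multiple output files can be passed in one call.
    """
    counts: Dict[str, int] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Split on the last tab; a line without a tab raises ValueError below.
        word, _, count = line.rpartition("\t")
        counts[word] = counts.get(word, 0) + int(count)
    return counts


def top_words(counts: Dict[str, int], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

For example, open a downloaded file such as part-r-00000 in text mode, pass the file object to parse_wordcount_output, and then call top_words on the result.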

  7. Terminate the cluster.

    For details, see Terminating a Cluster.
