Using Hadoop from Scratch

You can use Hadoop to submit wordcount jobs. Wordcount is the classic Hadoop sample job: it counts how many times each word occurs in a large volume of text.
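
To make the idea concrete, the following is a minimal local sketch in Python of what a word count computes. It is not the Hadoop sample program itself (that program is packaged in the examples JAR and runs on the cluster), and the function names map_phase, reduce_phase, and word_count are illustrative only. It assumes that words are separated by whitespace.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map and reduce steps in sequence on an iterable of lines."""
    return reduce_phase(map_phase(lines))


if __name__ == "__main__":
    sample = ["hello hadoop", "hello wordcount"]
    for word, total in sorted(word_count(sample).items()):
        print(f"{word}\t{total}")
```

On a cluster, the map and reduce steps run in parallel across many nodes, which is what makes the job practical for large inputs.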

Procedure

Prepare the wordcount program.

The open source Hadoop package includes several sample programs, including wordcount. You can download the Hadoop package from https://dist.apache.org/repos/dist/release/hadoop/common/.

For example, select hadoop-2.10.x, download hadoop-2.10.x.tar.gz, decompress it, and obtain the sample program hadoop-mapreduce-examples-2.10.x.jar from the hadoop-2.10.x\share\hadoop\mapreduce directory. This JAR file contains the wordcount program.

hadoop-2.10.x indicates the Hadoop version.
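
As a convenience, the lookup described above can be sketched in Python. This is only an illustration: find_examples_jar and its hadoop_home parameter are hypothetical names, and the sketch assumes that the package has already been decompressed to a local directory.

```python
from pathlib import Path


def find_examples_jar(hadoop_home):
    """Return the examples JAR under share/hadoop/mapreduce, or None.

    hadoop_home is the directory created by decompressing the Hadoop
    package (an assumed local path, for example ./hadoop-2.10.x).
    """
    jar_dir = Path(hadoop_home) / "share" / "hadoop" / "mapreduce"
    if not jar_dir.is_dir():
        return None
    # Match any version so the 2.10.x placeholder does not need to be filled in.
    matches = sorted(jar_dir.glob("hadoop-mapreduce-examples-*.jar"))
    return matches[0] if matches else None
```

Calling find_examples_jar with the decompressed directory returns the path of the JAR file to upload later, or None if the directory layout is different from the one assumed here.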

Prepare data files.

There is no format requirement for data files. Prepare one or more .txt files. The following is an example of the content of a .txt file:

qwsdfhoedfrffrofhuncckgktpmhutopmma
jjpsffjfjorgjgtyiuyjmhombmbogohoyhm
jhheyeombdhuaqqiquyebchdhmamdhdemmj
doeyhjwedcrfvtgbmojiyhhqssddddddfkf
kjhhjkehdeiyrudjhfhfhffooqweopuyyyy

Upload data to OBS.
1. Log in to OBS Console.
2. Click Parallel File System and choose Create Parallel File System to create a file system named wordcount01.
  wordcount01 is only an example. The file system name must be globally unique; otherwise, the parallel file system cannot be created.
3. In the OBS file system list, click wordcount01 and choose Files > Create Folder to create the program and input folders, as shown in Figure 1.
  Figure 1 Folder list of the wordcount01 file system
  - program: stores user programs.
  - input: stores user data files.
4. Go to the program folder, choose Upload File > add file, select the program package obtained in the "Prepare the wordcount program" step from the local host, and click Upload. After the upload is complete, the page shown in Figure 2 is displayed.
  Figure 2 Program list
5. Go to the input folder and upload the data files prepared in the "Prepare data files" step. After the upload is complete, the page shown in Figure 3 is displayed.
  Figure 3 Data file list
Log in to the MRS console.

In the navigation pane on the left, click Clusters and choose Active Clusters. Click the name of the cluster to use. The cluster must contain Hadoop components.

Submit the wordcount job.

On the MRS console, click the Jobs tab and click Create. The Create Job page is displayed. For details, see Running a MapReduce Job.

Figure 4 wordcount job
- Set Type to MapReduce.
- Set Name to mr_01.
- Set the path of the executable program to the address of the program stored in OBS. For example: obs://wordcount01/program/hadoop-mapreduce-examples-2.10.x.jar
- Enter wordcount obs://wordcount01/input/ obs://wordcount01/output/ in the Parameter pane.
  - In obs://wordcount01/input/, replace wordcount01 with the actual name of the file system that you created.
  - In obs://wordcount01/output/, replace wordcount01 with the actual name of the file system that you created. The output directory must not already exist.
- Service Parameter can be left blank.
A job can be submitted only when the cluster is in the Running state.

After a job is submitted successfully, it enters the Accepted state and runs automatically. You do not need to start it manually.
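
The values entered on the Create Job page all derive from the file system name and the JAR file name, so they can be assembled programmatically. The following Python sketch only builds the strings to paste into the page. It does not call any MRS or OBS API, and build_job_settings and its dictionary keys are hypothetical names chosen for illustration.

```python
def build_job_settings(file_system, jar_name, input_dir="input", output_dir="output"):
    """Build the Create Job values for a wordcount job.

    file_system is the OBS parallel file system name (wordcount01 in the
    example above); jar_name is the uploaded examples JAR file name.
    """
    if not file_system or "/" in file_system:
        raise ValueError("file_system must be a bare file system name")
    base = f"obs://{file_system}"
    return {
        "type": "MapReduce",
        "program_path": f"{base}/program/{jar_name}",
        # The output directory must not exist yet when the job is submitted.
        "parameters": f"wordcount {base}/{input_dir}/ {base}/{output_dir}/",
    }
```

For example, build_job_settings("wordcount01", "hadoop-mapreduce-examples-2.10.x.jar") produces the program path and the Parameter value shown in the list above.
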
View the job execution result.
1. Go to the Jobs tab page and check whether the job has run successfully.
  The job takes some time to run. After the job is complete, refresh the job list to view the job status, as shown in Figure 5.
  
  Figure 5 Job list
  
  A job that has succeeded or failed cannot be run again. To run it again, add a new job or copy the existing one, set the job parameters, and submit it.
2. Log in to the OBS console, go to the OBS path, and view the job output information.
  The output files are stored in the output directory that you specified when submitting the job. Download an output file to the local host and open it as text, as shown in Figure 6.
  
  Figure 6 Output file list
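
Once an output file has been downloaded, its contents can also be read programmatically. The following Python sketch assumes the default Hadoop text output layout, in which each line holds a word and its count separated by a tab. That layout is not stated on this page, so verify it against your own output file. parse_wordcount_output is a hypothetical helper name.

```python
def parse_wordcount_output(text):
    """Parse wordcount output text into a {word: count} dictionary.

    Assumes one "word<TAB>count" pair per line; lines without a tab
    are skipped rather than treated as errors.
    """
    counts = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue
        word, _, count = line.rpartition("\t")
        counts[word] = int(count)
    return counts


def top_words(counts, n=10):
    """Return the n most frequent words, highest count first."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

To use it, read the downloaded file into a string, pass it to parse_wordcount_output, and pass the result to top_words to list the most frequent words.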

