
Getting Started with Hadoop

MapReduce Service (MRS) provides enterprise-level big data clusters on the cloud. Tenants can fully control clusters and easily run big data components such as Hadoop, Spark, HBase, Kafka, and Storm.
This document describes how to use Hadoop to submit a wordcount job in a normal or security cluster from scratch. Wordcount is the classic Hadoop job: it counts the occurrences of each word in large volumes of text.
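For example, a minimal illustration of what the job computes: if an input file contains the single line "hello world hello", wordcount writes each distinct word and its count, one pair per line, to output files named part-r-00000 and so on:

hello   2
world   1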

Step 1: Buy a Cluster

① Go to the Buy Cluster page.
② Click the Custom Config tab on the cluster purchase page.

(Figures: MapReduce Service; Buy a Cluster)

Step 2: Configure Software

① In Region, select a desired region.

② In Billing Mode, select Pay-per-use.
③ In Cluster Name, enter mrs_demo or specify a name according to naming rules.

④ In Cluster Type, select Analysis cluster.

⑤ In Version Type, select Normal.
⑥ In Cluster Version, select MRS 3.1.0.
⑦ In Component, select all components of the analysis cluster. Use the default values for other parameters.

⑧ Click Next.

(Figures: Configure Software - 01; Configure Software - 02)

Step 3: Configure Hardware

① In AZ, select AZ2.

② In Enterprise Project, select default.

③ Use the default values for VPC and Subnet, or click View VPC to create a VPC.
④ In Security Group, use the default value Auto create.
⑤ In EIP, use the default value Bind later.
⑥ In Cluster Node, use the default values of instance specifications for Master and Core nodes. Use the default values for the node count as well as data disk type and size. Do not add Task nodes.
⑦ Click Next.

(Figures: Configure Hardware - 01; Configure Hardware - 02)

Step 4: Set Advanced Options

① Kerberos Authentication: Disable Kerberos authentication.

② Username: name of the Manager administrator. admin is used by default.

③ Set Password and Confirm Password to the password of the Manager administrator.

④ Set Login Mode to Password, and enter the password and confirm password for user root.

⑤ Retain the default value of Hostname Prefix.

⑥ Select Set Advanced Options and set Agency to MRS_ECS_DEFAULT_AGENCY.

⑦ Click Next.

(Figure: Set Advanced Options)

Step 5: Confirm Configuration

① Review the cluster configuration information displayed on the page.

② Secure Communications: Select Enable.

③ Click Buy Now. A page is displayed indicating that the task has been submitted.
④ Click Back to Cluster List to view the cluster status on the Active Clusters page. Cluster creation takes some time. The initial status is Starting; after the cluster is created successfully, the status changes to Running.

(Figure: Confirm Configuration)

Step 6: Prepare the Hadoop Sample Program and Data Files

① Prepare the wordcount program.
Download the Hadoop sample program package, which includes wordcount.
For example, download hadoop-3.3.1.tar.gz and decompress it to obtain the Hadoop sample program hadoop-mapreduce-examples-3.3.1.jar in the hadoop-3.3.1\share\hadoop\mapreduce directory. (A command-line sketch of this step follows after this list.)
② Prepare data files.
There is no format requirement for data files. Prepare two .txt files.
In this example, files wordcount1.txt and wordcount2.txt are used.
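If you prefer the command line, this step can be scripted. A minimal sketch, assuming a Linux host with wget; the Apache archive URL is one possible download source, and the contents of the two data files are illustrative:

# Download and extract the Hadoop 3.3.1 release
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
tar -xzf hadoop-3.3.1.tar.gz
# The sample program is under share/hadoop/mapreduce
ls hadoop-3.3.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar
# Create two sample data files (contents are illustrative)
echo "hello world hello" > wordcount1.txt
echo "hadoop counts words" > wordcount2.txt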

(Figures: Sample Program; Sample Files)

Step 7: Upload Data to OBS

① Log in to the OBS console and choose Parallel File Systems. Click Create Parallel File System and configure the parameters to create a file system named mrs-word01.
② Click the name of the mrs-word01 file system. In the navigation pane on the left, choose Files, and click Create Folder to create the program and input folders.
③ Go to the program folder, and upload the Hadoop sample program downloaded in Step 6.
④ Go to the input folder, and upload the wordcount1.txt and wordcount2.txt data files prepared in Step 6.
⑤ To submit a job on the GUI, go to Step 8. To submit a job through a node at the cluster background, go to Step 9. (If you prefer the command line, the uploads in this step can also be done with obsutil, as sketched below.)
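Alternatively, the uploads can be done from the command line with obsutil, Huawei Cloud's OBS command-line tool. A minimal sketch, assuming obsutil is installed and that YOUR_AK, YOUR_SK, and the endpoint are replaced with your own values:

# Configure obsutil once with your access key, secret key, and OBS endpoint
./obsutil config -i=YOUR_AK -k=YOUR_SK -e=obs.your-region.example.com
# Upload the sample program into the program folder
./obsutil cp hadoop-mapreduce-examples-3.3.1.jar obs://mrs-word01/program/
# Upload the data files into the input folder
./obsutil cp wordcount1.txt obs://mrs-word01/input/
./obsutil cp wordcount2.txt obs://mrs-word01/input/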

(Figures: Create an OBS Bucket and Folders; Upload Data)

Step 8: Submit a Job on the GUI

① In the left navigation pane of the MRS management console, choose Clusters > Active Clusters. Click the mrs_demo cluster name.
② On the cluster details page, click the Jobs tab and then click Create. The Create Job page is displayed. To submit a job through a node at the cluster background, refer to Step 9.
③ In Type, select MapReduce.
④ In Name, enter wordcount.
⑤ In Program Path, click OBS and select the Hadoop sample program uploaded in Step 7.
⑥ In Parameters, enter wordcount obs://mrs-word01/input/ obs://mrs-word01/output/. The last argument is the output path; enter a directory that does not exist.
⑦ Leave Service Parameter blank.
⑧ Click OK to submit the job. If the job has been successfully submitted, its status is Running by default. You do not need to manually execute the job.
⑨ Go to the Jobs tab page, view the job status and logs, and go to Step 10 to view the job execution result.

(Figure: Create a Job)

Step 9: Submit a Job Through a Node at the Cluster Background

① Log in to the MRS management console and click the name of the mrs_demo cluster created earlier. The basic cluster information page is displayed.
② On the Nodes tab page, click the name of a Master node to go to the ECS management console.
③ Click Remote Login in the upper right corner of the page.
④ Enter the username and password of the Master node as prompted. The username is root and the password is the one set during cluster creation.
⑤ Run the source /opt/client/bigdata_env command to configure environment variables.
⑥ For a security cluster, run kinit MRS cluster username (for example, kinit admin) to authenticate the current user of the cluster.
⑦ Run the following command to copy the sample program in the OBS bucket to the Master node in the cluster:
hadoop fs -Dfs.s3a.access.key=AK -Dfs.s3a.secret.key=SK -copyToLocal source_path.jar target_path.jar
Example: hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -copyToLocal "s3a://mrs-word01/program/hadoop-mapreduce-examples-XXX.jar" "/home/omm/hadoop-mapreduce-examples-XXX.jar"
To obtain the AK/SK pair for logging in to OBS Console, hover the cursor over the username in the upper right corner of the management console, and choose My Credentials > Access Keys.
⑧ Run the following command to submit the wordcount job. If data needs to be read from or written to OBS, add the AK/SK parameters. (A consolidated sketch of this step follows below.)
source /opt/client/bigdata_env;hadoop jar execute_jar wordcount input_path output_path
Example: source /opt/client/bigdata_env;hadoop jar /home/omm/hadoop-mapreduce-examples-XXX.jar wordcount -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX "s3a://mrs-word01/input/*" "s3a://mrs-word01/output/"
In the preceding command, input_path indicates a path for storing job input files on OBS. output_path indicates a path for storing job output files on OBS. Set it to a directory that does not exist.
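Putting this step together, a minimal end-to-end sketch as run on the Master node. The AK/SK values are placeholders, the jar version is taken from the Step 6 example, and the kinit line applies only to security clusters:

# Configure environment variables for the MRS client
source /opt/client/bigdata_env
# Security clusters only: authenticate a cluster user
kinit admin
# Copy the sample program from OBS to the Master node
hadoop fs -Dfs.s3a.access.key=YOUR_AK -Dfs.s3a.secret.key=YOUR_SK -copyToLocal "s3a://mrs-word01/program/hadoop-mapreduce-examples-3.3.1.jar" /home/omm/hadoop-mapreduce-examples-3.3.1.jar
# Submit the wordcount job; the output directory must not already exist
hadoop jar /home/omm/hadoop-mapreduce-examples-3.3.1.jar wordcount -Dfs.s3a.access.key=YOUR_AK -Dfs.s3a.secret.key=YOUR_SK "s3a://mrs-word01/input/*" "s3a://mrs-word01/output/"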

(Figure: Log In to the Master Node)

Step 10: View the Job Execution Results

Log in to the OBS console. Go to the output path in the mrs-word01 parallel file system specified during job submission and view the job output files. Download the files to the local host and open them as text files. (Alternatively, inspect the output directly from the Master node, as sketched below.)
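If you would rather check the result from the Master node, a minimal sketch, assuming the client environment variables are configured as in Step 9 and the AK/SK placeholders are replaced:

# List the output files produced by the job
hadoop fs -Dfs.s3a.access.key=YOUR_AK -Dfs.s3a.secret.key=YOUR_SK -ls "s3a://mrs-word01/output/"
# Print the word counts (one word-count pair per line)
hadoop fs -Dfs.s3a.access.key=YOUR_AK -Dfs.s3a.secret.key=YOUR_SK -cat "s3a://mrs-word01/output/part-r-*"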

(Figure: View the Job Execution Result)