Getting Started with Hadoop
MapReduce Service (MRS) provides enterprise-level big data clusters on the cloud. Tenants can fully control clusters and easily run big data components such as Hadoop, Spark, HBase, Kafka, and Storm.
This document describes how to use Hadoop to submit a wordcount job from scratch, in both normal clusters and security clusters (clusters with Kerberos authentication enabled). wordcount is the classic Hadoop example job: it counts the occurrences of each word in a large amount of text.
Procedure: Buy a cluster -> Prepare the Hadoop sample program and data files -> Upload data to OBS -> Create a job -> View the job execution results
Step 1: Buy a Cluster
① Log in to the HUAWEI CLOUD management console.
② Choose EI Enterprise Intelligence > MapReduce Service. The MRS management console is displayed.
③ Click Buy Cluster. The Buy Cluster page is displayed.
④ Click the Custom Config tab on the cluster purchase page.
Step 2: Configure Software
① In Region, select a desired region.
② In Cluster Name, enter mrs_demo or another name that complies with the naming rules.
③ In Cluster Version, select MRS 2.1.0.
④ In Cluster Type, select Analysis cluster.
⑤ Select all components of an analysis cluster. Use the default values for other parameters.
⑥ Disable Kerberos Authentication.
⑦ In Username, use the default value admin.
⑧ In Password, enter the password of the MRS Manager administrator.
⑨ Click Next.
Step 3: Configure Hardware
① In Billing Mode, select Pay-per-use.
② In AZ, select AZ2.
③ Use the default values for VPC and Subnet, or click View VPC to create a VPC.
④ In Security Group, use the default value Auto create.
⑤ In EIP, use the default value Bind later.
⑥ In Enterprise Project, select default.
⑦ In CPU Architecture, use the default value. Enable Cluster HA.
⑧ In Cluster Node, use the default values of instance specifications for Master and Core nodes. Use the default values for the instance count as well as data disk type and size. Do not add Task nodes.
⑨ In Login Mode, select Password. Enter the password of user root and confirm the password.
⑩ Click Next.
Step 4: Set Advanced Options
① Use the default settings for parameters on the Set Advanced Options page.
② Click Buy Now. A page is displayed indicating that the task has been submitted.
③ Click Back to Cluster List. You can view the status of the cluster on the Active Clusters page. It takes some time to create a cluster. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.
Step 5: Prepare the Hadoop Sample Program and Data Files
① Prepare the wordcount program.
Download a Hadoop release package, which includes the sample programs (among them wordcount).
For example, download hadoop-3.1.3.tar.gz, decompress it, and find the sample program hadoop-mapreduce-examples-3.1.3.jar in the hadoop-3.1.3\share\hadoop\mapreduce directory.
② Prepare data files.
The data files can contain any text; no particular format is required. Prepare two .txt files.
In this example, files wordcount1.txt and wordcount2.txt are used.
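If you prepare the files on a Linux or macOS host, the following shell sketch extracts only the examples JAR from the downloaded release tarball and writes the two input files. It assumes hadoop-3.1.3.tar.gz is already in the current directory; the HADOOP_VERSION variable, the extract_examples_jar function, and the sample sentences are illustrative placeholders, not part of MRS or Hadoop.

```shell
#!/bin/sh
# Sketch: prepare the Step 5 files on a Linux or macOS host.
# Assumptions: hadoop-${HADOOP_VERSION}.tar.gz was already downloaded into the
# current directory; the sample sentences are placeholder data (any text works).
set -eu

HADOOP_VERSION="${HADOOP_VERSION:-3.1.3}"

# Extract only the examples JAR from the release tarball and copy it here.
extract_examples_jar() {
  version="$1"
  member="hadoop-${version}/share/hadoop/mapreduce/hadoop-mapreduce-examples-${version}.jar"
  tar -xzf "hadoop-${version}.tar.gz" "$member"
  cp "$member" .
}

if [ -f "hadoop-${HADOOP_VERSION}.tar.gz" ]; then
  extract_examples_jar "$HADOOP_VERSION"
else
  echo "hadoop-${HADOOP_VERSION}.tar.gz not found; skipping JAR extraction" >&2
fi

# Two small input files for the wordcount job.
printf 'hello hadoop\nhello mrs\n' > wordcount1.txt
printf 'hadoop mapreduce\nhello obs\n' > wordcount2.txt
```

Extracting a single member keeps the working directory small; you can also decompress the whole package with an archive tool, as described above.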
Step 6: Upload Data to OBS
① Log in to OBS Console, and click Create Bucket to create a bucket named mrs-word01.
② In the bucket list, click mrs-word01 to open the bucket. In the left navigation pane, choose Objects. On the Objects tab page, click Create Folder to create two folders: program and input.
③ Go to the program folder, and upload the Hadoop sample program downloaded in Step 5.
④ Go to the input folder, and upload the wordcount1.txt and wordcount2.txt data files prepared in Step 5.
⑤ To submit a job on the GUI, go to Step 7.
To submit a job through a node at the cluster background, go to Step 8.
Step 7: Submit a Job on the GUI
① In the left navigation pane of the MRS management console, choose Clusters > Active Clusters. Click the mrs_demo cluster name.
② On the cluster details page, click the Jobs tab and then click Create. The Create Job page is displayed. To submit a job through a node at the cluster background, refer to Step 8.
③ In Type, select MapReduce.
④ In Name, enter wordcount.
⑤ In Program Path, click OBS and select the Hadoop sample program uploaded in Step 6.
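⑥ In Parameters, enter wordcount obs://mrs-word01/input/ obs://mrs-word01/output/. The first argument selects the wordcount program in the sample JAR, followed by the input path and the output path. The output path must be a directory that does not exist yet.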
⑦ Leave Service Parameter blank.
⑧ Click OK to submit the job. After the job is submitted successfully, its status is Running by default; you do not need to start it manually.
⑨ Go to the Jobs tab page, view the job status and logs, and go to Step 9 to view the job execution result.
Step 8: Submit a Job Through a Node at the Cluster Background
① Log in to the MRS management console and click the name of the cluster you created (mrs_demo in this example). The basic cluster information page is displayed.
② On the Nodes tab page, click the name of a Master node to go to the ECS management console.
③ Click Remote Login in the upper right corner of the page.
④ Enter the username and password of the Master node as prompted. The username is root and the password is the one set during cluster creation.
⑤ Run the source /opt/client/bigdata_env command to configure environment variables.
⑥ For a security cluster, run the kinit command with an MRS cluster username (for example, kinit admin) to authenticate the current user. The cluster created in this document has Kerberos authentication disabled, so you can skip this step.
⑦ Run the following command to copy the sample program in the OBS bucket to the Master node in the cluster:
hadoop fs -Dfs.s3a.access.key=AK -Dfs.s3a.secret.key=SK -copyToLocal source_path.jar target_path.jar
Example: hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -copyToLocal "s3a://mrs-word01/program/hadoop-mapreduce-examples-XXX.jar" "/home/omm/hadoop-mapreduce-examples-XXX.jar"
To obtain the AK/SK pair used to access OBS, hover the cursor over the username in the upper right corner of the management console and choose My Credentials > Access Keys.
⑧ Run the following command to submit a wordcount job. If the job reads data from OBS or writes data to OBS, add the AK/SK parameters.
source /opt/client/bigdata_env; hadoop jar execute_jar wordcount input_path output_path
Example: source /opt/client/bigdata_env; hadoop jar /home/omm/hadoop-mapreduce-examples-XXX.jar wordcount -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX "s3a://mrs-word01/input/*" "s3a://mrs-word01/output/"
In the preceding command, execute_jar is the path of the sample program on the Master node, input_path is the OBS path that stores the job input files, and output_path is the OBS path for the job output files. output_path must be a directory that does not exist yet.
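Steps ⑦ and ⑧ can be wrapped in a small shell sketch so that the AK/SK pair is read from environment variables instead of being pasted into each command. OBS_AK, OBS_SK, BUCKET, JAR_NAME, OUTPUT, DRY_RUN and the function names are hypothetical conventions introduced here, not MRS features; the hadoop commands are the ones shown above, plus a hadoop fs -test -e check that refuses to submit when the output path already exists.

```shell
#!/bin/sh
# Sketch: Step 8 (copy the program, submit wordcount) as shell functions.
# Hypothetical conventions, not part of MRS: OBS_AK / OBS_SK hold the AK/SK
# pair, BUCKET / JAR_NAME / OUTPUT override the defaults below, and
# DRY_RUN=1 prints each command instead of running it.

BUCKET="${BUCKET:-mrs-word01}"
JAR_NAME="${JAR_NAME:-hadoop-mapreduce-examples-3.1.3.jar}"
LOCAL_JAR="/home/omm/${JAR_NAME}"
OUTPUT="${OUTPUT:-s3a://${BUCKET}/output/}"

# Print the command in dry-run mode (keys included); otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Fail early with a readable message if the credentials are missing.
require_keys() {
  if [ -z "${OBS_AK:-}" ] || [ -z "${OBS_SK:-}" ]; then
    echo "Export OBS_AK and OBS_SK first." >&2
    return 1
  fi
}

# Step 8 ⑦: copy the sample program from OBS to the Master node.
copy_program() {
  require_keys || return 1
  run hadoop fs -Dfs.s3a.access.key="$OBS_AK" -Dfs.s3a.secret.key="$OBS_SK" \
    -copyToLocal "s3a://${BUCKET}/program/${JAR_NAME}" "$LOCAL_JAR"
}

# Step 8 ⑧: submit wordcount. The input glob is quoted so that Hadoop,
# not the local shell, expands it.
submit_wordcount() {
  require_keys || return 1
  if [ "${DRY_RUN:-0}" != "1" ] &&
     hadoop fs -Dfs.s3a.access.key="$OBS_AK" -Dfs.s3a.secret.key="$OBS_SK" \
       -test -e "$OUTPUT"; then
    echo "Output path $OUTPUT already exists; choose a new one." >&2
    return 1
  fi
  run hadoop jar "$LOCAL_JAR" wordcount \
    -Dfs.s3a.access.key="$OBS_AK" -Dfs.s3a.secret.key="$OBS_SK" \
    "s3a://${BUCKET}/input/*" "$OUTPUT"
}
```

For example, after running source /opt/client/bigdata_env on the Master node, load the functions with the . command, export OBS_AK and OBS_SK, and run copy_program && submit_wordcount. Keep in mind that DRY_RUN=1 prints the keys in clear text, and that the keys still appear on the hadoop command line while the commands run.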
Step 9: View the Job Execution Results
Log in to OBS Console, go to the output path in the mrs-word01 bucket that you specified when submitting the job, and view the job output file. To read it, download the file to your local host and open it as a text file.
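To sanity-check the result, you can compare the downloaded output file with a local count of the same input files. The sketch below assumes the result was downloaded as a single file (typically named part-r-00000, with one word, a tab, and its count per line) and that the input files are still on the local host. Its tokenization (split on spaces and tabs, CRLF line endings tolerated) only approximates the whitespace splitting of the Hadoop wordcount example; local_counts and check_results are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: cross-check the downloaded wordcount output against a local count.
# Assumptions: the result is a single downloaded file (e.g. part-r-00000) and
# the input files are available locally. Splitting on spaces and tabs only
# approximates the tokenization used by the Hadoop wordcount example.

# Print "word<TAB>count" for every token in the given files, byte-sorted.
local_counts() {
  cat "$@" | tr -d '\r' |
    awk '{ for (i = 1; i <= NF; i++) n[$i]++ }
         END { for (w in n) printf "%s\t%d\n", w, n[w] }' |
    LC_ALL=C sort
}

# Usage: check_results RESULT_FILE INPUT_FILE...
check_results() {
  result="$1"
  shift
  tmp="$(mktemp -d)"
  local_counts "$@" > "$tmp/expected"
  # Sort both sides the same way so line order does not affect the comparison.
  tr -d '\r' < "$result" | LC_ALL=C sort > "$tmp/actual"
  if diff "$tmp/expected" "$tmp/actual" > /dev/null; then
    echo "OK: $result matches the local count"
    status=0
  else
    echo "Mismatch between $result and the local count" >&2
    status=1
  fi
  rm -rf "$tmp"
  return "$status"
}
```

For example, check_results part-r-00000 wordcount1.txt wordcount2.txt. If the job produced several part files, concatenate them into one file first.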