
Getting Started with Hadoop

MapReduce Service (MRS) provides enterprise-level big data clusters on the cloud. Tenants can fully control clusters and easily run big data components such as Hadoop, Spark, HBase, Kafka, and Storm.

This document describes how to use Hadoop to submit a wordcount job in both normal and security clusters from scratch. Wordcount is the classic Hadoop job: it counts the occurrences of each word in large volumes of text, as illustrated below.
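For example, given the following input lines, wordcount writes one line per distinct word with its count, separated by a tab (a minimal illustration; a real job splits its output across one or more part-* files):

    Input:
      hello hadoop
      hello mapreduce
    Output (word<TAB>count, sorted by word):
      hadoop      1
      hello       2
      mapreduce   1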

This section contains the following parts.

  1. Step 1: Buy a Cluster
  2. Step 2: Prepare the Hadoop Sample Program and Data Files
  3. Step 3: Upload Data to OBS
  4. Step 4: Submit a Job on the GUI
  5. Step 5: Submit a Job Through a Cluster Node in the Background
  6. Step 6: View the Job Execution Results

Step 1: Buy a Cluster

  1. Log in to the HUAWEI CLOUD management console.
  2. Choose EI Enterprise Intelligence > MapReduce Service. The MRS management console is displayed.

    Figure 1 Choose MapReduce Service

  3. Click Buy Cluster. The Buy Cluster page is displayed.

    Figure 2 Buy Cluster

  4. Click the Custom Config tab on the cluster purchase page.

    Figure 3 Custom Config

  5. In Region, select a desired region.
  6. In Cluster Name, enter mrs_demo or specify a name according to naming rules.
  7. In Cluster Version, select MRS 2.1.0.

    Figure 4 Configure Software-01

  8. In Cluster Type, select Analysis cluster.
  9. Select all components of an analysis cluster. Use the default values for other parameters.

    Figure 5 Configure Software-02

  10. Disable Kerberos Authentication.
  11. In Username, use the default value admin.
  12. In Password, enter the password of the MRS Manager administrator.

    Figure 6 Configure Software-03

  13. Click Next.
  14. In Billing Mode, select Pay-per-use.
  15. In AZ, select AZ2.
  16. Use the default values for VPC and Subnet, or click View VPC to create a VPC.
  17. In Security Group, use the default value Auto create.
  18. In EIP, use the default value Bind later.
  19. In Enterprise Project, select default.

    Figure 7 Configure Hardware-01

  20. In CPU Architecture, use the default value.
  21. Enable Cluster HA.
  22. In Cluster Node, use the default values of instance specifications for Master and Core nodes. Use the default values for the instance count as well as data disk type and size. Do not add Task nodes.
  23. In Login Mode, select Password. Enter the password of user root and confirm the password.

    Figure 8 Configure Hardware-02

  24. Click Next.
  25. Use the default settings for parameters on the Set Advanced Options page.

    Figure 9 Set Advanced Options

  26. Click Buy Now. The page is displayed showing that the task has been submitted.
  27. Click Back to Cluster List. You can view the status of the cluster on the Active Clusters page.

    It takes some time to create a cluster. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.

Step 2: Prepare the Hadoop Sample Program and Data Files

  1. Prepare the wordcount program.

    Download the Hadoop sample program (including wordcount).

    For example, select hadoop-3.1.3.tar.gz, decompress it, and obtain the Hadoop sample program hadoop-mapreduce-examples-3.1.3.jar in the hadoop-3.1.3\share\hadoop\mapreduce directory. A command-line alternative is sketched after Figure 10.

    Figure 10 Sample Program
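    A minimal command-line sketch of this step (the Apache archive URL is one assumed download source; any Hadoop mirror works):

      # Download and unpack the Hadoop 3.1.3 release, then locate the examples JAR.
      wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
      tar -xzf hadoop-3.1.3.tar.gz
      ls hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar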

  2. Prepare data files.

    There is no format requirement for the data files. Prepare two .txt files. In this example, wordcount1.txt and wordcount2.txt are used; sample contents are sketched after Figure 11.

    Figure 11 Sample Files
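    The files can contain any plain text. An illustrative assumption for their contents (not required content):

      wordcount1.txt:
        hello hadoop
        hello mapreduce
      wordcount2.txt:
        hadoop runs wordcount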

Step 3: Upload Data to OBS

  1. Log in to OBS Console, and click Create Bucket to create a bucket named mrs-word01.
  2. Click the bucket name mrs-word01 to go to the bucket details page. In the left navigation pane, choose Objects. On the Objects tab page, click Create Folder to create the program and input folders.
  3. Go to the program folder, and upload the Hadoop sample program downloaded in Step 2.
  4. Go to the input folder, and upload the wordcount1.txt and wordcount2.txt data files prepared in Step 2. Alternatively, upload the program and data files from a terminal, as sketched after this list.
  5. To submit a job on the GUI, go to Step 4: Submit a Job on the GUI.
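  If you prefer the command line, the upload can also be done with Huawei Cloud's obsutil tool. A minimal sketch, assuming obsutil is installed and configured with your AK/SK and that the files are in the current directory:

    # Upload the sample program and the data files to the mrs-word01 bucket.
    obsutil cp ./hadoop-mapreduce-examples-3.1.3.jar obs://mrs-word01/program/
    obsutil cp ./wordcount1.txt obs://mrs-word01/input/
    obsutil cp ./wordcount2.txt obs://mrs-word01/input/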

Step 4: Submit a Job on the GUI

  1. In the left navigation pane of the MRS management console, choose Clusters > Active Clusters. Click the mrs_demo cluster name.
  2. On the cluster details page, click the Jobs tab and then click Create. The Create Job page is displayed. To submit a job through a cluster node in the background instead, see Step 5: Submit a Job Through a Cluster Node in the Background.
  3. In Type, select MapReduce.
  4. In Name, enter wordcount.
  5. In Program Path, click OBS and select the Hadoop sample program uploaded in Step 3: Upload Data to OBS.
  6. In Parameters, enter wordcount obs://mrs-word01/input/ obs://mrs-word01/output/. The last value is the output path; specify a directory that does not yet exist, because the job creates it and fails if it already exists.
  7. Leave Service Parameter blank.
  8. Click OK to submit the job.

    If the job is submitted successfully, its status is Accepted by default and the job runs automatically; you do not need to execute it manually.

    Figure 12 Create a Job

  9. Go to the Jobs tab page, view the job status and logs, and go to Step 6: View the Job Execution Results to view the job execution result.

Step 5: Submit a Job Through a Cluster Node in the Background

  1. Log in to the MRS management console, and click the name of the cluster created in Step 1: Buy a Cluster. The basic cluster information page is displayed.
  2. On the Nodes tab page, click the name of a Master node to go to the ECS management console.
  3. Click Remote Login in the upper right corner of the page.

    Figure 13 Log In to the Master Node

  4. Enter the username and password of the Master node as prompted. The username is root and the password is the one set during cluster creation.
  5. Run the source /opt/client/bigdata_env command to configure environment variables.
  6. For a security cluster, run the kinit <MRS cluster username> command (for example, kinit admin) to authenticate the current cluster user. This tutorial uses a normal cluster (Kerberos authentication disabled), so you can skip this step.
  7. Run the following command to copy the sample program from the OBS bucket to the Master node in the cluster:

    hadoop fs -Dfs.s3a.access.key=AK -Dfs.s3a.secret.key=SK -copyToLocal source_path.jar target_path.jar

    Example: hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -copyToLocal "s3a://mrs-word01/program/hadoop-mapreduce-examples-XXX.jar" "/home/omm/hadoop-mapreduce-examples-XXX.jar"

    To obtain the AK/SK pair for accessing OBS, hover the cursor over the username in the upper right corner of the management console, and choose My Credentials > Access Keys.

  8. Run the following command to submit the wordcount job. If data needs to be read from or written to OBS, add the AK/SK parameters.

    source /opt/client/bigdata_env;hadoop jar execute_jar wordcount input_path output_path

    Example: source /opt/client/bigdata_env;hadoop jar /home/omm/hadoop-mapreduce-examples-XXX.jar wordcount -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX "s3a://mrs-word01/input/*" "s3a://mrs-word01/output/"

    In the preceding command, input_path indicates the OBS path that stores the job input files, and output_path indicates the OBS path for the job output. Set output_path to a directory that does not yet exist. A commented end-to-end sketch of steps 5 to 8 follows.
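    For reference, a minimal end-to-end sketch of the background submission on the Master node (the AK/SK values and the JAR version are placeholders to replace with your own):

    # Load the client environment variables.
    source /opt/client/bigdata_env
    # Copy the sample program from OBS to the Master node.
    hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX \
      -copyToLocal "s3a://mrs-word01/program/hadoop-mapreduce-examples-XXX.jar" \
      "/home/omm/hadoop-mapreduce-examples-XXX.jar"
    # Run wordcount; the output directory must not exist yet.
    hadoop jar /home/omm/hadoop-mapreduce-examples-XXX.jar wordcount \
      -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX \
      "s3a://mrs-word01/input/*" "s3a://mrs-word01/output/"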

Step 6: View the Job Execution Results

Log in to OBS Console. Go to the output path in the mrs-word01 bucket specified during job submission, and view the job output files. Download a part-* file to your local PC and open it as a text file; each line contains a word and its count, separated by a tab. Alternatively, view the output directly from the Master node, as sketched after Figure 14.

Figure 14 View the Job Execution Result
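A minimal sketch for checking the result from the Master node instead (the AK/SK values are placeholders):

    # List the output directory; a _SUCCESS marker indicates that the job finished.
    hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -ls "s3a://mrs-word01/output/"
    # Print a result file; each line is "word<TAB>count".
    hadoop fs -Dfs.s3a.access.key=XXXX -Dfs.s3a.secret.key=XXXX -cat "s3a://mrs-word01/output/part-r-00000"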