
Using Hadoop from Scratch

Procedure

  1. Purchase an MRS cluster.

    1. Log in to the Huawei Cloud console.
    2. Choose Service List > Analytics > MapReduce Service.
    3. On the Active Clusters page that is displayed, click Buy Cluster.
    4. Click the Custom Config tab.

  2. Configure software.

    1. Region: Select a region as required.
    2. Cluster Name: Enter mrs_demo or specify a name according to naming rules.
    3. Cluster Version: Select MRS 3.1.0.
    4. Cluster Type: Select Analysis Cluster.
    5. Select all analysis cluster components.
    6. Click Next.

  3. Configure hardware.

    1. Billing Mode: Select Pay-per-use.
    2. AZ: Select AZ2.
    3. VPC and Subnet: Retain the default values, or click View VPC and View Subnet to create new ones.
    4. Security Group: Use the default value Auto create.
    5. EIP: Bind later is selected by default.
    6. Enterprise Project: Select default.
    7. Cluster Node: Retain the default values. Do not add task nodes.
    8. Click Next.

  4. Set advanced options.

    1. Tag: Retain the default value.
    2. Agency, Alarm, Rule Name, and Topic Name: Retain the default values.
    3. Kerberos Authentication: Select Disabled.
    4. Username: admin is used by default.
    5. Password and Confirm Password: Set them to the password of the FusionInsight Manager administrator.
    6. Login Mode: Select Password. Enter a password and confirm the password for user root.
    7. Secure Communications: Select Enable.
    8. Service Agreement: Select I have read and agree to the Huawei MRS Service Agreement.
    9. Click Buy Now. A page is displayed indicating that the task has been submitted.
    10. Click Back to Cluster List to view the cluster status on the Active Clusters page. The initial status of the cluster is Starting; wait until creation completes and the status changes to Running.

  5. Prepare the Hadoop sample program and data files.

    1. Prepare the wordcount program.

      Download the Hadoop sample program (including wordcount). hadoop-3.1.4.tar.gz is used as an example; use the actual program version provided in the link. For example, choose hadoop-3.1.4, and on the page that is displayed, click hadoop-3.1.4.tar.gz to download it. Then decompress the package and obtain the Hadoop sample program hadoop-mapreduce-examples-3.1.4.jar from the hadoop-3.1.4\share\hadoop\mapreduce directory.
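      If you prefer the command line, the following is a minimal sketch of the same download and extraction, assuming the Apache archive mirror URL below is still valid for the 3.1.4 release (verify the URL and version before use):

      # Download the Hadoop 3.1.4 release archive (URL assumed; adjust to your mirror and version).
      wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz

      # Extract only the MapReduce examples jar, which contains the wordcount program.
      tar -xzf hadoop-3.1.4.tar.gz hadoop-3.1.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar

      # Optionally list the jar to confirm the wordcount classes are present (requires a JDK).
      jar tf hadoop-3.1.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar | grep -i wordcount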

    2. Prepare data files.

      The data files have no format requirements. Prepare two .txt files; in this example, wordcount1.txt and wordcount2.txt are used.
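      For example, the two files can be created locally with arbitrary English text; the contents below are placeholders, not required values:

      # Create two sample input files with illustrative contents.
      echo "Hello Hadoop Hello MapReduce" > wordcount1.txt
      echo "Hello OBS Hello MRS" > wordcount2.txt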

  6. Upload data to OBS.

    1. Log in to the OBS console and choose Parallel File Systems. On the page that is displayed, click Create Parallel File System and configure the parameters to create a file system named mrs-word01.
    2. Click the name of the mrs-word01 file system. In the navigation pane on the left, choose Files. On the page that is displayed, click Create Folder to create the program and input folders.
    3. Go to the program folder and upload the Hadoop sample program downloaded in 5.
    4. Go to the input folder and upload the wordcount1.txt and wordcount2.txt data files prepared in 5. (A command-line alternative using obsutil is sketched after this list.)
    5. To submit a job on the GUI, go to 7.

      To submit a job through a cluster node, go to 8.
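      As an alternative to the console upload above, the OBS command-line tool obsutil can copy local files into the parallel file system. A minimal sketch, assuming obsutil is installed and the AK, SK, and endpoint placeholders are replaced with real values:

      # Configure obsutil with your access key, secret key, and regional OBS endpoint (all placeholders).
      ./obsutil config -i=YOUR_AK -k=YOUR_SK -e=obs.your-region.myhuaweicloud.com

      # Copy the sample program and the input files into the folders created above.
      ./obsutil cp hadoop-mapreduce-examples-3.1.4.jar obs://mrs-word01/program/
      ./obsutil cp wordcount1.txt obs://mrs-word01/input/
      ./obsutil cp wordcount2.txt obs://mrs-word01/input/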

  7. Submit a job on the GUI.

    1. In the navigation pane of the MRS console, choose Clusters > Active Clusters. On the Active Clusters page, click the mrs_demo cluster.
    2. On the cluster information page, click the Jobs tab and then click Create to create a job.
    3. Type: MapReduce
    4. Job Name: Enter wordcount.
    5. Program Path: Click OBS and select the Hadoop sample program uploaded in 6.
    6. Parameters: Enter wordcount obs://mrs-word01/input/ obs://mrs-word01/output/. Here, output indicates the output path; it must be a directory that does not exist.
    7. Service Parameters: Leave it blank.
    8. Click OK to submit the job. After the job is submitted, it is in the Accepted state by default and is executed automatically; you do not need to run it manually.
    9. Go to the Jobs tab page, view the job status and logs, and go to 9 to view the job execution result.

  8. Submit a job through a cluster node.

    1. Log in to the MRS console and click the cluster named mrs_demo to go to its details page.
    2. Click the Nodes tab. On this tab page, click the name of a master node to go to the ECS management console.
    3. Click Remote Login in the upper right corner of the page.
    4. Enter the username and password of the master node as prompted. The username is root and the password is the one configured during cluster creation.
    5. Run the source /opt/Bigdata/client/bigdata_env command to configure environment variables.
    6. If Kerberos authentication has been enabled, run kinit with an MRS cluster user, for example, kinit admin, to authenticate the current user. Skip this step if Kerberos authentication is disabled.
    7. Run the following command to copy the sample program in the OBS bucket to the master node in the cluster:

      hadoop fs -Dfs.obs.access.key=AK -Dfs.obs.secret.key=SK -copyToLocal source_path.jar target_path.jar

      Example:

      hadoop fs -Dfs.obs.access.key=XXXX -Dfs.obs.secret.key=XXXX -copyToLocal "obs://mrs-word01/program/hadoop-mapreduce-examples-XXX.jar" "/home/omm/hadoop-mapreduce-examples-XXX.jar"

      To obtain the AK/SK pair, hover your cursor over the username in the upper right corner of the management console, choose My Credentials > Access Keys, and use an existing key or click Create Access Key to create one.

    8. Run the following command to submit a wordcount job. To read data from or write data to OBS, add the AK/SK parameters.

      source /opt/Bigdata/client/bigdata_env;hadoop jar execute_jar wordcount input_path output_path

      Example:

      source /opt/Bigdata/client/bigdata_env;hadoop jar /home/omm/hadoop-mapreduce-examples-XXX.jar wordcount -Dfs.obs.access.key=XXXX -Dfs.obs.secret.key=XXXX "obs://mrs-word01/input/*" "obs://mrs-word01/output/"

      In this command, input_path indicates the path for storing job input files on OBS, and output_path indicates the path for storing job output files on OBS; it must be set to a directory that does not exist.
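      After the job completes, the output can be inspected from the same shell. A minimal sketch, assuming the AK/SK placeholders from the step above and the standard MapReduce output file naming (part-r-00000 for the first reducer):

      # List the job output directory on OBS.
      hadoop fs -Dfs.obs.access.key=XXXX -Dfs.obs.secret.key=XXXX -ls "obs://mrs-word01/output/"

      # Print the word counts from the first reducer output file.
      hadoop fs -Dfs.obs.access.key=XXXX -Dfs.obs.secret.key=XXXX -cat "obs://mrs-word01/output/part-r-00000"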

  9. Query job execution results.

    1. Log in to the OBS console and click the name of the mrs-word01 parallel file system.
    2. In the navigation pane on the left, choose Files. Go to the output path specified during job submission and view the job output file. Download the file to the local host and open it as a .txt file.
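      The wordcount output lists each distinct word with its count, one word per line, tab-separated. For example, with the sample input files sketched in 5, the output would look like this (actual values depend on your input):

      Hadoop     1
      Hello      4
      MRS        1
      MapReduce  1
      OBS        1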