Updated on 2023-08-31 GMT+08:00

Instance

Scenario

The sample project illustrates how to develop MapReduce jobs that access multiple service components (HDFS, HBase, and Hive), helping users understand key actions such as authentication and configuration loading.

The logic of the sample project is as follows:

The input data is an HDFS text file, log1.txt, with the following content:

YuanJing,male,10
GuoYijun,male,5
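Each line of log1.txt holds three comma-separated fields, of which only the first (the user name) is used by the steps below. The following is a minimal Java sketch of parsing one line. The class name `LogRecord` and the interpretation of the second and third fields (gender and an integer value) are assumptions for illustration, not part of the sample project.

```java
// Hypothetical holder for one line of log1.txt, e.g. "YuanJing,male,10".
// Only the user name is named by the sample description; the meaning of
// the second and third fields is assumed for illustration.
final class LogRecord {
    final String userName;
    final String gender; // assumed meaning of field 2
    final int value;     // field 3; its meaning is not stated in the sample

    LogRecord(String userName, String gender, int value) {
        this.userName = userName;
        this.gender = gender;
        this.value = value;
    }

    // Splits one comma-separated line into its three fields.
    static LogRecord parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 comma-separated fields: " + line);
        }
        return new LogRecord(fields[0].trim(), fields[1].trim(),
                Integer.parseInt(fields[2].trim()));
    }
}
```

For example, `LogRecord.parse("YuanJing,male,10").userName` yields `YuanJing`, and a line without exactly three fields is rejected with an `IllegalArgumentException`.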

Map:

  1. Read one line of the input data and extract the user name.
  2. Query a record for that user from HBase.
  3. Query a record for that user from Hive.
  4. Combine the data queried from HBase and Hive into the Map output.
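The four Map steps can be sketched in plain Java as follows. This is not the sample project's mapper: the HBase and Hive queries are replaced by hypothetical `Function<String, String>` lookups, and both the use of the user name as the output key and the `hbase:<data>, hive:<data>` value format are assumptions. In the real job, these lookups would run inside a Hadoop `Mapper` after the HBase and Hive clients have been authenticated and configured.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch of the Map logic. The HBase and Hive queries are
// hypothetical lookup functions, not real client calls.
final class MultiComponentMapSketch {
    private final Function<String, String> hbaseQuery; // stand-in for an HBase read
    private final Function<String, String> hiveQuery;  // stand-in for a Hive query

    MultiComponentMapSketch(Function<String, String> hbaseQuery,
                            Function<String, String> hiveQuery) {
        this.hbaseQuery = hbaseQuery;
        this.hiveQuery = hiveQuery;
    }

    // Returns the (key, value) pair the mapper would emit for one input line.
    Map.Entry<String, String> map(String line) {
        // Step 1: the user name is the first comma-separated field.
        String userName = line.split(",")[0].trim();
        // Steps 2 and 3: look up data for this user in each store.
        String hbaseData = hbaseQuery.apply(userName);
        String hiveData = hiveQuery.apply(userName);
        // Step 4: combine both results into one output value
        // (key choice and value format are assumptions).
        String combined = "hbase:" + hbaseData + ", hive:" + hiveData;
        return new AbstractMap.SimpleImmutableEntry<>(userName, combined);
    }
}
```

Passing the lookups in as functions keeps the sketch testable without a cluster; a real mapper would typically open its HBase connection and Hive JDBC connection once per task rather than once per record.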

Reduce:

  1. Obtain the last record from the Map output.
  2. Write the record to HBase.
  3. Save the record to HDFS.
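The Reduce steps can be sketched the same way. Here `MultiComponentReduceSketch` and its `Sink` interface are hypothetical stand-ins for an HBase `Put` and the job's HDFS output; only the "keep the last value, then write it to both stores" logic mirrors the description above.

```java
// Plain-Java sketch of the Reduce logic. Sink is a hypothetical interface
// standing in for an HBase Put and for the job's HDFS output.
final class MultiComponentReduceSketch {
    interface Sink {
        void write(String key, String value);
    }

    private final Sink hbaseSink;
    private final Sink hdfsSink;

    MultiComponentReduceSketch(Sink hbaseSink, Sink hdfsSink) {
        this.hbaseSink = hbaseSink;
        this.hdfsSink = hdfsSink;
    }

    // Keeps the last value the iterator yields for this key, writes it to
    // both sinks, and returns it (or null if there were no values).
    String reduce(String key, Iterable<String> values) {
        String last = null;
        for (String value : values) {
            last = value; // Step 1: remember only the most recent value
        }
        if (last == null) {
            return null; // nothing to write for this key
        }
        hbaseSink.write(key, last); // Step 2: write to HBase
        hdfsSink.write(key, last);  // Step 3: write to HDFS
        return last;
    }
}
```

Hadoop does not guarantee the order of values within a single reduce call unless a secondary sort is configured, so "last" here means the last value the iterator returns, not necessarily the last line of the input file.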