Preparing Initial Data
Scenario
Before commissioning the program, you need to prepare the data to be processed.
- Run the MapReduce statistics sample program. For details, see Planning MapReduce Statistics Sample Program Data.
- Run the MapReduce multi-component access sample program. For details, see Planning MapReduce Accessing Multi-Component Sample Program Data.
Planning MapReduce Statistics Sample Program Data
Store the log files to be processed in the HDFS system.
- Create text files in the Linux system and copy the data to be processed into them. For example, copy the content of log1.txt in Typical Scenarios and save it as input_data1.txt, and copy the content of log2.txt and save it as input_data2.txt.
- Create the /tmp/input directory in HDFS and upload input_data1.txt and input_data2.txt to it.
- Run the following commands to go to the HDFS client directory and authenticate the user:
cd HDFS client installation directory
source bigdata_env
kinit Component service user (This user must have the permission to operate HDFS. Change the password upon the first authentication.)
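For example, assuming the client is installed in /opt/client and the component service user is developuser (both values are illustrative and depend on your environment):
cd /opt/client
source bigdata_env
kinit developuser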
- Run the following command to create the /tmp/input directory:
hdfs dfs -mkdir /tmp/input
- Run the following command to upload the prepared file to the /tmp/input directory on the HDFS client:
hdfs dfs -put local_filepath/input_data1.txt /tmp/input
hdfs dfs -put local_filepath/input_data2.txt /tmp/input
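- (Optional) Run the following command to check that both files were uploaded:
hdfs dfs -ls /tmp/input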
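For reference, the following is a minimal sketch of a MapReduce job that reads the text files in /tmp/input. It counts word occurrences as a stand-in for the sample program's actual statistics logic, which may differ; the class names and the output path are illustrative, not taken from the sample code.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative statistics job; not the shipped sample program.
public class SampleStatistics {

  // Emits (word, 1) for every token in each input line.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the counts emitted for each word.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "sample statistics");
    job.setJarByClass(SampleStatistics.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/tmp/input"));
    // The output directory must not exist before the job runs.
    FileOutputFormat.setOutputPath(job, new Path("/tmp/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

After packaging the class into a JAR, it could be submitted with a command such as yarn jar statistics.jar SampleStatistics (the JAR name is illustrative).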
Planning MapReduce Accessing Multi-Component Sample Program Data
- Create an HDFS data file.
- Create a text file in the Linux system and copy the data to be processed into it. For example, copy the content of log1.txt in Instance and save it as data.txt.
- Run the following commands to go to the HDFS client directory and authenticate the user:
cd HDFS client installation directory
source bigdata_env
kinit Component service user (This user must have the permission to operate HDFS. Change the password upon the first authentication.)
- Create the /tmp/examples/multi-components/mapreduce/input/ directory in HDFS and upload the data.txt file to it. The operations are as follows:
hdfs dfs -mkdir -p /tmp/examples/multi-components/mapreduce/input/
hdfs dfs -put local_filepath/data.txt /tmp/examples/multi-components/mapreduce/input/
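- (Optional) Run the following command to check that the file was uploaded:
hdfs dfs -ls /tmp/examples/multi-components/mapreduce/input/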
- Create an HBase table and insert data.
- Run the following command to log in to the HBase client:
cd HBase client installation directory
source bigdata_env
kinit Component service user
hbase shell
- Run the following command in the HBase shell interaction window to create a data table named table1 with a column family cf:
create 'table1', 'cf'
- Run the following command to insert a data record whose rowkey is 1, column name is cid, and data value is 123:
put 'table1', '1', 'cf:cid', '123'
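- (Optional) Run the following command to verify the inserted record:
scan 'table1'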
- Run the following command to exit the HBase client:
quit
- Create a Hive table and insert data.
- Run the following command to log in to the Hive client:
cd Hive client installation directory
source bigdata_env
kinit Component service user
beeline
- Run the following command to create the person data table in the Hive beeline interaction window. The table contains three fields: name, gender, and stayTime.
CREATE TABLE person(name STRING, gender STRING, stayTime INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
- Run the following command to load the data file in the Hive beeline interaction window:
LOAD DATA INPATH '/tmp/examples/multi-components/mapreduce/input/' OVERWRITE INTO TABLE person;
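- (Optional) Run the following query in the beeline window to confirm that the data was loaded:
SELECT * FROM person;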
- Run the !q command to exit.
- Loading data into Hive clears the HDFS data directory, because LOAD DATA INPATH moves the source files rather than copying them. Therefore, you need to perform 1 (creating the HDFS data file) again if the file is still required.