Preparing Initial Data
Scenario
Before commissioning the program, you need to prepare the data to be processed.
- Run the MapReduce statistics sample program. For details, see Planning MapReduce Statistics Sample Program Data.
- Run the MapReduce multi-component access sample program. For details, see Planning MapReduce Accessing Multi-Component Sample Program Data.
Planning MapReduce Statistics Sample Program Data
Store the log files to be processed in the HDFS system.
- Create a text file on the Linux system and copy the data to be processed into the file. For example, copy the content of log1.txt in Typical Scenarios and save it as input_data1.txt, and copy the content of log2.txt and save it as input_data2.txt.
- Create the /tmp/input folder in HDFS, and upload input_data1.txt and input_data2.txt to the folder.
- Run the following commands to go to the HDFS client directory and authenticate the user:
cd HDFS client installation directory
source bigdata_env
kinit Component service user (This user must have permission to operate HDFS. Change the password upon first authentication.)
- Run the following command to create the /tmp/input directory:
hdfs dfs -mkdir /tmp/input
- Run the following commands to upload the prepared files to the /tmp/input directory on the HDFS client (an optional verification is shown after these steps):
hdfs dfs -put local_filepath/input_data1.txt /tmp/input
hdfs dfs -put local_filepath/input_data2.txt /tmp/input
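As an optional check that is not part of the original procedure, you can list the directory and inspect one of the uploaded files with standard HDFS client commands:
hdfs dfs -ls /tmp/input
hdfs dfs -cat /tmp/input/input_data1.txt
The listing should show input_data1.txt and input_data2.txt, and the file content should match what was copied from log1.txt.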
Planning MapReduce Accessing Multi-Component Sample Program Data
- Create an HDFS data file.
- Create a text file on the Linux system and copy the data to be processed into the file. For example, copy the content of log1.txt in Instance and save it as data.txt.
- Run the following commands to go to the HDFS client directory and authenticate the user:
cd HDFS client installation directory
source bigdata_env
kinit Component service user (This user must have permission to operate HDFS. Change the password upon first authentication.)
- Create the /tmp/examples/multi-components/mapreduce/input/ folder in HDFS, and upload the data.txt file to the directory. The operations are as follows:
hdfs dfs -mkdir -p /tmp/examples/multi-components/mapreduce/input/
hdfs dfs -put local_filepath/data.txt /tmp/examples/multi-components/mapreduce/input/
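As an optional check that is not part of the original procedure, you can confirm the upload with the standard HDFS list command:
hdfs dfs -ls /tmp/examples/multi-components/mapreduce/input/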
- Create an HBase table and insert data.
- Run the following commands to log in to the HBase client:
cd HBase client installation directory
source bigdata_env
kinit Component service user
hbase shell
- Run the following command to create a data table named table1 with a column family named cf in the HBase shell interaction window:
create 'table1', 'cf'
- Run the following command to insert a data record whose rowkey is 1, column name is cid, and data value is 123 (an optional verification is shown after these steps):
put 'table1', '1', 'cf:cid', '123'
- Run the following command to exit the HBase client:
quit
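As an optional check that is not part of the original procedure, you can scan the table in the HBase shell before exiting:
scan 'table1'
The scan should return one row whose rowkey is 1, with the column cf:cid holding the value 123.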
- Create a Hive table and insert data.
- Run the following commands to log in to the Hive client:
cd Hive client installation directory
source bigdata_env
kinit Component service user
beeline
- Run the following command to create the person data table in the Hive Beeline interaction window. The table contains three fields: name, gender, and stayTime.
CREATE TABLE person(name STRING, gender STRING, stayTime INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
- Run the following command to load the data file in the Hive Beeline interaction window (an optional verification query is shown after these steps):
LOAD DATA INPATH '/tmp/examples/multi-components/mapreduce/input/' OVERWRITE INTO TABLE person;
- Run the !q command to exit.
- Loading data into Hive moves the data file out of the source directory, clearing the HDFS data directory. Therefore, you need to perform step 1 (Create an HDFS data file) again.
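As an optional check that is not part of the original procedure, you can query the table in the Beeline window before exiting:
SELECT * FROM person;
The query should return the records loaded from data.txt.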