Scenario Description
Assume that the Hive table person stores each user's consumption amount for the current day, and HBase table2 stores users' historical consumption data.
In the person table, the record name=1, account=100 indicates that user 1's consumption amount for the current day is 100 CNY.
In table2, the record key=1, cf:cid=1000 indicates that user 1's historical consumption amount is 1000 CNY.
Based on the service requirements, a Spark application must be developed to implement the following function:
Calculate a user's total consumption amount based on the user name, that is, total consumption amount = 100 (consumption amount of the current day) + 1000 (historical consumption amount).
In the preceding example, the application result is that in table2, the total consumption amount of user 1 (key=1) is updated to cf:cid=1100 CNY.
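The expected result for user 1 can be checked with simple arithmetic (an illustration of the scenario data only, not application code):

```python
# Illustrative values taken from the scenario description.
current_day = 100   # person table: name=1, account=100
history = 1000      # table2: key=1, cf:cid=1000

# Total consumption amount written back to table2 as cf:cid.
total = current_day + history
print(total)  # 1100
```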
Data Planning
Before developing the application, create a Hive table named person and insert data into it. Also create HBase table2 so that the data analysis result can be written to it.
- Save original log files to HDFS.
- Create a blank log1.txt file on the local PC and write the following content to the file.
1,100
- Create the /tmp/input directory in HDFS and upload the log1.txt file to the directory.
- On the HDFS client, run the following commands for authentication:
kinit -kt '/opt/client/Spark/spark/conf/user.keytab' <Service user for authentication>
Specify the path of the user.keytab file based on the site requirements.
- On the HDFS client running the Linux OS, run the hadoop fs -mkdir /tmp/input command (or the hdfs dfs command) to create a directory.
- On the HDFS client running the Linux OS, run the hadoop fs -put log1.txt /tmp/input command to upload the data file.
- Import the data into the Hive table.
Ensure that ThriftServer is started. Then use the Beeline tool to create the Hive table and insert data into it.
- Run the following command to create a Hive table named person:
create table person
(
name STRING,
account INT
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\' STORED AS TEXTFILE;
- Run the following command to insert data into the person table:
load data inpath '/tmp/input/log1.txt' into table person;
- Create an HBase table.
- Run the following command to create a table named table2 through HBase:
create 'table2', 'cf'
- Run the following command in the HBase shell to insert the historical consumption record (key=1, cf:cid=1000, as described in the scenario) into table2:
put 'table2', '1', 'cf:cid', '1000'
If Kerberos authentication is enabled, set spark.yarn.security.credentials.hbase.enabled to true in the client configuration file spark-default.conf and on the Spark JDBC server.
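For reference, the setting is a single property line in spark-default.conf (a sketch; the file location depends on your client installation):

```properties
# Enable HBase credential delegation when Kerberos authentication is used.
spark.yarn.security.credentials.hbase.enabled=true
```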
Development Guidelines
- Query data in the person Hive table.
- Query data in table2 based on the key value in the person table.
- Sum the data records obtained in the previous two steps.
- Write the result of the previous step to table2.
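The four steps above can be sketched with plain Python dictionaries standing in for the Hive and HBase tables. This is an illustration of the application logic only, under the assumption of the sample data from the scenario; it does not use the actual Spark or HBase APIs:

```python
# In-memory stand-ins for the two tables (sample data from the scenario).
person = {"1": 100}    # Hive table person: name -> current-day amount
table2 = {"1": 1000}   # HBase table2: key -> cf:cid historical amount

# 1. Query data in the person table.
for name, account in person.items():
    # 2. Query data in table2 based on the key value in the person table.
    history = table2.get(name, 0)
    # 3. Sum the data records obtained in the previous two steps.
    total = account + history
    # 4. Write the result back to table2.
    table2[name] = total

print(table2)  # {'1': 1100}
```

In the real application, step 1 would be a Hive query through Spark, steps 2 and 4 would be HBase get/put operations, and the loop would run over the distributed dataset.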