Scenario Description
Assume that the Hive table person stores a user's consumption amount for the current day, and HBase table2 stores the user's historical consumption data.
In the person table, the record name=1,account=100 indicates that user1's consumption amount for the current day is 100 CNY.
In table2, the record key=1,cf:cid=1000 indicates that user1's historical consumption amount is 1000 CNY.
Based on the service requirements, a Spark application must be developed to implement the following function:
Calculate a user's total consumption amount based on the user name, that is, the user's total consumption amount = 100 (consumption amount of the current day) + 1000 (historical consumption amount).
In the preceding example, the application writes the result back to table2: the total consumption amount of user1 (key=1) becomes cf:cid=1100 CNY.
Data Planning
Before developing the application, create a Hive table named person and insert data into it. Also create HBase table2 so that the data analysis result can be written to it.
- Save original log files to HDFS.
- Create a blank log1.txt file on the local PC and write the following content to the file.
1,100
- Create the /tmp/input directory in HDFS and upload the log1.txt file to the directory.
- On the HDFS client, run the following command for authentication:
kinit -kt '/opt/client/Spark/spark/conf/user.keytab' <Service user for authentication>
Specify the path of the user.keytab file based on the site requirements.
- On the HDFS client running the Linux OS, run the hadoop fs -mkdir /tmp/input command (or the hdfs dfs command) to create a directory.
- On the HDFS client running the Linux OS, run the hadoop fs -put log1.txt /tmp/input command to upload the data file.
- Load the imported data into a Hive table.
Ensure that the ThriftServer has been started. Then use the Beeline tool to create a Hive table and insert data into it.
- Run the following command to create a Hive table named person:
create table person
(
name STRING,
account INT
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\' STORED AS TEXTFILE;
- Run the following command to load the data into the person table:
load data inpath '/tmp/input/log1.txt' into table person;
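Because the person table is stored as TEXTFILE with a comma field delimiter, each line of log1.txt maps to one (name, account) row. A minimal sketch of that mapping in plain Python (the helper function and sample line are illustrative, not part of the product):

```python
# Illustrative: parse one log1.txt-style line into (name, account),
# matching the comma-delimited schema of the person table.
def parse_person_line(line: str) -> tuple:
    name, account = line.strip().split(",")
    return name, int(account)

row = parse_person_line("1,100")
print(row)  # ('1', 100)
```

This mirrors what Hive does when it reads the uploaded file: the line 1,100 becomes the row name='1', account=100.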
- Create an HBase table.
- Run the following command in the HBase shell to create a table named table2:
create 'table2', 'cf'
- Run the following command in the HBase shell to insert data into table2:
put 'table2', '1', 'cf:cid', '1000'
If Kerberos authentication is enabled, set spark.yarn.security.credentials.hbase.enabled to true in the client configuration file spark-defaults.conf and on the Spark JDBCServer.
Development Guidelines
- Query data in the person Hive table.
- Query data in table2 based on the key value in the person table.
- Sum the data records obtained in the previous two steps.
- Write the result of the previous step to table2.
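The four steps above can be sketched end to end. The following plain-Python mock uses dictionaries to stand in for the person table and table2 (the table names and values come from the scenario; no Spark, Hive, or HBase client is actually involved):

```python
# Mock tables from the scenario: person (Hive) and table2 (HBase).
person = {"1": 100}   # name -> consumption amount of the current day
table2 = {"1": 1000}  # key  -> cf:cid, the historical consumption amount

# Steps 1-4: for each person row, look up table2 by key,
# sum the two amounts, and write the total back to table2.
for name, account in person.items():
    history = table2.get(name, 0)
    table2[name] = account + history

print(table2)  # {'1': 1100}
```

In the real application, the loop body would be a join between the Hive query result and the HBase rows, followed by an HBase write of the summed value.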