Implementing Bidirectional Data Exchange with HBase (Java)
Scenarios
Assume that HBase table1 stores each user's consumption amount for the current day and table2 stores each user's historical consumption amount.
In table1, key=1 and cf:cid=100 indicate that user 1 has spent 100 CNY on the current day.
In table2, key=1 and cf:cid=1000 indicate that the historical consumption amount of user 1 is 1000 CNY.
The Spark application does the following:
Add the current-day consumption amount (100) to the historical consumption amount (1000).
After the application runs, the total consumption amount of user 1 (key=1) in table2 is 1100 CNY (cf:cid=1100).
Data Preparation
Use the Spark-Beeline tool to create table1 and table2 as Spark tables mapped to the HBase tables of the same names, and then insert data through HBase.
- Ensure that JDBCServer is started. On the Spark2x client, perform the following operations using the Spark-Beeline command tool:
- Use the Spark-Beeline tool to create Spark table1:
create table table1
(
key string,
cid string
)
using org.apache.spark.sql.hbase.HBaseSource
options(
hbaseTableName "table1",
keyCols "key",
colsMapping "cid=cf.cid");
- Run the following command in the HBase shell to insert data into table1:
put 'table1', '1', 'cf:cid', '100'
- Use the Spark-Beeline tool to create Spark table2:
create table table2
(
key string,
cid string
)
using org.apache.spark.sql.hbase.HBaseSource
options(
hbaseTableName "table2",
keyCols "key",
colsMapping "cid=cf.cid");
- Run the following command in the HBase shell to insert data into table2:
put 'table2', '1', 'cf:cid', '1000'
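The HBase shell put command stores the value as the bytes of the text '100', not as a binary integer, and the Spark tables above declare cid as string. The following minimal sketch shows one way to convert such a cell value to a number and back using only the Java standard library. The class and method names (CellValueCodec, decodeAmount, encodeAmount) are hypothetical and are not part of the sample project.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the sample project: converts between
// text-encoded cell values (as written by the HBase shell) and numbers.
public class CellValueCodec {

    // Decode a cell value such as the bytes of "100" into a number.
    public static long decodeAmount(byte[] cellValue) {
        String text = new String(cellValue, StandardCharsets.UTF_8).trim();
        return Long.parseLong(text);
    }

    // Encode a number back into the same text form so that it stays
    // readable as a string column.
    public static byte[] encodeAmount(long amount) {
        return Long.toString(amount).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] today = "100".getBytes(StandardCharsets.UTF_8);    // table1, cf:cid
        byte[] history = "1000".getBytes(StandardCharsets.UTF_8); // table2, cf:cid
        long total = decodeAmount(today) + decodeAmount(history);
        System.out.println(new String(encodeAmount(total), StandardCharsets.UTF_8));
    }
}
```

Writing the sum back in text form keeps the value consistent with the string column type declared for cid.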
Development Idea
- Query the data in table1.
- Query the data in table2 using the key value of table1.
- Add up the queried data.
- Write the results of the preceding step to table2.
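The four steps above can be sketched in plain Java as follows. This is only an illustration of the logic, not the sample project's code: two in-memory maps stand in for the HBase tables (row key mapped to the cf:cid value), the class and method names are hypothetical, and treating a missing history row as 0 is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the development idea. The maps stand in for
// the HBase tables: row key -> value of cf:cid stored as text.
public class ConsumptionMergeSketch {

    public static void mergeIntoHistory(Map<String, String> table1,
                                        Map<String, String> table2) {
        // Step 1: query the data in table1.
        for (Map.Entry<String, String> row : table1.entrySet()) {
            String key = row.getKey();
            int today = Integer.parseInt(row.getValue());

            // Step 2: query table2 using the key value of table1.
            String historyValue = table2.get(key);
            // Assumption: a user without a history row starts from 0.
            int history = (historyValue == null) ? 0 : Integer.parseInt(historyValue);

            // Step 3: add up the queried data.
            int total = today + history;

            // Step 4: write the result back to table2.
            table2.put(key, String.valueOf(total));
        }
    }

    public static void main(String[] args) {
        Map<String, String> table1 = new HashMap<>();
        Map<String, String> table2 = new HashMap<>();
        table1.put("1", "100");   // put 'table1', '1', 'cf:cid', '100'
        table2.put("1", "1000");  // put 'table2', '1', 'cf:cid', '1000'

        mergeIntoHistory(table1, table2);
        System.out.println("key=1, cf:cid=" + table2.get("1"));
    }
}
```

In the real application, the lookups and the write would go through HBase and run inside a Spark job rather than on local maps.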
Packaging the Project
- Use the Maven tool provided by IDEA to package the project and generate a JAR file. The class names and file names shown in this section are examples only; use the names from your actual code. For details, see Writing and Running the Spark Program in the Linux Environment.
- Upload the JAR file to any directory (for example, /opt/female/) on the server where the Spark client is located.
Before running the sample project, set the spark.yarn.security.credentials.hbase.enabled configuration item to true in the spark-defaults.conf configuration file of the Spark client. (The default value is false. Changing the value to true does not affect existing services. Before uninstalling the HBase service, change the value back to false.)
Running Tasks
Go to the Spark client directory and invoke the bin/spark-submit script as follows to run the code:
- Run the Java or Scala sample code:
bin/spark-submit --conf spark.yarn.user.classpath.first=true --class com.huawei.bigdata.spark.examples.SparkHbasetoHbase --master yarn --deploy-mode client /opt/female/SparkHbasetoHbase-1.0.jar
- Run the Python sample project:
- PySpark does not provide HBase-related APIs, so this sample uses Python to invoke Java code. Use Maven to package the provided Java code into a JAR file and place it in the same directory as the Python file. When running the Python program, use --jars to load the JAR file.
bin/spark-submit --master yarn --deploy-mode client --conf spark.yarn.user.classpath.first=true --jars /opt/female/SparkHbasetoHbasePythonExample/SparkHbasetoHbase-1.0.jar,/opt/female/protobuf-java-2.5.0.jar /opt/female/SparkHbasetoHbasePythonExample/SparkHbasetoHbasePythonExample.py