Example
Example Description
Data is written to HBase in real time for point query services and is synchronized to CarbonData tables in batches at a specified interval for analytical query services.
Data Preparation
- Create an HBase table and populate it with data that has key, modify_time, and valid columns. The key of each data record is unique in the table; modify_time indicates the modification time, and valid indicates whether the record is valid. In this example, 1 indicates that the data is valid and 0 indicates that it is invalid.
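As a sketch of this step, the table and a few sample rows could be created in the HBase shell as follows. The table name hbase_table and column family info match the foreign-table mapping used later; the row keys and timestamps are illustrative placeholders:

```
create 'hbase_table', 'info'
put 'hbase_table', 'key1', 'info:modify_time', '2024-01-01 10:00:00'
put 'hbase_table', 'key1', 'info:valid', '1'
put 'hbase_table', 'key2', 'info:modify_time', '2024-01-01 10:05:00'
put 'hbase_table', 'key2', 'info:valid', '0'
```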
- Run the following commands to create a Hive foreign table for HBase in SparkSQL:
create table external_hbase_table(key string, modify_time string, valid string)
using org.apache.spark.sql.hbase.HBaseSource
options(hbaseTableName "hbase_table", keyCols "key", colsMapping "modify_time=info.modify_time,valid=info.valid");
- Run the following command to create a CarbonData table in SparkSQL:
create table carbon01(key string, modify_time string, valid string) stored as carbondata;
- Initialize the CarbonData table by loading all valid data currently in the HBase table:
insert into table carbon01 select * from external_hbase_table where valid='1';
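The effect of the valid filter in the statement above can be illustrated with a small Python sketch. The dictionaries stand in for HBase rows and all keys and timestamps are hypothetical; only rows whose valid flag is '1' are carried into the analytical table:

```python
# Hypothetical rows as they might appear in the HBase table:
# each record has a unique key plus modify_time and valid columns.
hbase_rows = [
    {"key": "k1", "modify_time": "2024-01-01 10:00:00", "valid": "1"},
    {"key": "k2", "modify_time": "2024-01-01 10:05:00", "valid": "0"},
    {"key": "k3", "modify_time": "2024-01-01 10:09:00", "valid": "1"},
]

# Mirrors: insert into table carbon01
#          select * from external_hbase_table where valid='1';
carbon_rows = [row for row in hbase_rows if row["valid"] == "1"]

for row in carbon_rows:
    print(row["key"])
```

Records flagged invalid (valid='0') in HBase are simply skipped by the batch load, so the CarbonData table only ever holds valid data.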
- Run the following spark-submit command to submit the synchronization application:
spark-submit --master yarn --deploy-mode client --class com.huawei.spark.examples.HBaseExternalHivetoCarbon /opt/example/HBaseExternalHivetoCarbon-1.0.jar
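Because the synchronization runs in batches at a specified interval, the spark-submit command is typically scheduled externally. As an illustrative sketch (the 30-minute interval and log path are assumptions, not part of the example), a crontab entry could launch it periodically:

```
# Run the HBase-to-CarbonData batch synchronization every 30 minutes.
*/30 * * * * spark-submit --master yarn --deploy-mode client --class com.huawei.spark.examples.HBaseExternalHivetoCarbon /opt/example/HBaseExternalHivetoCarbon-1.0.jar >> /var/log/hbase_to_carbon.log 2>&1
```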