Updated on 2024-08-10 GMT+08:00

Development Plan

Scenario Description

Data is written to HBase in real time for point query services and is synchronized to CarbonData tables in batches at a specified interval for analytical query services.
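
The pattern above can be illustrated with a minimal in-memory sketch. It is a model of the idea only, not the sample program: the dictionaries stand in for the HBase table and the CarbonData table, and the function name sync_window, the 30-minute window, and the program start time are all hypothetical. It also assumes that a record changed during the window is dropped from the analytical copy and re-added only if it is still valid.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def sync_window(hbase_rows, carbon_rows, start, end):
    """Apply HBase changes with start < modify_time <= end to the analytical copy."""
    for key, row in hbase_rows.items():
        modified = datetime.strptime(row["modify_time"], FMT)
        if start < modified <= end:
            # Drop any stale copy first, then re-add the record only if it is still valid.
            carbon_rows.pop(key, None)
            if row["valid"] == "1":
                carbon_rows[key] = dict(row)
    return carbon_rows

# Illustrative latest state of the real-time store (key 4 was invalidated after the full load).
hbase_rows = {
    "1": {"modify_time": "2019-11-22 23:28:39", "valid": "1"},
    "4": {"modify_time": "2021-03-03 15:20:39", "valid": "0"},
    "5": {"modify_time": "2021-03-03 15:20:39", "valid": "1"},
    "7": {"modify_time": "2021-03-03 15:20:39", "valid": "0"},
}
# Analytical copy as it looked after an earlier full load of valid records.
carbon_rows = {
    "1": {"modify_time": "2019-11-22 23:28:39", "valid": "1"},
    "4": {"modify_time": "2019-11-22 23:28:39", "valid": "1"},
}

start = datetime(2021, 3, 3, 15, 0, 0)   # hypothetical program start time
end = datetime(2021, 3, 3, 15, 30, 0)    # start plus an assumed 30-minute interval
sync_window(hbase_rows, carbon_rows, start, end)
print(sorted(carbon_rows))  # ['1', '5']
```

Key 4 disappears from the analytical copy because it became invalid during the window, key 5 is added, and key 7 is never loaded because it was invalid from the start.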

Data Preparation

  1. Create an HBase table and construct data that contains the key, modify_time, and valid columns. The key of each record is unique in the table, modify_time indicates when the record was last modified, and valid indicates whether the record is valid. In this example, 1 means valid and 0 means invalid.

    For example, open the HBase shell and run the following commands:

    create 'hbase_table','key','info'

    put 'hbase_table','1','info:modify_time','2019-11-22 23:28:39'

    put 'hbase_table','1','info:valid','1'

    put 'hbase_table','2','info:modify_time','2019-11-22 23:28:39'

    put 'hbase_table','2','info:valid','1'

    put 'hbase_table','3','info:modify_time','2019-11-22 23:28:39'

    put 'hbase_table','3','info:valid','0'

    put 'hbase_table','4','info:modify_time','2019-11-22 23:28:39'

    put 'hbase_table','4','info:valid','1'

    For the preceding records (keys 1 to 4), set modify_time to any time earlier than the current time.

    put 'hbase_table','5','info:modify_time','2021-03-03 15:20:39'

    put 'hbase_table','5','info:valid','1'

    put 'hbase_table','6','info:modify_time','2021-03-03 15:20:39'

    put 'hbase_table','6','info:valid','1'

    put 'hbase_table','7','info:modify_time','2021-03-03 15:20:39'

    put 'hbase_table','7','info:valid','0'

    put 'hbase_table','8','info:modify_time','2021-03-03 15:20:39'

    put 'hbase_table','8','info:valid','1'

    put 'hbase_table','4','info:valid','0'

    put 'hbase_table','4','info:modify_time','2021-03-03 15:20:39'

    For the preceding records (keys 5 to 8 and the updated key 4), set modify_time to a time within 30 minutes after the sample program starts. 30 minutes is the default synchronization interval of the sample program and can be changed.

    put 'hbase_table','9','info:modify_time','2021-03-03 15:32:39'

    put 'hbase_table','9','info:valid','1'

    put 'hbase_table','10','info:modify_time','2021-03-03 15:32:39'

    put 'hbase_table','10','info:valid','1'

    put 'hbase_table','11','info:modify_time','2021-03-03 15:32:39'

    put 'hbase_table','11','info:valid','0'

    put 'hbase_table','12','info:modify_time','2021-03-03 15:32:39'

    put 'hbase_table','12','info:valid','1'

    For the preceding records (keys 9 to 12), set modify_time to a time between 30 and 60 minutes after the sample program starts, that is, within the second synchronization period.

  2. In SparkSQL, run the following statement to create a Hive external table that maps to the HBase table:

    create table external_hbase_table(key string, modify_time STRING, valid STRING)

    using org.apache.spark.sql.hbase.HBaseSource

    options(hbaseTableName "hbase_table", keyCols "key", colsMapping "modify_time=info.modify_time,valid=info.valid");

  3. Run the following command to create a CarbonData table in SparkSQL:

    create table carbon01(key string, modify_time STRING, valid STRING) stored as carbondata;

  4. Initialize the CarbonData table by loading all valid records (valid='1') that are currently in the HBase table:

    insert into table carbon01 select * from external_hbase_table where valid='1';

  5. Run the following spark-submit command to start the sample program:

    spark-submit --master yarn --deploy-mode client --class com.huawei.bigdata.spark.examples.HBaseExternalHivetoCarbon /opt/example/HBaseExternalHivetoCarbon-1.0.jar
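
The statements that the sample program runs in each synchronization period are not shown here. As a rough sketch under stated assumptions, one period could be expressed as a delete followed by an insert against the tables created above. The helper names sync_period and build_sync_sql, the program start time, and the choice of a half-open window (start, end] are hypothetical; the delete relies on CarbonData's DELETE FROM ... WHERE ... IN (subquery) form, and the string comparison on modify_time works only because the values use the sortable yyyy-MM-dd HH:mm:ss layout.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def sync_period(program_start, period_index, interval_minutes=30):
    """Return (start, end) of the zero-based synchronization period as formatted strings."""
    step = timedelta(minutes=interval_minutes)
    start = program_start + period_index * step
    return start.strftime(FMT), (start + step).strftime(FMT)

def build_sync_sql(start, end, source="external_hbase_table", target="carbon01"):
    """Build Spark SQL that refreshes target with rows modified in (start, end]."""
    changed = f"modify_time > '{start}' and modify_time <= '{end}'"
    # Remove every record that changed in the window, whether it is now valid or not.
    delete_sql = (
        f"delete from {target} where key in "
        f"(select key from {source} where {changed})"
    )
    # Reload only the changed records that are still valid.
    insert_sql = (
        f"insert into table {target} "
        f"select * from {source} where valid='1' and {changed}"
    )
    return [delete_sql, insert_sql]

program_start = datetime(2021, 3, 3, 15, 0, 0)  # hypothetical start time
start, end = sync_period(program_start, 0)      # first synchronization period
for statement in build_sync_sql(start, end):
    print(statement)
```

With the sample data, the first period would remove key 4 (now invalid) and add keys 5, 6, and 8; calling sync_period(program_start, 1) gives the window that covers keys 9 to 12.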