Example

Scenario Description

Data is written to HBase in real time for point query services and is synchronized to CarbonData tables in batches at a specified interval for analytical query services.

Data Preparation

  1. Create an HBase table and populate it with data containing the key, modify_time, and valid columns. The key of each data record is unique in the table. modify_time indicates the last modification time, and valid indicates whether the record is valid. In this example, 1 indicates that the record is valid, and 0 indicates that it is invalid.
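The HBase table described above can be created and populated in the HBase shell, for example as follows. The table name hbase_table and the column family info are chosen here to match the colsMapping used when the foreign table is created later; the row keys and timestamps are illustrative:

```
create 'hbase_table', 'info'
put 'hbase_table', 'k001', 'info:modify_time', '2023-03-03 11:04:00'
put 'hbase_table', 'k001', 'info:valid', '1'
put 'hbase_table', 'k002', 'info:modify_time', '2023-03-03 11:05:00'
put 'hbase_table', 'k002', 'info:valid', '0'
```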
  2. Run the following command in Spark SQL to create a foreign table that maps to the HBase table:

    create table external_hbase_table(key STRING, modify_time STRING, valid STRING)
    using org.apache.spark.sql.hbase.HBaseSource
    options(hbaseTableName "hbase_table", keyCols "key", colsMapping "modify_time=info.modify_time,valid=info.valid")
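To confirm that the mapping works, the foreign table can be queried directly in Spark SQL; reads go through to HBase. The row key 'k001' below is a placeholder:

```sql
-- Point query through the foreign table; the data is read from HBase
select key, modify_time, valid
from external_hbase_table
where key = 'k001';
```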

  3. Run the following command to create a CarbonData table in Spark SQL:

    create table carbon01(key STRING, modify_time STRING, valid STRING) stored as carbondata;

  4. Run the following command to initialize the CarbonData table by loading all valid data currently in the HBase table:

    insert into table carbon01 select * from external_hbase_table where valid='1';
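After the initial full load, subsequent batch synchronizations would typically copy only rows modified since the last run rather than reloading everything. A sketch, assuming the last synchronization time is tracked externally (the timestamp literal below is a placeholder):

```sql
-- Incremental load: only rows that are valid and were
-- modified after the last synchronization time
insert into table carbon01
select * from external_hbase_table
where valid = '1'
  and modify_time > '2023-03-03 11:00:00';
```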

  5. Run the following spark-submit command to submit the synchronization application:
    spark-submit --master yarn --deploy-mode client --class com.huawei.spark.examples.HBaseExternalHivetoCarbon /opt/example/HBaseExternalHivetoCarbon-1.0.jar
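Because a key can be written to the CarbonData table in more than one synchronization batch, the table may hold multiple versions of the same record. An analytical query that keeps only the most recent version per key might look like this (a sketch, not part of the sample application):

```sql
-- Keep only the most recently modified row for each key
select t.key, t.modify_time, t.valid
from carbon01 t
join (
  select key, max(modify_time) as max_time
  from carbon01
  group by key
) m
on t.key = m.key and t.modify_time = m.max_time;
```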