
Updated on 2022-09-14 GMT+08:00


Data is written to HBase in real time for point query services and is synchronized to CarbonData tables in batches at a specified interval for analytical query services.
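The batch synchronization described above can be pictured with a small in-memory sketch. This is not the sample program's code, which this page does not show. It assumes that each record carries a modify_time and a valid flag (as in the data prepared below), that a synchronization pass picks up records modified inside one time window, and that valid records are inserted or updated while invalid ones are removed. The function name `sync_window` and the dictionary layout are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def sync_window(hbase_rows, carbon_rows, start, end):
    """Return a copy of carbon_rows with the changes whose modify_time
    falls in [start, end) applied. Both arguments are dicts keyed by row key.

    Assumption: valid == "1" means insert or update, anything else means delete.
    """
    result = dict(carbon_rows)
    for key, row in hbase_rows.items():
        modified = datetime.strptime(row["modify_time"], FMT)
        if not (start <= modified < end):
            continue  # unchanged in this window, nothing to synchronize
        if row["valid"] == "1":
            result[key] = dict(row)  # insert a new record or overwrite the old one
        else:
            result.pop(key, None)  # record was invalidated, drop it if present
    return result


# State of the "HBase" side: key 4 was invalidated and key 5 was added at 15:20:39.
hbase = {
    "1": {"modify_time": "2019-11-22 23:28:39", "valid": "1"},
    "4": {"modify_time": "2021-03-03 15:20:39", "valid": "0"},
    "5": {"modify_time": "2021-03-03 15:20:39", "valid": "1"},
}
# State of the "CarbonData" side after an earlier full load.
carbon = {
    "1": {"modify_time": "2019-11-22 23:28:39", "valid": "1"},
    "4": {"modify_time": "2019-11-22 23:28:39", "valid": "1"},
}

# Hypothetical 30-minute window starting at 15:00:00.
window_start = datetime(2021, 3, 3, 15, 0, 0)
after = sync_window(hbase, carbon, window_start, window_start + timedelta(minutes=30))
print(sorted(after))  # ['1', '5']
```

Keeping the point-query store as the source of truth and replaying only one window of changes per pass is what lets the analytical table lag by at most one interval without rescanning every record.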

Data Preparation


Before running the sample program, set the configuration item to true in the spark-defaults.conf configuration file of the Spark client. (The default value is false. Changing it to true does not affect existing services.) Before uninstalling the HBase service, change the value back to false.

  1. Create an HBase table and populate it with data that has key, modify_time, and valid columns. The key of each record is unique in the table, modify_time is the modification time, and valid indicates whether the record is valid. In this example, 1 means that the record is valid and 0 means that it is invalid.

    For example, go to HBase Shell and run the following commands:

    create 'hbase_table','key','info'

    put 'hbase_table','1','info:modify_time','2019-11-22 23:28:39'

    put 'hbase_table','1','info:valid','1'

    put 'hbase_table','2','info:modify_time','2019-11-22 23:28:39'

    put 'hbase_table','2','info:valid','1'

    put 'hbase_table','3','info:modify_time','2019-11-22 23:28:39'

    put 'hbase_table','3','info:valid','0'

    put 'hbase_table','4','info:modify_time','2019-11-22 23:28:39'

    put 'hbase_table','4','info:valid','1'


    The modify_time values in the preceding commands can be set to any time earlier than the current time.

    put 'hbase_table','5','info:modify_time','2021-03-03 15:20:39'

    put 'hbase_table','5','info:valid','1'

    put 'hbase_table','6','info:modify_time','2021-03-03 15:20:39'

    put 'hbase_table','6','info:valid','1'

    put 'hbase_table','7','info:modify_time','2021-03-03 15:20:39'

    put 'hbase_table','7','info:valid','0'

    put 'hbase_table','8','info:modify_time','2021-03-03 15:20:39'

    put 'hbase_table','8','info:valid','1'

    put 'hbase_table','4','info:valid','0'

    put 'hbase_table','4','info:modify_time','2021-03-03 15:20:39'


    The modify_time values in the preceding commands can be set to any time within 30 minutes after the sample program starts, that is, the first synchronization period. (30 minutes is the default synchronization interval of the sample program and can be modified.)

    put 'hbase_table','9','info:modify_time','2021-03-03 15:32:39'

    put 'hbase_table','9','info:valid','1'

    put 'hbase_table','10','info:modify_time','2021-03-03 15:32:39'

    put 'hbase_table','10','info:valid','1'

    put 'hbase_table','11','info:modify_time','2021-03-03 15:32:39'

    put 'hbase_table','11','info:valid','0'

    put 'hbase_table','12','info:modify_time','2021-03-03 15:32:39'

    put 'hbase_table','12','info:valid','1'


    The modify_time values in the preceding commands can be set to any time from 30 to 60 minutes after the sample program starts, that is, the second synchronization period.

  2. Run the following statement in SparkSQL to create a Hive external table that maps to the HBase table:

    create table external_hbase_table(key string ,modify_time STRING, valid STRING)

    using org.apache.spark.sql.hbase.HBaseSource

    options(hbaseTableName "hbase_table", keyCols "key", colsMapping "modify_time=info.modify_time,valid=info.valid");

  3. Run the following command to create a CarbonData table in SparkSQL:

    create table carbon01(key string,modify_time STRING, valid STRING) stored as carbondata;

  4. Initialize the CarbonData table by loading all valid data currently in the HBase table:

    insert into table carbon01 select * from external_hbase_table where valid='1';

  5. Run the following spark-submit command:
    spark-submit --master yarn --deploy-mode client --class com.huawei.bigdata.spark.examples.HBaseExternalHivetoCarbon /opt/example/HBaseExternalHivetoCarbon-1.0.jar
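Because the modify_time values in step 1 depend on when the sample program starts, it can be convenient to generate the put commands instead of editing timestamps by hand. The following is a minimal sketch under stated assumptions: the helper names `put_commands` and `sample_batches` are hypothetical, the first batch is simply dated one day before the start time, and the later batches are placed in the middle of the first and second synchronization periods. The table name, column names, row keys, and valid flags follow the commands in step 1.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def put_commands(table, rows, modify_time):
    """Build HBase shell put commands for (key, valid) pairs that share one modify_time."""
    stamp = modify_time.strftime(FMT)
    commands = []
    for key, valid in rows:
        commands.append(f"put '{table}','{key}','info:modify_time','{stamp}'")
        commands.append(f"put '{table}','{key}','info:valid','{valid}'")
    return commands


def sample_batches(program_start, interval_minutes=30, table="hbase_table"):
    """Return the three batches from step 1 with timestamps relative to program_start."""
    interval = timedelta(minutes=interval_minutes)
    before_start = program_start - timedelta(days=1)  # any earlier time works
    first_period = program_start + interval / 2       # inside the first sync period
    second_period = program_start + interval * 1.5    # inside the second sync period
    return [
        put_commands(table, [("1", "1"), ("2", "1"), ("3", "0"), ("4", "1")], before_start),
        put_commands(
            table,
            [("5", "1"), ("6", "1"), ("7", "0"), ("8", "1"), ("4", "0")],
            first_period,
        ),
        put_commands(table, [("9", "1"), ("10", "1"), ("11", "0"), ("12", "1")], second_period),
    ]


if __name__ == "__main__":
    # Hypothetical start time; replace it with the planned start of the sample program.
    start = datetime(2021, 3, 3, 15, 5, 0)
    for batch in sample_batches(start):
        print("\n".join(batch))
        print()
```

The printed commands can be pasted into HBase Shell. If the synchronization interval of the sample program has been changed, pass the same value as `interval_minutes` so that the generated timestamps still fall in the intended periods.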



