Updated: 2024-11-28 GMT+08:00

Creating a Data Table with Tag Indexing Enabled

Function Description

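This works the same way as creating an ordinary table, except that the tag index schema is also configured in the table properties.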

Sample Code

public void testCreateTable() {
  LOG.info("Entering testCreateTable.");
  HTableDescriptor tableDesc = new HTableDescriptor(tableName);
  HColumnDescriptor cdm = new HColumnDescriptor(FAM_M);
  cdm.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
  tableDesc.addFamily(cdm);
  HColumnDescriptor cdn = new HColumnDescriptor(FAM_N);
  cdn.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
  tableDesc.addFamily(cdn);

  // Add bitmap index definitions.
  List<BitmapIndexDescriptor> bitmaps = new ArrayList<>();//(1)
  bitmaps.add(BitmapIndexDescriptor.builder()
    // Describe which column should be indexed.
    .setColumnName(FamilyOnlyName.valueOf(FAM_M))//(2)
    // Describe how to extract term(s) from KeyValue
    .setTermExtractor(TermExtractor.NAME_VALUE_EXTRACTOR)//(3)
    .build());
  // It will help to add several properties into HTableDescriptor.
  // SHARD_NUM must be less than or equal to the region number of the data table; see note (4).
  IndexHelper.enableAutoIndex(tableDesc, SHARD_NUM, bitmaps);//(4)

  List<byte[]> splitList = Arrays.stream(SPLIT.split(LemonConstants.COMMA))
    .map(s -> org.lemon.common.Bytes.toBytes(s.trim()))
    .collect(Collectors.toList());
  byte[][] splitArray = splitList.toArray(new byte[splitList.size()][]);

  Admin admin = null;
  try {
    // Instantiate an Admin object.
    admin = conn.getAdmin();
    if (!admin.tableExists(tableName)) {
      LOG.info("Creating table...");
      admin.createTable(tableDesc, splitArray);
      LOG.info(admin.getClusterStatus());
      LOG.info(admin.listNamespaceDescriptors());
      LOG.info("Table created successfully.");
    } else {
      LOG.warn("table already exists");
    }
  } catch (IOException e) {
    LOG.error("Create table failed.", e);
  } finally {
    if (admin != null) {
      try {
        // Close the Admin object.
        admin.close();
      } catch (IOException e) {
        LOG.error("Failed to close admin ", e);
      }
    }
  }
  LOG.info("Exiting testCreateTable.");
}
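
The sample passes both a split-key array and SHARD_NUM when creating the table. As a rough sanity check on the constraint described in note (4), the sketch below assumes that the "partitions" of the data table are its HBase regions, and relies on the fact that a table pre-split with n split keys starts with n + 1 regions. The class and method names (ShardCountCheck, regionCount, isValidShardNum) are hypothetical helpers for illustration, not part of the index API.

```java
import java.nio.charset.StandardCharsets;

public class ShardCountCheck {
    // Hypothetical helper: a table created with n split keys starts with n + 1 regions.
    public static int regionCount(byte[][] splitKeys) {
        return splitKeys == null ? 1 : splitKeys.length + 1;
    }

    // Hypothetical helper: applies the rule from note (4), assuming that
    // "partitions of the data table" means its initial region count.
    public static boolean isValidShardNum(int shardNum, byte[][] splitKeys) {
        return shardNum <= regionCount(splitKeys);
    }

    public static void main(String[] args) {
        // Three illustrative split keys, so the table would start with four regions.
        byte[][] splits = {
            "b".getBytes(StandardCharsets.UTF_8),
            "m".getBytes(StandardCharsets.UTF_8),
            "t".getBytes(StandardCharsets.UTF_8)
        };
        System.out.println("regions = " + regionCount(splits));
        System.out.println("SHARD_NUM 4 valid: " + isValidShardNum(4, splits));
        System.out.println("SHARD_NUM 5 valid: " + isValidShardNum(5, splits));
    }
}
```

A check like this could run before IndexHelper.enableAutoIndex is called, so that a bad SHARD_NUM is caught before the table is created.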

Notes

  • (1) A BitmapIndexDescriptor describes which fields tags are extracted from and which rule is used. A data table can define one or more BitmapIndexDescriptor instances.
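  • (2) Defines which columns tags are extracted from. Possible values:
    • ExplicitColumnName: a specified column.
    • FamilyOnlyName: all columns in a given ColumnFamily.
    • PrefixColumnName: columns whose names start with a given prefix.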
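  • (3) Defines the rule used to extract tags from a column. Options:
    • QualifierExtractor: extracts the tag from the column name (qualifier).

      For example, if the qualifier is Male and the value is 1, the extracted tag is Male.

    • QualifierValueExtractor: extracts the tag from the column name and the value.

      For example, if the qualifier is education and the value is master, the extracted tag is education:master.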

    • QualifierArrayValueExtractor: extracts multiple tags; the value must be a JSON array.
      For example, if the qualifier is hobby and the value is ["basketball","football","volleyball"], the extracted tags are:
      hobby:basketball
      hobby:football
      hobby:volleyball
    • QualifierMapValueExtractor: extracts multiple tags; the value must be a JSON map.
      For example, if the qualifier is hobby and the value is {"basketball":"9","football":"8","volleyball":"7"}, the extracted tags are:
      hobby:basketball
      hobby:football
      hobby:volleyball
      hobby:basketball_9
      hobby:football_8
      hobby:volleyball_7
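  • (4) SHARD_NUM, the number of partitions of the index table, must be less than or equal to the number of partitions of the data table.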
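
The four extraction rules in note (3) can be illustrated with a small standalone sketch. It is not the library's implementation: the class and method names are hypothetical, and it assumes the JSON array or map value has already been parsed into a List or Map. It only reproduces the tag strings shown in the examples above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagExtractionSketch {
    // Rule of QualifierExtractor: the tag is the qualifier itself.
    public static List<String> qualifierOnly(String qualifier) {
        return Collections.singletonList(qualifier);
    }

    // Rule of QualifierValueExtractor: the tag is "qualifier:value".
    public static List<String> qualifierValue(String qualifier, String value) {
        return Collections.singletonList(qualifier + ":" + value);
    }

    // Rule of QualifierArrayValueExtractor: one "qualifier:element" tag per array element.
    public static List<String> qualifierArrayValue(String qualifier, List<String> values) {
        List<String> tags = new ArrayList<>();
        for (String v : values) {
            tags.add(qualifier + ":" + v);
        }
        return tags;
    }

    // Rule of QualifierMapValueExtractor: "qualifier:key" for every key,
    // followed by "qualifier:key_value" for every entry.
    public static List<String> qualifierMapValue(String qualifier, Map<String, String> values) {
        List<String> tags = new ArrayList<>();
        for (String k : values.keySet()) {
            tags.add(qualifier + ":" + k);
        }
        for (Map.Entry<String, String> e : values.entrySet()) {
            tags.add(qualifier + ":" + e.getKey() + "_" + e.getValue());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(qualifierOnly("Male"));
        System.out.println(qualifierValue("education", "master"));
        System.out.println(qualifierArrayValue("hobby",
            Arrays.asList("basketball", "football", "volleyball")));

        // LinkedHashMap keeps the insertion order of the example map.
        Map<String, String> scores = new LinkedHashMap<>();
        scores.put("basketball", "9");
        scores.put("football", "8");
        scores.put("volleyball", "7");
        System.out.println(qualifierMapValue("hobby", scores));
    }
}
```

Working through the rules this way can help when choosing an extractor: the map rule yields two tags per entry, so it produces twice as many tags as the array rule for the same number of elements.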