CarbonData Segment APIs

This topic describes the Segment APIs and their usage. All methods are in the org.apache.spark.util.CarbonSegmentUtil class.

The following method has been discarded:

/** 
* Returns the valid segments for the query based on the filter condition 
* present in carbonScanRdd. 
* 
* @param carbonScanRdd 
* @return Array of valid segments 
*/ 
@deprecated def getFilteredSegments(carbonScanRdd: CarbonScanRDD[InternalRow]): Array[String];

How to Use

Use the following method to obtain CarbonScanRDD from the query statement:

val df=carbon.sql("select * from table where age='12'") 
val myscan=df.queryExecution.sparkPlan.collect { 
case scan: CarbonDataSourceScan if scan.rdd.isInstanceOf[CarbonScanRDD[InternalRow]] => scan.rdd 
case scan: RowDataSourceScanExec if scan.rdd.isInstanceOf[CarbonScanRDD[InternalRow]] => scan.rdd 
}.head 
val carbonrdd=myscan.asInstanceOf[CarbonScanRDD[InternalRow]]

The following is an example:

CarbonSegmentUtil.getFilteredSegments(carbonrdd)

You can pass the following SQL statement to obtain the filtered segment:

/** 
* Returns an array of valid segment numbers based on the filter condition provided in the sql 
* NOTE: This API is supported only for SELECT Sql (insert into,ctas,.., is not supported) 
* 
* @param sql 
* @param sparkSession 
* @return Array of valid segments 
* @throws UnsupportedOperationException because Get Filter Segments API supports if and only 
* if only one carbon main table is present in query. 
*/ 
def getFilteredSegments(sql: String, sparkSession: SparkSession): Array[String];

The following is an example:

CarbonSegmentUtil.getFilteredSegments("select * from table where age='12'", sparkSession)

Pass the database name and table name to obtain the list of segments to be merged. The obtained segment list can be used as the parameter of the getMergedLoadName function.

/** 
* Identifies all segments which can be merged with MAJOR compaction type. 
* NOTE: This result can be passed to getMergedLoadName API to get the merged load name. 
* 
* @param sparkSession
* @param tableName 
* @param dbName 
* @return list of LoadMetadataDetails  
*/ 
def identifySegmentsToBeMerged(sparkSession: SparkSession, 
tableName: String, 
dbName: String) : util.List[LoadMetadataDetails];

The following is an example:

CarbonSegmentUtil.identifySegmentsToBeMerged(sparkSession, "table_test","default")

Pass the database name, table name, and custom segment list to obtain the list of segments that will be merged on demand. The obtained segment list can be used as the parameter of the getMergedLoadName function.

/** 
* Identifies all segments which can be merged with CUSTOM compaction type. 
*  NOTE: This result can be passed to getMergedLoadName API to get the merged load name. 
* 
* @param sparkSession 
* @param tableName 
* @param dbName 
* @param customSegments 
* @return list of LoadMetadataDetails 
* @throws UnsupportedOperationException if customSegments is null or empty. 
* @throws MalformedCarbonCommandException if segment does not exist or is not valid 
*/ 
def identifySegmentsToBeMergedCustom(sparkSession: SparkSession, 
tableName: String, 
dbName: String, 
customSegments: util.List[String]): util.List[LoadMetadataDetails];

The following is an example:

val customSegments = new util.ArrayList[String]()
customSegments.add("1") 
customSegments.add("2")  
CarbonSegmentUtil.identifySegmentsToBeMergedCustom(sparkSession, "table_test","default", customSegments)

Pass a specified segment list, and return the new Merged Load Name.

/** 
* Returns the Merged Load Name for given list of segments 
* 
* @param list of segments 
* @return Merged Load Name 
* @throws UnsupportedOperationException if list of segments is less than 1 
*/ 
def getMergedLoadName(list: util.List[LoadMetadataDetails]): String;

The following is an example:

val carbonTable = CarbonEnv.getCarbonTable(Option(databaseName), tableName)(sparkSession) 
val loadMetadataDetails = SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)  CarbonSegmentUtil.getMergedLoadName(loadMetadataDetails.toList.asJava)

Parent topic: CarbonData Syntax Reference

Previous topic: Concurrent CarbonData Table Operations

Next topic: CarbonData Tablespace Index