CarbonData Segment APIs
This topic describes the Segment APIs and their usage. All methods are in the org.apache.spark.util.CarbonSegmentUtil class.
The following method has been discarded:
/** * Returns the valid segments for the query based on the filter condition * present in carbonScanRdd. * * @param carbonScanRdd * @return Array of valid segments */ @deprecated def getFilteredSegments(carbonScanRdd: CarbonScanRDD[InternalRow]): Array[String];
How to Use
val df=carbon.sql("select * from table where age='12'") val myscan=df.queryExecution.sparkPlan.collect { case scan: CarbonDataSourceScan if scan.rdd.isInstanceOf[CarbonScanRDD[InternalRow]] => scan.rdd case scan: RowDataSourceScanExec if scan.rdd.isInstanceOf[CarbonScanRDD[InternalRow]] => scan.rdd }.head val carbonrdd=myscan.asInstanceOf[CarbonScanRDD[InternalRow]]
The following is an example:
CarbonSegmentUtil.getFilteredSegments(carbonrdd)
/** * Returns an array of valid segment numbers based on the filter condition provided in the sql * NOTE: This API is supported only for SELECT Sql (insert into,ctas,.., is not supported) * * @param sql * @param sparkSession * @return Array of valid segments * @throws UnsupportedOperationException because Get Filter Segments API supports if and only * if only one carbon main table is present in query. */ def getFilteredSegments(sql: String, sparkSession: SparkSession): Array[String];
The following is an example:
CarbonSegmentUtil.getFilteredSegments("select * from table where age='12'", sparkSession)
Pass the database name and table name to obtain the list of segments to be merged. The obtained segment list can be used as the parameter of the getMergedLoadName function.
/** * Identifies all segments which can be merged with MAJOR compaction type. * NOTE: This result can be passed to getMergedLoadName API to get the merged load name. * * @param sparkSession * @param tableName * @param dbName * @return list of LoadMetadataDetails */ def identifySegmentsToBeMerged(sparkSession: SparkSession, tableName: String, dbName: String) : util.List[LoadMetadataDetails];
The following is an example:
CarbonSegmentUtil.identifySegmentsToBeMerged(sparkSession, "table_test","default")
Pass the database name, table name, and custom segment list to obtain the list of segments that will be merged on demand. The obtained segment list can be used as the parameter of the getMergedLoadName function.
/** * Identifies all segments which can be merged with CUSTOM compaction type. * NOTE: This result can be passed to getMergedLoadName API to get the merged load name. * * @param sparkSession * @param tableName * @param dbName * @param customSegments * @return list of LoadMetadataDetails * @throws UnsupportedOperationException if customSegments is null or empty. * @throws MalformedCarbonCommandException if segment does not exist or is not valid */ def identifySegmentsToBeMergedCustom(sparkSession: SparkSession, tableName: String, dbName: String, customSegments: util.List[String]): util.List[LoadMetadataDetails];
The following is an example:
val customSegments = new util.ArrayList[String]() customSegments.add("1") customSegments.add("2") CarbonSegmentUtil.identifySegmentsToBeMergedCustom(sparkSession, "table_test","default", customSegments)
Pass a specified segment list, and return the new Merged Load Name.
/** * Returns the Merged Load Name for given list of segments * * @param list of segments * @return Merged Load Name * @throws UnsupportedOperationException if list of segments is less than 1 */ def getMergedLoadName(list: util.List[LoadMetadataDetails]): String;
The following is an example:
val carbonTable = CarbonEnv.getCarbonTable(Option(databaseName), tableName)(sparkSession) val loadMetadataDetails = SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) CarbonSegmentUtil.getMergedLoadName(loadMetadataDetails.toList.asJava)
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.