Updated on 2022-09-15 GMT+08:00

How to Avoid Minor Compaction for Historical Data?

Question

How to avoid minor compaction for historical data?

Answer

If you want to load historical data first and then the incremental data, perform the following steps to avoid minor compaction of the historical data:

  1. Load all historical data.
  2. Configure the major compaction size to a value smaller than the segment size of historical data.
  3. Run the major compaction once on historical data so that these segments will not be considered later for minor compaction.
  4. Load the incremental data.
  5. Configure the minor compaction threshold as required.
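
Step 2 comes down to picking a value below the smallest historical segment and converting it to MB, the unit used by carbon.major.compaction.size. The following Python sketch shows one way to do that arithmetic. The helper name major_compaction_size_mb and the 20 GB safety margin are assumptions made for illustration; they are not part of CarbonData.

```python
def major_compaction_size_mb(segment_sizes_gb, margin_gb=20):
    """Return a major compaction size in MB that sits margin_gb below
    the smallest historical segment (hypothetical helper, not a CarbonData API)."""
    if not segment_sizes_gb:
        raise ValueError("need at least one historical segment size")
    threshold_gb = min(segment_sizes_gb) - margin_gb
    if threshold_gb <= 0:
        raise ValueError("margin must be smaller than the smallest segment")
    # carbon.major.compaction.size is expressed in MB, so convert GB to MB.
    return int(threshold_gb * 1024)


if __name__ == "__main__":
    # Three historical segments of 500 GB each.
    print(major_compaction_size_mb([500, 500, 500]))
```

Any margin works as long as the result stays below the size of every historical segment.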

For example:

  1. Assume that you have loaded all historical data to CarbonData and the size of each segment is 500 GB.
  2. Set the major compaction threshold to carbon.major.compaction.size = 491520. The property is specified in MB, so this corresponds to 480 GB (480 x 1024 = 491520).
  3. Run major compaction. All segments will be compacted because the size of each segment is larger than the configured size.
  4. Perform incremental loading.
  5. Configure the minor compaction threshold to carbon.compaction.level.threshold = 6,6.
  6. Run minor compaction. As a result, only incremental data is compacted.
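
The example above can be written down as an ordered plan of property settings and compaction statements. The Python sketch below only builds that plan as text; it does not connect to CarbonData. The table name sales_history and the helper compaction_plan are hypothetical, and the property lines are assumed to go into carbon.properties. The ALTER TABLE ... COMPACT statements follow the CarbonData compaction syntax.

```python
def compaction_plan(table, major_size_mb, minor_levels):
    """Build the ordered steps of the example as (kind, text) pairs.
    Hypothetical helper: it only formats text, it runs nothing."""
    levels = ",".join(str(n) for n in minor_levels)
    return [
        ("note", "load all historical data"),
        ("property", f"carbon.major.compaction.size={major_size_mb}"),
        ("sql", f"ALTER TABLE {table} COMPACT 'MAJOR'"),
        ("note", "load incremental data"),
        ("property", f"carbon.compaction.level.threshold={levels}"),
        ("sql", f"ALTER TABLE {table} COMPACT 'MINOR'"),
    ]


if __name__ == "__main__":
    # 480 GB expressed in MB, and the 6,6 minor threshold from the example.
    for kind, text in compaction_plan("sales_history", 480 * 1024, (6, 6)):
        print(f"{kind:8} {text}")
```

Keeping the major compaction ahead of the incremental load is the point of the ordering: the historical segments are compacted once and are then left out of later minor compactions.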