HBase

How HBase Major Compaction Works

Understanding HBase major compaction -- how it differs from minor compaction, the configuration properties that control it, and the three methods that trigger it.

2 January 2013 · 2 min read

Compaction is the process in which HBase combines small files (HStoreFiles) into bigger ones.

There are two types:

  • Minor: Takes a few adjacent files and merges them into one.
  • Major: Takes all the files in a store (one column family of a region) and merges them into one.

This post covers major compaction. If you want to read about minor compaction, please read the other post: How HBase Minor Compaction Works. I suggest reading that first.

Configuration Properties

The following properties affect major compaction:

# Time (in milliseconds) between major compactions of all HStoreFiles in a region.
# Set to 0 to disable automated major compactions.
# Default: 86400000 (1 day)
hbase.hregion.majorcompaction=86400000

# Multiplier that affects how often we check if compaction is necessary.
# The interval between checks is this value multiplied by hbase.server.thread.wakefrequency.
# Default: 1000
hbase.server.compactchecker.interval.multiplier=1000

# Time to sleep between searches for work (in milliseconds).
# Used as sleep interval by service threads such as log roller.
# Default: 10000
hbase.server.thread.wakefrequency=10000
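
To make the arithmetic behind these properties concrete, here is a minimal sketch in Python. The function names are hypothetical, the multiplier default of 1000 is an assumption, and the "due" check is a simplified model of the timed trigger rather than HBase's actual code.

```python
def compaction_check_interval_ms(wake_frequency_ms=10_000, multiplier=1_000):
    """Interval between compaction checks, in milliseconds.

    multiplier stands for hbase.server.compactchecker.interval.multiplier
    (1000 is an assumed default); wake_frequency_ms stands for
    hbase.server.thread.wakefrequency.
    """
    return wake_frequency_ms * multiplier


def major_compaction_due(oldest_file_age_ms, period_ms=86_400_000):
    """Simplified model of the timed trigger.

    period_ms stands for hbase.hregion.majorcompaction; 0 disables
    automated (time-based) major compactions.
    """
    return period_ms > 0 and oldest_file_age_ms > period_ms


interval_ms = compaction_check_interval_ms()
print(interval_ms / 3_600_000)  # interval between checks, in hours
```

With the values above, the check runs every 10,000,000 ms, a little under three hours, so a region can be somewhat overdue before the timed trigger notices it.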

How Major Compaction Is Triggered

From this HBase mailing list discussion:

Major compactions are triggered by 3 methods: user issued, timed, and size-based.

Even with time-based major compaction disabled, a size-based compaction can still turn into a major one. Minor compactions are issued on a size-based threshold, and one that ends up selecting every file is effectively a major compaction.

The algorithm sees if sum(file[0:i] * ratio) > file[i+1] and includes file[0:i+1] if so.

This is a reverse iteration, so the highest i value is used. If all files match, the delete markers can be removed, which is the difference between a major and a minor compaction. Major compactions aren’t inherently bad or time-intensive; the extra work is just delete marker removal.
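
The quoted rule can be sketched in a few lines of Python. This is a literal reading of the pseudocode from the mailing list, not HBase's actual selection code: the function names are hypothetical, the index `k` here plays the role of `i+1` in the quote, and the `ratio` value of 1.2 is just an illustrative parameter.

```python
def select_files(sizes, ratio=1.2):
    """Return how many leading files the quoted rule would compact.

    sizes: store file sizes, in the order the quoted rule indexes them.
    """
    # Reverse iteration: test the highest index first, so the first hit
    # is the largest selection that satisfies the rule.
    for k in range(len(sizes) - 1, 0, -1):
        if sum(sizes[:k]) * ratio > sizes[k]:
            return k + 1  # files 0..k are selected
    return 0  # no file qualifies


def promoted_to_major(sizes, ratio=1.2):
    # When every file is selected, delete markers can be dropped,
    # which is what turns the compaction into a major one.
    return len(sizes) > 0 and select_files(sizes, ratio) == len(sizes)


print(select_files([10, 12, 15, 200]))      # the 200-unit file is left out
print(promoted_to_major([10, 12, 15, 30]))  # all four files are selected
```

In the first call the large file fails the ratio test, so only the three small files are merged and the compaction stays minor. In the second call every file passes, so the same size-based selection becomes a major compaction even though no timer fired.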

Major vs Minor: Key Difference

Minor compactions will usually pick up a couple of the smaller adjacent StoreFiles and rewrite them as one. Minors do not drop deletes or expired cells — only major compactions do this.
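
A toy model may help show why only a major compaction can drop deletes. The sketch below is illustrative only: the cell layout `(row, timestamp, kind, value)` and the `compact` function are hypothetical, store files are plain Python lists, and expired-cell handling is left out for brevity.

```python
def compact(store_files, major):
    """Merge toy store files; each cell is (row, timestamp, kind, value)."""
    merged = sorted(cell for f in store_files for cell in f)
    if not major:
        # Minor: rewrite the cells into one file, delete markers included,
        # because files outside the selection may still hold masked puts.
        return merged
    # Major: every file is present, so a delete marker and the puts it
    # masks (same row, timestamp <= the marker's) can all be dropped.
    newest_delete = {}
    for row, ts, kind, _ in merged:
        if kind == "delete":
            newest_delete[row] = max(ts, newest_delete.get(row, ts))
    return [
        (row, ts, kind, value)
        for row, ts, kind, value in merged
        if kind == "put" and ts > newest_delete.get(row, -1)
    ]


older = [("r1", 1, "put", "a"), ("r2", 1, "put", "b")]
newer = [("r1", 2, "delete", ""), ("r2", 3, "put", "c")]

print(len(compact([older, newer], major=False)))  # delete marker is kept
print(compact([older, newer], major=True))        # r1 is gone entirely
```

The minor result still carries the delete marker for `r1` and the put it masks, while the major result keeps only the two live `r2` cells.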

Now that you know what major and minor compactions are, the next step is tuning the above parameters for your cluster profile, which we will cover in another post.

Happy Hadooping :)