
How HBase Minor Compaction Works

Understanding HBase minor compaction -- how files are selected for compaction using the ratio algorithm, with a worked example showing the selection logic.

2 January 2013

Compaction is the process in which HBase combines small files (HStoreFiles) into bigger ones.

There are two types of compaction:

  • Minor: Takes a few adjacent StoreFiles and merges them into one.
  • Major: Takes all the StoreFiles in a store (one column family of a region) and merges them into one.

This post covers minor compaction. Major compaction is covered in a separate post, How HBase Major Compaction Works. I suggest reading this one first.

Let’s see what determines how many files count as “a few” in minor compaction.

Configuration Properties

The following properties affect minor compaction:

# Minimum number of StoreFiles per Store to be selected for a compaction to occur.
# Default: 2
hbase.hstore.compaction.min=2

# Maximum number of StoreFiles to compact per minor compaction.
# Default: 10
hbase.hstore.compaction.max=10

# Any StoreFile smaller than this setting will automatically be a candidate for compaction.
hbase.hstore.compaction.min.size

# Any StoreFile larger than this setting will automatically be excluded from compaction.
hbase.hstore.compaction.max.size

# Ratio used in the compaction file selection algorithm.
hbase.hstore.compaction.ratio

File Selection Algorithm

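Files are evaluated one at a time, from oldest to newest, using the following rule:

A file is selected for compaction when file_size <= sum(smaller_files_size) * hbase.hstore.compaction.ratio

Here smaller_files_size means the sizes of the files newer than the one being evaluated, which are usually the smaller ones. Once a file is selected, the newer files after it are selected as well, up to the hbase.hstore.compaction.max limit.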
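The rule can be sketched in a few lines of Python. This is a simplified illustration, not HBase's actual implementation: the function name select_files and its parameters are hypothetical, the size limits are applied in the simplest possible way, and files are assumed to be listed oldest to newest.

```python
from typing import List


def select_files(sizes: List[int], ratio: float, min_files: int,
                 max_files: int, min_size: int, max_size: int) -> List[int]:
    """Return the sizes of the files picked for a minor compaction.

    `sizes` is ordered oldest to newest. Hypothetical helper that only
    illustrates the ratio rule; it is not HBase's real code.
    """
    start = 0
    while start < len(sizes):
        size = sizes[start]
        # Total size of the files newer than the one being checked.
        newer_total = sum(sizes[start + 1:])
        # Files below min_size always qualify; files above max_size never do.
        if size <= max_size and size <= max(min_size, newer_total * ratio):
            break
        start += 1
    # Everything from the first qualifying file onward, capped at max_files.
    selected = sizes[start:start + max_files]
    # Too few files means no compaction happens in this round.
    return selected if len(selected) >= min_files else []
```

Calling select_files(sizes, ratio, min_files, max_files, min_size, max_size) with a list of file sizes returns the sizes that would be merged, or an empty list when fewer than min_files qualify.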

Worked Example

Consider the following configuration:

hbase.hstore.compaction.ratio=1.0
hbase.hstore.compaction.min=3
hbase.hstore.compaction.max=5
hbase.hstore.compaction.min.size=10
hbase.hstore.compaction.max.size=1000

The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes (oldest to newest).

With the above parameters, the files selected for minor compaction are 23, 12, and 12.

Why? Recall the rule: a file is selected when file_size <= sum(smaller_files_size) * ratio.

  • 100: No, because sum(50, 23, 12, 12) * 1.0 = 97, and 100 > 97
  • 50: No, because sum(23, 12, 12) * 1.0 = 47, and 50 > 47
  • 23: Yes, because sum(12, 12) * 1.0 = 24, and 23 <= 24
  • 12: Yes, because the previous file has been included and the selection is still within the max-file limit of 5
  • 12: Yes, for the same reason; the selection ends with 3 files, which also meets the minimum of 3 (hbase.hstore.compaction.min)
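The walk-through above can be reproduced with a short Python trace. This is only a sketch under simplifying assumptions: trace_selection is a hypothetical helper, and it ignores hbase.hstore.compaction.min.size and hbase.hstore.compaction.max.size, which do not change the outcome here because every file lies between 10 and 1000 bytes.

```python
def trace_selection(sizes, ratio, max_files):
    """Return one (size, selected, threshold) tuple per file, oldest to newest.

    Hypothetical helper for illustration; not part of HBase.
    """
    decisions = []
    including = False  # becomes True once the first file passes the ratio rule
    count = 0          # number of files selected so far
    for i, size in enumerate(sizes):
        # Sum of the newer files' sizes, scaled by the ratio.
        threshold = sum(sizes[i + 1:]) * ratio
        if not including and size <= threshold:
            including = True
        # After the first hit, every newer file is taken until max_files.
        selected = including and count < max_files
        if selected:
            count += 1
        decisions.append((size, selected, threshold))
    return decisions


for size, selected, threshold in trace_selection([100, 50, 23, 12, 12], 1.0, 5):
    answer = "Yes" if selected else "No"
    print(f"{size:>3}  {answer:<3}  sum(newer files) * ratio = {threshold}")
```

Changing the ratio or the file sizes in the call is a quick way to see how the selection shifts, for example how a larger ratio pulls older, bigger files into the compaction.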

I hope this helps you understand HBase minor compaction.

Happy Hadooping :)