This is part 3 of a 7 part report by HBase Contributor, Jingcheng Du and HDFS contributor, Wei Zhou (Jingcheng and Wei are both Software Engineers at Intel) 

  1. Introduction
  2. Cluster Setup
  3. Tuning
  4. Experiment
  5. Experiment (continued)
  6. Issues
  7. Conclusions

Stack Enhancement and Parameter Tuning

Stack Enhancement

To perform the study, we made a set of enhancements in the software stack:

  • HDFS:

    • Support a new storage RAMDISK

    • Add file level mover support, a user can move blocks per file without scanning all metadata in NameNode

  • HBase:

    • WAL, flushed HFiles, HFiles generated in compactions, and archived HFiles can be stored in different storage

    • When renaming HFiles across storage, the blocks of that file would be moved to the target storage asynchronously

HDFS/HBase Tuning

This step is to find the best configurations for HDFS and HBase.

Known Key Performance Factors in HBase

These are the key performance factors in HBase:

  1. WAL: write ahead log to guarantee the non-volatility and consistency of the data. Each record that is inserted to HBase must be written to WAL which can slow down user operations. It’s latency-sensitive.

  2. Memstore and Flush: The records inserted into HBase are cached in memstore, and when reaches a threshold the memstore is flushed to a store file. Slow flush can lead to high GC (Garbage Collection) pause, and make memory usage reach the thresholds in regions and region server, which can block the user operations.

  3. Compaction and Number of Store Files: HBase compaction compacts small store files to a larger one which can reduce the number of store files and accelerate the reading, but it can generate heavy I/O and consume the disk bandwidth in runtime. Less compaction can accelerate the writing but generates too many store files, which slow down the reading. When there are too many store files, the memstore flush can be slowed down which can lead to a large memstore and further slow the user operations.

Based on this understanding, the following are the tuned parameters we finally used.

Property

Value

dfs.datanode.handler.count

64

dfs.namenode.handler.count

100

Table 8. HDFS configuration

Property

Value

hbase.regionserver.thread.compaction.small

3 for non-SSD test cases.

8 for all SSD related test cases.

hbase.hstore.flusher.count

5 for non-SSD test cases.

15 for all SSD related test cases.

hbase.wal.regiongrouping.numgroups

4

hbase.wal.provider

multiwal

hbase.hstore.blockingStoreFiles

15

hbase.regionserver.handler.count

200

hbase.hregion.memstore.chunkpool.maxsize

1

hbase.hregion.memstore.chunkpool.initialsize

0.5

Table 9. HBase configuration

Go to part 4, Experiment