HDFS HSM and HBase: Tuning (Part 3 of 7)
This is part 3 of a 7-part report by HBase contributor Jingcheng Du and HDFS contributor Wei Zhou (Jingcheng and Wei are both software engineers at Intel).
Stack Enhancement and Parameter Tuning
Stack Enhancement
To perform the study, we made a set of enhancements in the software stack:
- HDFS:
  - Support a new storage type, RAMDISK
  - Add file-level mover support: a user can move the blocks of a single file without scanning all metadata in the NameNode
- HBase:
  - WAL, flushed HFiles, HFiles generated in compactions, and archived HFiles can be stored in different storage
  - When HFiles are renamed across storage, the blocks of that file are moved to the target storage asynchronously
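The HBase-side behavior can be pictured with a small toy model. This is a minimal sketch and not the actual HDFS or HBase patch: the `ToyFileSystem` class, the `PLACEMENT` mapping, the tier assigned to each file category, and the example paths are all hypothetical and chosen only for illustration. It shows the two ideas from the list: each file category has its own target storage, and a rename across storage returns immediately while the block move happens in the background.

```python
import queue
import threading

# Hypothetical placement: which storage tier each HBase file category targets.
PLACEMENT = {
    "wal": "RAMDISK",
    "flushed_hfile": "SSD",
    "compacted_hfile": "HDD",
    "archived_hfile": "HDD",
}


class ToyFileSystem:
    """In-memory stand-in for a file system with per-file storage tiers."""

    def __init__(self):
        self.tier_of = {}            # path -> tier currently holding the blocks
        self._moves = queue.Queue()  # pending (path, target_tier) block moves
        worker = threading.Thread(target=self._mover, daemon=True)
        worker.start()

    def create(self, path, category):
        # New files are written directly to the tier of their category.
        self.tier_of[path] = PLACEMENT[category]

    def rename(self, src, dst, dst_category):
        # The rename is a metadata-only change and returns at once.
        self.tier_of[dst] = self.tier_of.pop(src)
        target = PLACEMENT[dst_category]
        if self.tier_of[dst] != target:
            # The block migration is queued and runs in the background.
            self._moves.put((dst, target))

    def _mover(self):
        # Background worker: applies queued block moves one at a time.
        while True:
            path, target = self._moves.get()
            if path in self.tier_of:
                self.tier_of[path] = target
            self._moves.task_done()

    def wait_for_moves(self):
        # Block until every queued move has been applied.
        self._moves.join()


fs = ToyFileSystem()
fs.create("/hbase/data/t1/r1/cf/hfile1", "flushed_hfile")
fs.rename(
    "/hbase/data/t1/r1/cf/hfile1",
    "/hbase/archive/t1/r1/cf/hfile1",
    "archived_hfile",
)
fs.wait_for_moves()
print(fs.tier_of)
```

The point of the design is that the caller of `rename` never waits for data to be copied between tiers; only `wait_for_moves`, which exists here purely so the example can be checked, blocks on the background work.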
HDFS/HBase Tuning
This step is to find the best configurations for HDFS and HBase.
Known Key Performance Factors in HBase
These are the key performance factors in HBase:
- WAL: The write-ahead log guarantees the durability and consistency of the data. Every record inserted into HBase must be written to the WAL, which can slow down user operations, so the WAL is latency-sensitive.
- Memstore and flush: Records inserted into HBase are cached in the memstore, and when the memstore reaches a threshold it is flushed to a store file. Slow flushes can lead to long GC (garbage collection) pauses and push memory usage up to the thresholds of the regions and the region server, which can block user operations.
- Compaction and number of store files: HBase compaction merges small store files into a larger one, which reduces the number of store files and speeds up reads, but it can generate heavy I/O and consume disk bandwidth at runtime. Compacting less often speeds up writes but leaves too many store files, which slows down reads. When there are too many store files, memstore flushes can be slowed down, which can lead to a large memstore and further slow user operations.
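The interaction between flushes, compactions, and the store file count can be illustrated with a toy simulation. This is a rough sketch under simplifying assumptions, not HBase's actual scheduling logic: the function `simulate_store` and its tick-based parameters are hypothetical, a single store is modeled, and a flush is simply held back while the file count is at or above a blocking threshold (the role played by `hbase.hstore.blockingStoreFiles`).

```python
def simulate_store(ticks, flush_every, compact_every,
                   files_per_compaction, blocking_store_files):
    """Toy model of one store; returns (final_file_count, blocked_ticks)."""
    store_files = 0
    blocked_ticks = 0
    pending_flush = False
    for tick in range(1, ticks + 1):
        # A full memstore wants to flush every `flush_every` ticks.
        if tick % flush_every == 0:
            pending_flush = True
        # A compaction merges up to `files_per_compaction` files into one.
        if tick % compact_every == 0 and store_files >= 2:
            merged = min(files_per_compaction, store_files)
            store_files -= merged - 1
        # A flush only proceeds while the store is below the threshold.
        if pending_flush:
            if store_files >= blocking_store_files:
                blocked_ticks += 1  # memstore keeps growing, writers wait
            else:
                store_files += 1
                pending_flush = False
    return store_files, blocked_ticks


# Rare compactions: store files pile up and flushes get blocked.
rare = simulate_store(ticks=100, flush_every=1, compact_every=10,
                      files_per_compaction=3, blocking_store_files=15)
# Frequent compactions: the file count stays low and nothing is blocked.
frequent = simulate_store(ticks=100, flush_every=1, compact_every=2,
                          files_per_compaction=3, blocking_store_files=15)
print("rare compaction:", rare)
print("frequent compaction:", frequent)
```

The model ignores the I/O cost of compaction itself, which is the other half of the trade-off described above: in a real cluster, compacting more often competes with flushes and reads for disk bandwidth.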
Based on this understanding, the following are the tuned parameters we finally used.
Property | Value |
dfs.datanode.handler.count | 64 |
dfs.namenode.handler.count | 100 |
Table 8. HDFS configuration
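As a convenience, the values in Table 8 can be turned into the `<property>` entries that Hadoop configuration files such as `hdfs-site.xml` use. The snippet below is a small sketch: the helper name `to_site_xml` and the variable names are made up for this example, and only the two property names and values come from the table.

```python
import xml.etree.ElementTree as ET

# Values copied from Table 8.
HDFS_SETTINGS = {
    "dfs.datanode.handler.count": "64",
    "dfs.namenode.handler.count": "100",
}


def to_site_xml(settings):
    """Render a dict as Hadoop-style <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


hdfs_site_xml = to_site_xml(HDFS_SETTINGS)
print(hdfs_site_xml)
```

The generated entries would be merged into an existing `hdfs-site.xml` rather than replacing it, since a real cluster carries many other settings.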
Property | Value |
hbase.regionserver.thread.compaction.small | 3 for non-SSD test cases. 8 for all SSD related test cases. |
hbase.hstore.flusher.count | 5 for non-SSD test cases. 15 for all SSD related test cases. |
hbase.wal.regiongrouping.numgroups | 4 |
hbase.wal.provider | multiwal |
hbase.hstore.blockingStoreFiles | 15 |
hbase.regionserver.handler.count | 200 |
hbase.hregion.memstore.chunkpool.maxsize | 1 |
hbase.hregion.memstore.chunkpool.initialsize | 0.5 |
Table 9. HBase configuration
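Because two of the values in Table 9 depend on whether a test case involves SSD, it can help to express the table as a function of that choice. The following is an illustrative sketch: the function names `hbase_settings` and `differing_keys` and the `uses_ssd` flag are invented for this example, while the property names and values are taken from the table.

```python
def hbase_settings(uses_ssd):
    """Return the Table 9 values for an SSD or a non-SSD test case."""
    return {
        "hbase.regionserver.thread.compaction.small": "8" if uses_ssd else "3",
        "hbase.hstore.flusher.count": "15" if uses_ssd else "5",
        "hbase.wal.regiongrouping.numgroups": "4",
        "hbase.wal.provider": "multiwal",
        "hbase.hstore.blockingStoreFiles": "15",
        "hbase.regionserver.handler.count": "200",
        "hbase.hregion.memstore.chunkpool.maxsize": "1",
        "hbase.hregion.memstore.chunkpool.initialsize": "0.5",
    }


def differing_keys(a, b):
    """List the property names whose values differ between two profiles."""
    return sorted(name for name in a if a[name] != b[name])


ssd = hbase_settings(uses_ssd=True)
non_ssd = hbase_settings(uses_ssd=False)
for name in differing_keys(ssd, non_ssd):
    print(f"{name}: non-SSD={non_ssd[name]}, SSD={ssd[name]}")
```

Only the compaction and flusher thread counts change between the two profiles, which matches the key performance factors listed earlier: faster storage can absorb more concurrent flush and compaction I/O.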
Go to part 4, Experiment