This is part 7 of a 7 part report by HBase Contributor, Jingcheng Du and HDFS contributor, Wei Zhou (Jingcheng and Wei are both Software Engineers at Intel)

Conclusions

There are many things to consider when choosing the hardware of a cluster. According to the test results, in the SSD-related cases the network utility between DataNodes is larger than 10Gbps. If you are using a 10Gbps switch, the network will be the bottleneck and impact the performance. We suggest either extending the network bandwidth by network bonding, or upgrading to a more powerful switch with a higher bandwidth. In cases 1T_HDD and 1T_RAM_HDD, the network utility is lower than 10 Gbps in most time, using a 10 Gbps switch to connect DataNodes is fine.

In all 1T dataset tests, 1T_RAM_SSD shows the best performance. Appropriate mix of different types of storage can improve the HBase write performance. First, write the latency-sensitive and blocking data to faster storage, and write the data that are rarely compacted and accessed to slower storage. Second, avoid mixing types of storage with a large performance gap, such as with 1T_RAM_HDD.

The hardware design issue limits the total disk bandwidth which makes there is hardly superiority of eight SSDs than four SSDs. Either to enhance hardware by using HBA cards to eliminate the limitation of the design issue for eight SSDs or to mix the storage appropriately. According to the test results, in order to achieve a better balance between performance and cost, using four SSDs and four HDDs can achieve a good performance (102% throughput and 101% latency of eight SSDs) with a much lower price. The RAMDISK/SSD tiered storage is the winner of both throughput and latency among all the tests, so if cost is not an issue and maximum performance is needed, RAMDISK(extremely high speed block device, e.g. NVMe PCI-E SSD)/SSD should be chosen.

You should not use a large number of flusher/compactor when most of data are written to HDD. The read and write shares the single channel per HDD, too many flushers and compactors at the same time can slow down the HDD performance.

During the tests, we found some things that can be improved in both HBase and HDFS.

In HBase, the memstore is consumed quickly when the WALs are stored in fast storage; this can lead to regular long GC pauses. It is better to have an offheap memstore for HBase.

In HDFS, each DataNode shares the same lock when creating/finalizing blocks. Any such slow operations in one DataXceiver can block any other operations of creating/finalizing blocks in other DataXceiver on the same DataNode no matter what storage they are using. We need to eliminate the blocking access across storage, and a finer grained lock mechanism to isolate the operations on different blocks is needed (HDFS-9668). And it will be good to implement a latency-aware VolumeChoosingPolicy in HDFS to remove the slow volumes from the candidates.

RoundRobinVolumeChoosingPolicy can lead to load imbalance in HDFS with tiered storage (HDFS-9608).

In HDFS, renaming a file to a different storage does not move the blocks indeed. We need to asynchronously move the HDFS blocks in such a case.

Acknowledgements

The
authors would like to thank Weihua Jiang – who is the previous manager
of the big data team in Intel – for leading this performance
evaluation, and thank Anoop
Sam John(Intel), Apekshit Sharma(Cloudera), Jonathan Hsieh(Cloudera),
Michael Stack(Cloudera), Ramkrishna S. Vasudevan(Intel), Sean
Busbey(Cloudera) and Uma Gangumalla(Intel) for the nice review and
guidance.