This is part 1 of a 7-part report by HBase contributor Jingcheng Du and HDFS contributor Wei Zhou (Jingcheng and Wei are both software engineers at Intel).

  1. Introduction
  2. Cluster Setup
  3. Tuning
  4. Experiment
  5. Experiment (continued)
  6. Issues
  7. Conclusions

Introduction

As faster storage types (SSD, NVMe SSD, etc.) emerge, big data deployments need a methodology for using them to improve throughput and latency. However, these fast storage types are still expensive and limited in capacity. This study provides a guide for setting up a cluster with different storage media.

In general, this guide considers the following questions:

  1. What is the maximum performance a user can achieve by using fast storage?

  2. Where are the bottlenecks?

  3. How can we achieve the best balance between performance and cost?

  4. How can we predict the performance a cluster will deliver with different storage combinations?

In this study, we evaluate HBase write performance on different storage media. We leverage the hierarchical storage management (HSM) support in HDFS to store different categories of HBase data on different media.
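
HDFS exposes this tiering through per-path storage policies (for example ALL_SSD, HOT, and LAZY_PERSIST), which take effect once the DataNode data directories are tagged with storage types such as [SSD], [DISK], or [RAM_DISK]. The sketch below is only a minimal illustration of the idea, not necessarily the exact mechanism used in our tests; it assumes Hadoop 2.6+ and the default HBase directory layout under /hbase.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ApplyStoragePolicies {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS points at HDFS and the DataNode directories are tagged,
    // e.g. dfs.datanode.data.dir = [SSD]/mnt/ssd0/dn,[DISK]/mnt/hdd0/dn,...
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Keep WAL blocks on SSD (ALL_SSD = every replica on SSD). A RAMDISK tier
    // would instead use LAZY_PERSIST, which writes one replica to RAM_DISK and
    // persists it to disk asynchronously.
    dfs.setStoragePolicy(new Path("/hbase/WALs"), "ALL_SSD");

    // Keep store files (HFiles) on SSD, and archived files on HDD
    // (HOT = every replica on DISK).
    dfs.setStoragePolicy(new Path("/hbase/data"), "ALL_SSD");
    dfs.setStoragePolicy(new Path("/hbase/archive"), "HOT");
  }
}
```

A policy only affects newly written blocks; existing blocks can be migrated to match a policy with the HDFS mover tool.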

Three types of storage (HDD, SATA SSD, and RAMDISK) are evaluated. HDD is the most widely deployed storage today, while SATA SSD is a faster option that is becoming increasingly popular. RAMDISK is used to emulate extremely fast PCI-e NVMe SSDs and upcoming faster storage (e.g. Intel 3D XPoint® based SSDs). Since that hardware was not available to us, we use RAMDISK for this emulation, and we believe our results also hold for PCI-e SSDs and other fast storage types.

Note: RAMDISK is a logical device emulated with memory; files stored in RAMDISK are held only in memory.

Methodology

We test HBase write performance on tiered storage in HDFS and compare the results when different categories of HBase data are stored on different storage types. YCSB (the Yahoo! Cloud Serving Benchmark, a widely used open source framework for evaluating the performance of data-serving systems) is used to generate the test workload.
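
For context, each YCSB insert ends up as a single HBase Put on the client side; on the region server the cell is appended to the WAL and written to the MemStore, which is later flushed to an HFile. These are exactly the file categories that the tiered-storage cases below place on different media. Here is a minimal sketch of that client-side write, assuming the HBase 1.x API and YCSB-style defaults (table usertable, with an illustrative column family family and field field0):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class YcsbStyleInsert {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("usertable"))) {
      // One YCSB insert boils down to one Put: the cell is appended to the
      // region server's WAL, added to the MemStore, and eventually flushed
      // into an HFile -- the file categories the test cases place on
      // different storage media.
      Put put = new Put(Bytes.toBytes("user1000000000000000000"));   // illustrative row key
      put.addColumn(Bytes.toBytes("family"), Bytes.toBytes("field0"),
          Bytes.toBytes("value-of-roughly-100-bytes"));              // illustrative cell
      table.put(put);
    }
  }
}
```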

Eleven test cases are evaluated in this study. To avoid region splits at runtime, we pre-split the table into 210 regions for the 1 TB dataset cases and into 18 regions for the 50 GB dataset cases.

Case names follow the format <dataset size>_<storage type(s)>; for example, 1T_RAM_SSD uses a 1 TB dataset stored on a mix of RAMDISK and SSD.

| Case Name | Storage | Dataset | Comment |
| --- | --- | --- | --- |
| 50G_RAM | RAMDISK | 50 GB | Store all the files in RAMDISK. The dataset is limited to 50 GB in this case because of the capacity limitation of RAMDISK. |
| 50G_SSD | 8 SSDs | 50 GB | Store all the files on SATA SSD. Compare the 50 GB results against the first case (50G_RAM). |
| 50G_HDD | 8 HDDs | 50 GB | Store all the files on HDD. |
| 1T_SSD | 8 SSDs | 1 TB | Store all the files on SATA SSD. Compare the 1 TB results against the tiered-storage cases. |
| 1T_HDD | 8 HDDs | 1 TB | Store all the files on HDD with a 1 TB dataset. |
| 1T_RAM_SSD | RAMDISK + 8 SSDs | 1 TB | Store files in tiered storage (i.e. different storage types mixed together in one test): the WAL is stored in RAMDISK, and all other files are stored on SSD. |
| 1T_RAM_HDD | RAMDISK + 8 HDDs | 1 TB | Store files in tiered storage: the WAL is stored in RAMDISK, and all other files are stored on HDD. |
| 1T_SSD_HDD | 4 SSDs + 4 HDDs | 1 TB | Store files in tiered storage: the WAL and smaller files (no larger than 1.5 GB) are stored on SSD, and all other files, including all archived files, are stored on HDD. |
| 1T_SSD_All_HDD | 4 SSDs + 4 HDDs | 1 TB | Store files in tiered storage: the WAL and flushed HFiles are stored on SSD, and all other files, including all archived and compacted files, are stored on HDD. |
| 1T_RAM_SSD_HDD | RAMDISK + 4 SSDs + 4 HDDs | 1 TB | Store files in tiered storage: the WAL is stored in RAMDISK, smaller files (no larger than 1.5 GB) are stored on SSD, and all other files, including all archived files, are stored on HDD. |
| 1T_RAM_SSD_All_HDD | RAMDISK + 4 SSDs + 4 HDDs | 1 TB | Store files in tiered storage: the WAL is stored in RAMDISK, flushed HFiles are stored on SSD, and all other files, including all archived and compacted files, are stored on HDD. |

Table 1. Test Cases

NOTE: In all 1 TB test cases, we pre-split the HBase table into 210 regions to avoid region splits at runtime.
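
For reference, a table can be pre-split at creation time through the HBase Admin API. The sketch below is illustrative only, assuming the HBase 1.x client API and YCSB's default table name usertable; the column family and split boundaries are hypothetical rather than the exact values used in our tests.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitUsertable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("usertable"));
      desc.addFamily(new HColumnDescriptor("family")); // hypothetical column family

      // Create the table with 210 regions up front: HBase spreads 209 split
      // points evenly between the two boundary keys, so no region splits are
      // triggered while the workload runs.
      admin.createTable(desc,
          Bytes.toBytes("user0000000000000000000"),  // illustrative first split key
          Bytes.toBytes("user9999999999999999999"),  // illustrative last split key
          210);
    }
  }
}
```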

The metrics in Table 2 are collected during the test for performance evaluation.

| Level | Metrics |
| --- | --- |
| Storage media | IOPS, throughput (sequential/random read/write) |
| OS | CPU usage, network I/O, disk I/O, memory usage |
| YCSB benchmark | Throughput, latency |

Table 2. Metrics collected during the tests

Go to Part 2, Cluster Setup