This is part 2 of a 7-part report by HBase contributor Jingcheng Du and HDFS contributor Wei Zhou (Jingcheng and Wei are both software engineers at Intel).

  1. Introduction
  2. Cluster Setup
  3. Tuning
  4. Experiment
  5. Experiment (continued)
  6. Issues
  7. Conclusions

Cluster Setup

In all, five nodes are used in the testing. Figure 1 shows the topology of these nodes; the services on each node are listed in Table 3.

Figure 1. Cluster topology

  1. For HDFS, one node serves as the NameNode and three nodes serve as DataNodes.

  2. For HBase, the HMaster is collocated with the NameNode, and three RegionServers are collocated with the DataNodes.

  3. All nodes in the cluster are connected to a full-duplex 10Gbps DELL N4032 switch. The network bandwidth for the client-NameNode, client-DataNode and NameNode-DataNode links is 10Gbps; the bandwidth between DataNodes is 20Gbps, achieved by network bonding (a configuration sketch follows this list).
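
The report does not show the exact bonding setup. Purely as an illustration, a typical CentOS 6 configuration that aggregates two 10Gbps ports into one 20Gbps logical link looks like the following; the device names (em1/em2), the IP addressing and the LACP mode are assumptions, not details taken from the test cluster:

# cat /etc/sysconfig/network-scripts/ifcfg-bond0
# Bonded master device; mode=802.3ad (LACP) assumes the switch ports are configured as a LAG
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.12
NETMASK=255.255.255.0
BONDING_OPTS="mode=802.3ad miimon=100"

# cat /etc/sysconfig/network-scripts/ifcfg-em1
# Slave port; ifcfg-em2 is identical except for DEVICE
DEVICE=em1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes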

Node  NameNode    DataNode    HMaster    RegionServer    Zookeeper    YCSB client
1     √                       √
2                 √                      √
3                 √                      √
4                 √                      √
5                                                                     √

Table 3. Services run on each node

Hardware

The hardware configuration of the nodes in the cluster is listed in the following tables.

Item       Model/Comment
CPU        Intel® Xeon® CPU E5-2695 v3 @ 2.3GHz, dual sockets
Memory     Micron 16GB DDR3-2133MHz, 384GB in total
NIC        Intel 10-Gigabit X540-AT2
SSD        Intel S3500 800GB
HDD        Seagate Constellation™ ES ST2000NM0011 2TB, 7200RPM
RAMDISK    300GB

Table 4. Hardware for DataNode/RegionServer

Note: The OS is stored on an independent SSD (Intel® SSD DC S3500 240GB) on both the DataNodes and the NameNode. The number of SSDs or HDDs (OS SSD not included) in each DataNode varies across testing cases; see the ‘Methodology’ section for details.

Item       Model/Comment
CPU        Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz, dual sockets
Memory     Micron 16GB DDR3-2133MHz, 260GB in total
NIC        Intel 10-Gigabit X540-AT2
SSD        Intel S3500 800GB

Table 5. Hardware for NameNode/HMaster

Item                            Value
Intel Hyper-Threading Tech      On
Intel Virtualization            Disabled
Intel Turbo Boost Technology    Enabled
Energy Efficient Turbo          Enabled

Table 6. Processor configuration

Software

Software    Details
OS          CentOS release 6.5
Kernel      2.6.32-431.el6.x86_64
Hadoop      2.6.0
HBase       1.0.0
Zookeeper   3.4.5
YCSB        0.3.0
JDK         jdk1.8.0_60
JVM Heap    NameNode: 32GB, DataNode: 4GB, HMaster: 4GB, RegionServer: 64GB
GC          G1GC

Table 7. Software stack version and configuration

NOTE: As mentioned in the ‘Methodology’ section, HDFS and HBase have been enhanced to support this test.
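
The report does not list the exact JVM options, but the heap sizes and the G1 collector from Table 7 would typically be applied through hadoop-env.sh and hbase-env.sh. A minimal sketch, assuming the same flags are used for every daemon (the flag placement is an assumption; only the values come from Table 7):

# hadoop-env.sh (sketch)
export HADOOP_NAMENODE_OPTS="-Xms32g -Xmx32g -XX:+UseG1GC ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-Xms4g -Xmx4g -XX:+UseG1GC ${HADOOP_DATANODE_OPTS}"

# hbase-env.sh (sketch)
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Xms4g -Xmx4g -XX:+UseG1GC"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Xms64g -Xmx64g -XX:+UseG1GC"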

Benchmarks

We use YCSB 0.3.0 as the benchmark, with a single YCSB client in the tests.

This is the workload configuration for the 1TB dataset:

# cat ../workloads/1T_workload
fieldcount=5
fieldlength=200
recordcount=1000000000
operationcount=0
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian
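
With fieldcount=5 and fieldlength=200, each record carries 5 × 200 = 1,000 bytes of user data, so the 1,000,000,000 records add up to roughly 1TB of raw data (before HBase key and metadata overhead and HDFS replication), which is where the ‘1T’ in the workload name comes from.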

And we use the following command to start the YCSB client:

./ycsb load hbase-10 -P ../workloads/1T_workload -threads 200 -p columnfamily=family -p clientbuffering=true -s > 1T_workload.dat
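
In this command, -P points to the workload file shown above, -threads 200 runs 200 client threads in the single YCSB process, -p columnfamily=family tells the HBase binding which column family to write into, -p clientbuffering=true makes the client buffer mutations on the client side instead of flushing each insert individually, and -s prints periodic status to the console while the results are redirected to 1T_workload.dat.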

Go to part 3, Tuning