HDFS HSM and HBase: Cluster setup (Part 2 of 7)
This is part 2 of a 7-part report by HBase Contributor Jingcheng Du and HDFS Contributor Wei Zhou (Jingcheng and Wei are both Software Engineers at Intel).
Cluster Setup
In all, five nodes are used in the testing. Figure 1 shows the topology of these nodes; the services on each node are listed in Table 3.
Figure 1. Cluster topology
- For HDFS, one node serves as the NameNode and three nodes serve as DataNodes.
- For HBase, one HMaster is collocated with the NameNode, and three RegionServers are collocated with the DataNodes.
- All the nodes in the cluster are connected to a full-duplex 10Gbps DELL N4032 switch. Network bandwidth for client-NameNode, client-DataNode and NameNode-DataNode traffic is 10Gbps; bandwidth between DataNodes is 20Gbps, achieved by network bonding (a bonding configuration sketch follows Table 3).
| Node | NameNode | DataNode | HMaster | RegionServer | Zookeeper | YCSB client |
|------|----------|----------|---------|--------------|-----------|-------------|
| 1    | √        |          | √       |              |           |             |
| 2    |          | √        |         | √            | √         |             |
| 3    |          | √        |         | √            | √         |             |
| 4    |          | √        |         | √            | √         |             |
| 5    |          |          |         |              |           | √           |

Table 3. Services run on each node
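As noted above, the 20Gbps DataNode-to-DataNode link is the aggregate of two bonded 10GbE ports. Below is a minimal bonding sketch for CentOS 6; the interface names, addresses and bonding mode are assumptions for illustration, not values taken from the test cluster.

# /etc/sysconfig/network-scripts/ifcfg-bond0  (sketch; addresses are placeholders)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.1.11
NETMASK=255.255.255.0
# mode is an assumption; 802.3ad (LACP) or balance-rr both give ~20Gbps aggregate
BONDING_OPTS="mode=802.3ad miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth0  (repeat for the second 10GbE port)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none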
Hardware
The hardware configurations of the nodes in the cluster are listed in the following tables.
| Item    | Model/Comment |
|---------|---------------|
| CPU     | Intel® Xeon® CPU E5-2695 v3 @ 2.3GHz, dual sockets |
| Memory  | Micron 16GB DDR3-2133MHz, 384GB in total |
| NIC     | Intel 10-Gigabit X540-AT2 |
| SSD     | Intel S3500 800G |
| HDD     | Seagate Constellation™ ES ST2000NM0011 2T 7200RPM |
| RAMDISK | 300GB |

Table 4. Hardware for DataNode/RegionServer
Note: The OS is stored on an independent SSD (Intel® SSD DC S3500 240GB) for both the DataNodes and the NameNode. The number of SSDs or HDDs (OS SSD not included) in each DataNode varies across test cases; see the ‘Methodology’ section for details.
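To give a sense of how the mixed media on a DataNode are exposed to HDFS, the sketch below tags each data directory with its storage type in hdfs-site.xml, the standard Hadoop 2.6 mechanism for heterogeneous storage. The mount points are hypothetical, and the actual directory layout for each test case is described in the ‘Methodology’ section.

<!-- hdfs-site.xml (sketch; mount points are placeholders) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]file:///mnt/hdd0/dfs/dn,[SSD]file:///mnt/ssd0/dfs/dn,[RAM_DISK]file:///mnt/ramdisk/dfs/dn</value>
</property>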
| Item   | Model/Comment |
|--------|---------------|
| CPU    | Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz, dual sockets |
| Memory | Micron 16GB DDR3-2133MHz, 260GB in total |
| NIC    | Intel 10-Gigabit X540-AT2 |
| SSD    | Intel S3500 800G |

Table 5. Hardware for NameNode/HBase Master
| Item | Value |
|------|-------|
| Intel Hyper-Threading Tech | On |
| Intel Virtualization | Disabled |
| Intel Turbo Boost Technology | Enabled |
| Energy Efficient Turbo | Enabled |

Table 6. Processor configuration
Software
| Software  | Details |
|-----------|---------|
| OS        | CentOS release 6.5 |
| Kernel    | 2.6.32-431.el6.x86_64 |
| Hadoop    | 2.6.0 |
| HBase     | 1.0.0 |
| Zookeeper | 3.4.5 |
| YCSB      | 0.3.0 |
| JDK       | jdk1.8.0_60 |
| JVM Heap  | NameNode: 32GB; DataNode: 4GB; HMaster: 4GB; RegionServer: 64GB |
| GC        | G1GC |

Table 7. Software stack version and configuration
NOTE: As mentioned in the ‘Methodology’ section, HDFS and HBase have been enhanced to support this test.
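One way the heap sizes and G1GC choice in Table 7 can be expressed is through the standard Hadoop and HBase environment scripts. The lines below are a sketch under that assumption, not the exact files used in the test; further GC tuning is covered in part 3, Tuning.

# hadoop-env.sh (sketch)
export HADOOP_NAMENODE_OPTS="-Xmx32g -XX:+UseG1GC ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-Xmx4g -XX:+UseG1GC ${HADOOP_DATANODE_OPTS}"

# hbase-env.sh (sketch)
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Xmx4g -XX:+UseG1GC"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Xmx64g -XX:+UseG1GC"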
Benchmarks
We use YCSB 0.3.0 as the benchmark, with a single YCSB client in the tests.
This is the workload configuration for the 1T dataset:
# cat ../workloads/1T_workload
fieldcount=5
fieldlength=200
recordcount=1000000000
operationcount=0
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian
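Each record carries 5 fields × 200 bytes = 1,000 bytes of payload, so 1,000,000,000 records work out to roughly 1TB of raw data before HBase storage overhead, which is where the 1T dataset name comes from.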
And we use the following command to start the YCSB client:
./ycsb load hbase-10 -P ../workloads/1T_workload -threads 200 -p columnfamily=family -p clientbuffering=true -s > 1T_workload.dat
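The load assumes the target table already exists; by default the YCSB HBase client writes to a table named usertable, with the column family given by the columnfamily parameter above. A minimal sketch of creating it from the HBase shell follows; the pre-split region count and split algorithm are assumptions, not values from the test.

# Create the YCSB table before running the load (sketch; split settings are assumptions)
hbase shell <<'EOF'
create 'usertable', 'family', {NUMREGIONS => 30, SPLITALGO => 'HexStringSplit'}
EOF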
Go to part 3, Tuning