HBase - Who needs a Master?
By Matteo Bertozzi (mbertozzi at apache dot org), HBase Committer and Engineer on the Cloudera HBase Team.
At
first glance, the Apache HBase architecture appears to follow a
master/slave model where the master receives all the requests but the
real work is done by the slaves. This is not actually the case, and in
this article I will describe what tasks are in fact handled by the
master and the slaves.
HBase
provides low-latency random reads and writes on top of HDFS and it’s
able to handle petabytes of data. One of the interesting capabilities in
HBase is Auto-Sharding, which simply means that tables are dynamically distributed by the system when they become too large.
Regions and Region Servers
The basic unit of scalability, that provides the horizontal scalability, in HBase is called Region.
Regions are a subset of the table’s data, and they are essentially a
contiguous, sorted range of rows that are stored together.
Initially,
there is only one region for a table. When regions become too large
after adding more rows, the region is split into two at the middle key, creating two roughly equal halves.
Looking back at the HBase architecture the slaves are called Region Servers.
Each Region Server is responsible to serve a set of regions, and one
Region (i.e. range of rows) can be served only by one Region Server.
The HBase Architecture has two main services: HMaster that is responsible for coordinating Regions in the cluster and execute administrative operations; HRegionServer responsible to handle a subset of the table’s data.
HMaster, Region Assignment and Balancing
The HBase Master coordinates the HBase Cluster and is responsible for administrative operations.
A
Region Server can serve one or more Regions. Each Region is assigned to
a Region Server on startup and the master can decide to move a Region
from one Region Server to another as the result of a load balance
operation. The Master also handles Region Server failures by assigning
the region to another Region Server.
The
mapping of Regions to Region Servers is kept in a system table called
META. By reading META, you can identify which region is responsible for
your key. This means that for read and write operations, the master is
not involved at all, and clients can go directly to the Region Server
responsible to serve the requested data.
Locating a Row-Key: Which Region Server is responsible?
To
put or get a row clients don’t have to contact the master, clients can
contact directly the Region Server that handles the specified row. In
the case of a scan clients can contact directly the set of Region
Servers responsible for handling the set of keys.
To identify the Region Server, the client does a query on the META table.META
is a system table, used to keep track of regions. It contains the
server name and a region identifier comprised of a table name and the
start row-key. By looking at the start-key and the next region start-key
clients are able to identify the range of rows contained in a a
particular region.
The
client keeps a cache for the region locations. This avoids the need for
clients to hit the META table every time an operation on the same
region is issued. In case of a region split or move to another Region
Server (due to balancing, or assignment policies) the client will
receive an exception as response and the cache will be refreshed by
fetching the updated information from the META table.
Since
META is a table like the others, the client has to identify on which
server META is located. The META locations are stored in a ZooKeeper
node on assignment by the Master, and the client reads directly the node
to get the address of the Region Server that contains META.
The
original design was based on BigTable with another table called -ROOT-
containing the META locations and ZooKeeper pointing to it. HBase 0.96
removed that in favor of ZooKeeper only since META cannot be split and
therefore consists of a single region.
Client API - Master and Regions responsibilities
The HBase java client API is composed of two main interfaces.
-
HBaseAdmin:
allows interaction with the “table schema" by
creating/deleting/modifying tables, and it allows interaction with the
cluster by assigning/unassigning regions, merging regions together,
calling for a flush, and so on. This interface communicates with the
Master. -
HTable:
allows the client to manipulate the data of a specified table, by using
get, put, delete and all the other data operations. This interface
communicates directly with the Region Servers responsible for handling
the requested set of keys.
Those
two interfaces have separate responsibilities: HBaseAdmin is only used
to execute admin operations and communicate with the Master while the
HTable is used to manipulate data and communicate with the Regions.
Conclusion
As
we’ve seen in this article, having a Master/Slave architecture does not
mean that each operation goes through the master. To read and write
data the HBase client, in fact, goes directly to the specific Region
Server responsible to handle the row keys for all the data operations (HTable). The Master is used by the client only for table creation, modification and deletion operations (HBaseAdmin).
Although
there exists a concept of a Master, the HBase client does not depend on
it for data operations and the cluster can keep serving data even if
the master goes down.