Robert Yokota is a Software Engineer at Yammer.
An earlier version of this post was published here on Robert's blog.
Be sure to also check out the excellent follow-on post, Graph Analytics on HBase with HGraphDB and Giraph.
HGraphDB: Apache HBase As An Apache TinkerPop Graph Database
Graph databases are common among social networking companies. A social network is naturally modeled as a graph, so a graph database is a natural fit. For instance, Facebook has a graph database called TAO, Twitter has FlockDB, and Pinterest has Zen. At Yammer, an enterprise social network, we rely on Apache HBase for much of our messaging infrastructure, so I decided to see if HBase could also be used for some graph modelling and analysis.
I put together the following wish list of what I wanted to see in a graph database:
- It should be implemented directly on top of HBase.
- It should support the TinkerPop 3 API.
- It should allow the user to supply IDs for both vertices and edges.
- It should allow user-supplied IDs to be either strings or numbers.
- It should allow property values to be of arbitrary type, including maps, arrays, and serializable objects.
- It should support indexing vertices by label and property.
- It should support indexing edges by label and property, specific to a given vertex.
- It should support range queries and pagination with both vertex indices and edge indices.
I did not find a graph database that met all of the above criteria. For instance, Titan is a graph database that supports the TinkerPop API, but it is not implemented directly on HBase. Rather, it is implemented on top of an abstraction layer that can use Apache HBase, Apache Cassandra, or Berkeley DB as its underlying store. Also, Titan does not support user-supplied IDs. Apache S2Graph (incubating) is a graph database that is implemented directly on HBase, and it supports both user-supplied IDs and indices on edges, but it does not yet support the TinkerPop API, nor does it support indices on vertices.
This led me to create HGraphDB, a TinkerPop 3 layer for HBase. It supports every item on the wish list above. Feel free to try it out if you are interested in using HBase as a graph database.
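To make the last two wish-list items more concrete, here is a minimal Python sketch of how an edge index that is specific to a vertex can support range queries and pagination on top of a sorted key-value store. This is not HGraphDB's actual schema or API. The `SortedStore` class, the tuple key layout, and the function names are all assumptions made up for illustration. The only property borrowed from HBase is that rows are kept sorted by key, so a range query becomes a scan between two keys.

```python
import bisect


class SortedStore:
    """Toy stand-in for a table whose rows are kept sorted by key (hypothetical)."""

    def __init__(self):
        self._keys = []

    def put(self, key):
        # Insert the key at its sorted position, ignoring duplicates.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan(self, start, stop, limit=None, after=None):
        # Return keys in [start, stop). If `after` is given, resume strictly
        # past that key, which is how the next page is fetched.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        if after is not None:
            lo = max(lo, bisect.bisect_right(self._keys, after))
        rows = self._keys[lo:hi]
        return rows if limit is None else rows[:limit]


def index_edge(store, vertex_id, label, prop, value, edge_id):
    # Assumed key layout: the vertex ID comes first, so all index entries
    # for one vertex are adjacent and sorted by label, property, and value.
    store.put((vertex_id, label, prop, value, edge_id))


def edges_in_range(store, vertex_id, label, prop, low, high,
                   limit=None, after=None):
    # Range query over low <= value < high for one vertex and edge label.
    start = (vertex_id, label, prop, low)
    stop = (vertex_id, label, prop, high)
    return store.scan(start, stop, limit, after)


store = SortedStore()
index_edge(store, "alice", "follows", "since", 2012, "e2")
index_edge(store, "alice", "follows", "since", 2009, "e1")
index_edge(store, "alice", "follows", "since", 2013, "e3")
index_edge(store, "alice", "follows", "since", 2016, "e4")
index_edge(store, "bob", "follows", "since", 2012, "e5")

# First page of alice's "follows" edges created from 2010 up to (not including) 2016.
page1 = edges_in_range(store, "alice", "follows", "since", 2010, 2016, limit=1)
# Next page: pass the last key of the previous page as the cursor.
page2 = edges_in_range(store, "alice", "follows", "since", 2010, 2016,
                       limit=1, after=page1[-1])
```

Because the vertex ID leads the key, a scan never touches another vertex's edges, and the last key of one page is enough to resume the next. A real implementation on HBase would also need an order-preserving byte encoding for the key components, which this sketch sidesteps by using Python tuples of consistently typed values.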