Giraph 1.2.0 release is out.

With Giraph 1.2.0 we've released a lot of new and exciting functionality. We've built a new API to simplify application development, added out-of-core support for cases where the data doesn't fit into memory, and, of course, made lots of performance improvements and bug fixes. Before you rush to download and install the new bits, read on so you know what to play with.

Blocks Framework

The Blocks Framework is a new, improved API for writing Giraph applications. It lets developers organize code for better encapsulation, reusability, and flexibility. With the Blocks API, developers can build generic libraries on top of Apache Giraph. Utilities that transform graphs, or implementations of algorithms like PageRank or Connected Components, can now be reused across multiple applications. We also share a reusable library that includes many well-known algorithms implemented in the Blocks API. For more details, see: http://giraph.apache.org/blocks.html
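To make the reuse idea concrete, here is a minimal, self-contained sketch of how small units of work might compose into larger, reusable ones. It does not use any Giraph classes: the Block interface, the piece, sequence, repeat, and pageRank helpers, and all step names are hypothetical stand-ins invented for illustration. See the documentation linked above for the actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BlocksSketch {
    /** Hypothetical stand-in: a block expands into an ordered list of step names. */
    public interface Block {
        List<String> steps();
    }

    /** A single unit of work, identified here only by its name. */
    public static Block piece(String name) {
        return () -> Collections.singletonList(name);
    }

    /** Runs the given blocks one after another. */
    public static Block sequence(Block... blocks) {
        return () -> {
            List<String> out = new ArrayList<>();
            for (Block b : blocks) {
                out.addAll(b.steps());
            }
            return out;
        };
    }

    /** Runs the body a fixed number of times. */
    public static Block repeat(int times, Block body) {
        return () -> {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < times; i++) {
                out.addAll(body.steps());
            }
            return out;
        };
    }

    /** A "library" block that any application can embed (illustrative only). */
    public static Block pageRank(int iterations) {
        return sequence(piece("initRanks"), repeat(iterations, piece("propagateRanks")));
    }

    public static void main(String[] args) {
        // An application reuses the library block between its own steps.
        Block app = sequence(piece("loadAndCleanGraph"), pageRank(2), piece("writeRanks"));
        // Prints [loadAndCleanGraph, initRanks, propagateRanks, propagateRanks, writeRanks]
        System.out.println(app.steps());
    }
}
```

The point of the sketch is only that an algorithm packaged as a block can be dropped into any larger application without knowing what runs before or after it.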

New out-of-core

The out-of-core feature in Giraph enables users to run jobs on graphs much bigger than was previously possible. In normal execution, Giraph used to assume that the data fits entirely in memory. This assumption caused some jobs to fail as the amount of data kept growing, and continuously increasing the number of machines running a Giraph job is not always the best solution. The out-of-core feature lets Giraph use storage devices (local disk, or even external storage devices over the network) to hold excess data that does not fit in memory, which helps Giraph scale graph processing jobs to much larger graphs. Previous attempts at implementing an out-of-core feature in Giraph were either inefficient or incorrect. In this release we have redesigned the feature to be more efficient, cost-effective, and extensible. Developers can now also add their own out-of-core strategy for each class of applications they want to use Giraph for.

Facebook configuration

In Giraph 1.2.0, we have made it easy to specify bulk parameter configurations. These can be statically defined as Java classes and enabled by passing an appropriate command line parameter. We have also made public a set of parameters used internally at Facebook, which provide a substantial performance benefit compared to the default options. You can enable this configuration when you launch a Giraph job simply by adding the following option to your command:

-Ddigraph.block_factory_configurators="org.apache.giraph.conf.FacebookConfiguration"
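As a rough illustration of what "bulk configuration defined as a Java class" means, the sketch below bundles several parameter values into one class and applies it by class name, much as an option value like the one above names a class. It is self-contained and does not use Giraph's own types: JobConf, Configurator, TunedDefaults, applyConfigurators, and the example.* keys are all hypothetical names invented for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkConfigSketch {
    /** Hypothetical stand-in for a job configuration: plain string key/value pairs. */
    public static final class JobConf {
        private final Map<String, String> values = new LinkedHashMap<>();

        public void set(String key, String value) {
            values.put(key, value);
        }

        public String get(String key) {
            return values.get(key);
        }
    }

    /** Hypothetical interface: one class bundles a whole set of parameter values. */
    public interface Configurator {
        void configure(JobConf conf);
    }

    /** A statically defined bundle of settings; keys and values are invented. */
    public static final class TunedDefaults implements Configurator {
        @Override
        public void configure(JobConf conf) {
            conf.set("example.num_threads", "8");
            conf.set("example.use_big_buffers", "true");
        }
    }

    /** Instantiates each class named in a comma-separated list and lets it configure conf. */
    public static JobConf applyConfigurators(JobConf conf, String classNames) {
        for (String name : classNames.split(",")) {
            try {
                Configurator c = (Configurator) Class.forName(name.trim())
                        .getDeclaredConstructor().newInstance();
                c.configure(conf);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load configurator " + name, e);
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // The class name plays the role of the value passed on the command line.
        JobConf conf = applyConfigurators(new JobConf(), TunedDefaults.class.getName());
        System.out.println(conf.get("example.num_threads")); // prints 8
    }
}
```

Keeping a tuned parameter set in one class means a single option switches the whole bundle on, instead of a long list of individual -D flags.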

Performance and reliability

Finally, this release has lots of bug fixes and performance improvements. We're constantly trying to make Giraph better and faster. For a full list of fixes and improvements, see https://s.apache.org/0dMa