I am very excited to announce that Apache Incubator Samza 0.8.0 has been released. Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. The project was originally created at LinkedIn, where it's in production use, and entered the Apache Incubator in 2013. It is under active development by a diverse group of committers. This release builds on our previous 0.7.0 release, and is likely to be our last release as an incubating Apache project before we graduate to a top-level project.

A source download of the 0.8.0 release is available here. The release JARs are also available in Apache's Maven repository. See Samza's download page for details.

In all, 136 JIRAs were resolved in this release. Notable work done includes:

  • Made major performance improvements. A single SamzaContainer can now process over 1,000,000 messages/sec. (SAMZA-245)
  • Added RocksDB state management support. (SAMZA-236)
  • Added support for pluggable partition-container assignment strategies. (SAMZA-123)
  • Added support for Java 8 and Gradle 2.0, and dropped support for Scala 2.8 and 2.9. (SAMZA-202)
  • Upgraded YARN support to 2.4.0. (SAMZA-186, SAMZA-58)
  • Made several metrics improvements, including a new timer metric. (SAMZA-349, SAMZA-407, SAMZA-408)
  • Made Samza's checkpoint topics smaller by taking advantage of Kafka's log compaction feature. (SAMZA-388)
  • Added an in-memory key-value store that can be used in place of RocksDB/LevelDB for small state. (SAMZA-256)
  • Completely overhauled Samza's YARN AM UI to make it much cleaner and more functional. (SAMZA-32)
  • Fixed several usability issues to make configuring JVM properties easier. (SAMZA-276, SAMZA-20, SAMZA-377, SAMZA-109)
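
To give a feel for why log compaction shrinks the checkpoint topics (SAMZA-388), here is a minimal sketch of the idea in Python. It is a toy model, not Kafka's implementation or Samza's actual checkpoint format: the `(task name, offset)` record shape and the `compact` helper are assumptions made up for illustration. The only behavior it relies on is that a compacted log keeps the latest value for each key.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical checkpoint record: (task name, last processed offset).
Record = Tuple[str, int]


def compact(log: Iterable[Record]) -> List[Record]:
    """Keep only the most recent record for each key, ordered by last write."""
    latest: Dict[str, int] = {}
    for key, value in log:
        # Remove and re-insert so the key moves to the position of its newest write.
        latest.pop(key, None)
        latest[key] = value
    return list(latest.items())


# Five checkpoint writes for two tasks.
checkpoint_log = [
    ("task-0", 100),
    ("task-1", 40),
    ("task-0", 250),
    ("task-1", 90),
    ("task-0", 400),
]

compacted = compact(checkpoint_log)
print(compacted)  # [('task-1', 90), ('task-0', 400)]
```

Only the newest checkpoint per task matters when a job restarts, so discarding the older entries loses nothing that recovery needs.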

We've also made a lot of community progress during this release.

Even after all this work, there's still a lot to be done. In our next release (0.9.0), we're planning to work on:

  • Configuring Samza jobs through a stream. (SAMZA-348)
  • Supporting Scala 2.11. (SAMZA-469)
  • Upgrading Samza's Kafka producer API. (SAMZA-227)
  • Publishing container logs to a stream to integrate with ELK. (SAMZA-310)

Now is a great time to get involved. You can start by running through the hello-samza tutorial, signing up for the mailing list, and grabbing some newbie JIRAs.

I'd like to close by thanking everyone who's been involved in the project. It's been a great experience to be involved in this community, and I look forward to its continued growth.