Announcing the release of Apache Samza 0.10.0
I am very excited to announce that the much-awaited Apache Samza 0.10.0 has been released. This is our third release as an Apache Top-level Project. Samza is a distributed stream processing framework that uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. The project graduated from the Apache Incubator in January this year. It was originally created at LinkedIn, where it continues to be used in production. The project is under active development, with contributions from a diverse group of contributors and committers. Since the last release in July 2015, adoption of Samza in the industry has increased significantly (e.g. Samza is in production at Uber and Netflix; see PoweredBy).
A source download of the 0.10.0 release is available here. The release JARs are also available in Apache's Maven repository. See Samza's download page for details.
Overall, 130 JIRAs were resolved in this release. A few highlights:
- Introduced Coordinator Stream to support large and dynamic configuration in a Samza job (SAMZA-348), along with a command-line tool to write to the Coordinator Stream (SAMZA-704)
- Added support for Broadcast Stream (SAMZA-676)
- Implemented host-affinity feature in YARN for more robust recovery of stateful jobs (SAMZA-617)
- Upgraded RocksDB JNI version to 3.13.1 (SAMZA-747), along with support for TTL (SAMZA-537)
- Introduced an HDFS producer (SAMZA-693) and an ElasticSearch producer (SAMZA-654), to allow writing directly from Samza to HDFS stores and ElasticSearch respectively
- Implemented tools to better support troubleshooting of RocksDB stores in the job (SAMZA-598)
- Fixed several performance and stability issues (SAMZA-798, SAMZA-754, SAMZA-723)
Known issues in this release:
- Negative RocksDB TTL is not handled properly (SAMZA-838)
- Slow start of Samza jobs with a large number of containers (SAMZA-843)
- Incompatible change in Kafka producer that does not honor custom partitioners (SAMZA-839)
We've also made a lot of community progress during this release:
- Added 3 more companies to the Powered By page (Uber, State.com, Netflix)
- Held 2 successful meetups, one in July and the other in October
- Accepted patches from 37 distinct contributors
- 917 emails sent to the developer mailing list in the past 3 months
There are a lot of exciting features to expect in our future releases. Some of them are:
- Samza standalone mode (SAMZA-516)
- Samza integration for Amazon Kinesis (SAMZA-489)
- Expose more RocksDB Stats (SAMZA-449)
- Support static partition assignment (SAMZA-41)
Starting with the 0.10.0 release, Samza requires Java 1.7+ and YARN 2.6.1+.
It’s a great time to get involved. You can start by running through the hello-samza tutorial, signing up for the mailing list, and grabbing some newbie JIRAs.
I'd like to close by thanking everyone who's been involved in the project. It's been a great experience to be involved in this community, and I look forward to its continued growth.