We are very excited to announce the release of Apache Samza 0.14.0.
Samza has been powering real-time applications in production at several large companies (including LinkedIn, Netflix, Uber, Slack, Redfin, TripAdvisor, etc.) for years now. Samza provides leading support for large-scale stateful stream processing with:

  • First-class support for local state (with a RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with an SSD.
  • Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.
  • A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB Streams, etc.) and output systems (e.g. HDFS, Kafka, ElastiCache, etc.).
  • A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.
  • A high-level API for expressing complex stream processing pipelines in a few lines of code.
  • A flexible deployment model for running applications in any hosting environment, including with cluster managers other than YARN.
  • Features like canaries, upgrades and rollbacks that support extremely large deployments with minimal downtime.

New Features, Upgrades and Bug Fixes

The 0.14.0 release contains the following highly anticipated features:

  • Samza SQL
  • Azure EventHubs producer, consumer and checkpoint provider
  • AWS Kinesis consumer

This release also includes improvements such as durable state in the high-level API, more stable ZooKeeper-based deployments, and multi-stage batch processing, as well as bug fixes such as the handling of concurrent sends and flushes in KafkaSystemProducer.

Overall, 65 JIRAs were resolved in this release. For more details about this release, please check out the release notes.

Community Developments

We’ve made great community progress since the last release (0.13.1). We presented unified data processing with Samza at the 2017 Big Data conference held in Spain and at the DataWorks Summit in Sydney, and gave a demo at the @Scale conference in San Jose. Here are the details of these conferences.

On December 4th, we hosted the Stream Processing with Apache Kafka & Apache Samza meetup, which featured the following Samza presentations:

Looking ahead, we are continuing to work on improvements to the new High Level API, SQL, Stream-Table Join, and flexible deployment features.

Contribute

It’s a great time to get involved. You can start by reviewing the tutorials, signing up for the mailing list, and grabbing some newbie JIRAs.
I’d like to close by thanking everyone who’s been involved in the project. It’s been a great experience to be involved in this community, and I look forward to its continued growth.