We are thrilled to announce the release of Apache Samza 1.3.0.

Today Samza forms the backbone of hundreds of real-time production applications across a multitude of companies, including LinkedIn, VMware, Slack, and Redfin, among many others.
This release adds a variety of features and capabilities to Samza’s existing arsenal, along with improved documentation, code snippets, and examples.
Samza provides leading support for large-scale stateful stream processing with:

  • First-class support for local state (with RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with SSD.
  • Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.
  • A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.
  • A high-level API for expressing complex stream processing pipelines in a few lines of code.
  • A Beam Samza Runner that marries Beam’s best-in-class support for EventTime-based windowed processing and sophisticated triggering with Samza’s stable and scalable stateful processing model.
  • A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams) and output systems (e.g. HDFS, Kafka, ElastiCache).
  • A Table API that provides a common abstraction for accessing remote or local databases and allows developers to "join" an input event stream with such a Table.
  • A flexible deployment model for running applications in any hosting environment and with cluster managers other than YARN.
  • Features like canaries, upgrades and rollbacks that support extremely large deployments with minimal downtime.
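
To make the Table API bullet above more concrete, here is a minimal sketch of the stream-table join idea in plain Java. It deliberately does not use Samza's own classes: the class name StreamTableJoinSketch, the join helper, and the sample data are all hypothetical, and an in-memory Map stands in for a local or remote table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, not the Samza Table API.
public class StreamTableJoinSketch {

    // Joins each (key, value) event with the table row stored under the same key.
    // Events whose key has no row in the table are dropped (inner-join semantics).
    static List<String> join(List<String[]> events, Map<String, String> table) {
        List<String> joined = new ArrayList<>();
        for (String[] event : events) {
            String key = event[0];
            String row = table.get(key); // stands in for a local or remote lookup
            if (row != null) {
                joined.add(key + ":" + event[1] + ":" + row);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical "table" of member profiles keyed by member id.
        Map<String, String> profiles = new HashMap<>();
        profiles.put("m1", "Alice");
        profiles.put("m2", "Bob");

        // Hypothetical input stream of page-view events: {memberId, page}.
        List<String[]> pageViews = new ArrayList<>();
        pageViews.add(new String[] {"m1", "/home"});
        pageViews.add(new String[] {"m3", "/jobs"}); // no profile, so it is dropped
        pageViews.add(new String[] {"m2", "/feed"});

        System.out.println(join(pageViews, profiles));
    }
}
```

In a real Samza job the lookup would go through a table backed by a local store or a remote service rather than a HashMap, and the join would be declared on the stream rather than written as a loop.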

New Features, Upgrades and Bug Fixes

This release brings the following features, upgrades, and capabilities (highlights):

  • Startpoint support improvement
  • Samza SQL improvement
  • Table API improvement
  • Miscellaneous bug fixes

The full list of JIRAs addressed in this release can be found here.

Startpoint support improvement

  • SAMZA-2201 Startpoints - Integrate fan out with job coordinators
  • SAMZA-2215 StartpointManager fix for previous CoordinatorStreamStore refactor
  • SAMZA-2220 Startpoints - Fully encapsulate resolution of starting offsets in OffsetManager

Samza SQL improvement

  • SAMZA-2234 Samza SQL : Provide access to Samza context to the Samza SQL UDFs
  • SAMZA-2313 Samza-sql: Add validation for Samza sql statements
  • SAMZA-2354 Improve UDF discovery in samza-sql

Table API improvement

  • SAMZA-2191 support batching for remote tables
  • SAMZA-2200 Update table sendTo() and join() operation to accept additional arguments
  • SAMZA-2219 Add a dummy table read function
  • SAMZA-2309 Remote table descriptor requires read function

Miscellaneous bug fixes

  • SAMZA-2198 Containers process always takes task.shutdown.ms to shut down
  • SAMZA-2293 Propagate the watermark future to StreamOperatorTask correctly

Important Announcement

We may introduce backward-incompatible changes to Samza job submission in the upcoming 1.4 release. Details can be found in SEP-23: Simplify Job Runner.

Source downloads

A source download of Samza 1.3.0 is available here, and is also available in Apache’s Maven repository. See Samza’s download page for details and Samza’s feature preview for new features.

Contribute

It’s a great time to get involved. You can start by reviewing the tutorials, signing up for the mailing list, and grabbing some newbie JIRAs.