In June 2012, we released Apache Oozie (incubating) 3.2.0. Oozie is currently undergoing incubation at The Apache Software Foundation (see http://incubator.apache.org/oozie).

Oozie is a workflow scheduler system for Apache Hadoop jobs. Oozie Workflows are Directed Acyclic Graphs (DAGs), and they can be scheduled to run at a given frequency and when data becomes available in HDFS.
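To make the DAG idea concrete, here is a minimal sketch in Python. It is not how Oozie is implemented (Oozie workflows are defined in XML and executed by the Oozie server); it only illustrates why the acyclic property matters: a workflow without cycles always has a valid execution order. The function name `execution_order` and the node names are hypothetical, chosen for illustration.

```python
from collections import deque


def execution_order(transitions):
    """Return one valid run order for a workflow, or raise if it has a cycle.

    `transitions` maps each node name to the list of nodes that follow it.
    This is an illustrative model, not Oozie's workflow definition format.
    """
    # Count incoming transitions for every node mentioned anywhere.
    indegree = {node: 0 for node in transitions}
    for targets in transitions.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    # Nodes with no incoming transitions can run immediately.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in transitions.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    # Any node left unscheduled sits on a cycle.
    if len(order) != len(indegree):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


# Hypothetical workflow: node names are illustrative only.
workflow = {
    "start": ["import-data"],
    "import-data": ["hive-report", "shell-cleanup"],
    "hive-report": ["end"],
    "shell-cleanup": ["end"],
    "end": [],
}
print(execution_order(workflow))
```

If a transition from "end" back to "start" were added, the function would raise a `ValueError`, which is the kind of definition a DAG-based workflow model rules out.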

Oozie 3.1.3 was the first incubating release, and it added Bundle job capabilities to Oozie. A bundle job is a collection of coordinator jobs that can be managed as a single application. This is a key feature for power users who need to run complex data-pipeline applications.

Oozie 3.2.0 is the second incubating release, and the first one to include features and fixes developed within the Apache community. The Apache Oozie community is growing organically with more users, more contributors, and new committers. Speaking as one of the initial developers of Oozie, it is exciting and fulfilling to see the Apache Oozie project gaining traction and mindshare.

While Oozie 3.2.0 is a minor upgrade, it adds significant new features and fixes that make the upgrade worthwhile. Here are the most important new features:

  • Support for Hadoop 2 (YARN Map-Reduce)
  • Built-in support for new workflow actions: Hive, Sqoop, and Shell
  • Kerberos SPNEGO authentication for Oozie HTTP REST API and Web UI
  • Support for proxy-users in the Oozie HTTP REST API (equivalent to Hadoop proxy users)
  • Job ACLs support (equivalent to Hadoop job ACLs)
  • Tool to create and upgrade Oozie database schema (works with Derby, MySQL, Oracle, and PostgreSQL databases)
  • Improved Job information over HTTP REST API
  • New Expression Language functions for Workflow and Coordinator applications
  • Share library per action (including only the JARs required for the specific action)

Oozie 3.2.0 also includes several improvements for performance and stability, as well as bug fixes. And, as with previous Oozie releases, we are ensuring 100% backwards compatibility with applications written for previous versions of Oozie.

An Oozie meet-up took place at the Hadoop Summit 2012 in San Jose. It was very nice to meet new people and to match faces to familiar email addresses and IRC IDs. During the meet-up, Michelle Chiang from the Yahoo! Oozie QE team explained the comprehensive certification process that Yahoo! has in place for Oozie, which includes reliability, scalability, and compatibility tests, some of which run for 7 days. Mona Chitnis from the Yahoo! Oozie Engineering team described how the integration with Hadoop 2 (YARN Map-Reduce) was done, the challenges that were faced, and how we worked together with the Apache Hadoop community to accomplish this goal. Finally, we discussed current Oozie pain points and ideas on how to address them. There is already activity in the Apache Oozie JIRA on this front.

We are already working at full speed on new features and fixes to make Oozie easier to use, and I'm personally thrilled to see how Oozie is helping manage complex processing in Hadoop clusters.

If you need additional information, please feel free to send an email to the project’s user or developer lists, or file the appropriate JIRA issues. Contributions in any form are welcome.