Robust, Open Source "SQL-on-Hadoop" Big Data warehouse solution now faster, with enhanced integration with Apache Hadoop™.

Forest Hill, MD – 21 October 2014 – The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 200 Open Source projects and initiatives, announced today the availability of Apache™ Tajo™ v0.9, the advanced Open Source data warehousing system in Apache Hadoop™.

"With Apache Tajo v0.9, our goal of bringing traditional SQL performance to massive data is a step closer," said Hyunsik Choi, Vice President of Apache Tajo. "We really enjoyed working to improve Tajo's leading-edge native SQL support, and its lightning performance across divergent workloads."

Dubbed an "SQL-on-Hadoop" solution, Apache Tajo is used for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities. Overall, Apache Tajo v0.9 delivers more powerful native SQL support on an even faster platform.

"We have been determined from the outset to find ways of boosting query processing speed without compromising system robustness and solution accessibility," said Jihoon Son, member of the Apache Tajo Project Management Committee. "In practice, that means using cutting-edge query techniques and processing algorithms as our source of 'speed', meanwhile maintaining three key features: Fault tolerance, the ability to fully utilize working memory and write to disk, and data source neutrality. We think those design choices give Apache Tajo long-run flexibility and coherence."

Features and enhancements in Apache Tajo v0.9 include:

  • More comprehensive and powerful SQL capabilities, such as TIMESTAMP, DATE, TIME, and INTERVAL type support, as well as WINDOW functions, OVER clause support, and multiple distinct aggregation;
  • Performance improvements, such as an off-heap sort algorithm for ORDER BY and runtime code generation for evaluating expressions, that push the boundaries of massive data query speeds;
  • Improvements to the hash shuffle I/O, boosting bottom-line speeds by 200-300% on "heavy", complex queries;
  • Enhanced Hadoop integration, including support for Hadoop 2.2.0 up to Hadoop 2.5.1, and expanded Hive Metastore access;
  • An improved catalog backup and restore feature, as well as accessibility enhancements that streamline performance across disparate technology environments.

Apache Tajo is part of the Apache Hadoop ecosystem at a variety of organizations, including Gruter, Korea University, and NASA JPL's Radio Astronomy and Airborne Snow Observatory projects, among others.

At SK Telecom, South Korea's largest wireless carrier, Apache Tajo has undergone a brutal testing regimen, where it has had to deal with telco-sized data stores, node growth and cluster expansion, and a grueling company-wide data analysis and reporting schedule. "The fast processing capabilities of Apache Tajo have allowed us to build an entirely new big data warehouse and OLAP system," said Eddy Park, Hadoop-based Data Warehouse Project Manager at SK Telecom. "Apache Tajo now plays a vital role in data-driven decision making at our company."

Hyoungjun Kim, CTO of Gruter, said, "We run Apache Tajo in-house on 30 cluster nodes in order to power Seenal, our social network analysis service that supplies social media insight to government and corporate clients. On the one hand, this involves running complex ETL processes on hundreds of gigabytes of data per day in order to detect market and opinion signals. On the other hand, analysts and project teams often need to run very specific analyses on much smaller data sets. Tajo is able to handle the full spectrum of Seenal’s data processing and query needs at high speed and with minimal fuss."

"We're very excited about the release of Apache Tajo 0.9," added Choi. "The Apache Tajo community, committers, and supporters have really done our mission proud."

Availability and Oversight
As with all Apache products, Apache Tajo software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Tajo, visit http://tajo.apache.org/ and https://twitter.com/ApacheTajo

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than two hundred leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 450 individual Members and 4,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

"Apache", "Apache Hadoop", "Hadoop", "Apache Tajo", "Tajo", "ApacheCon", and the Apache Tajo logo are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #