TPC in Apache Phoenix
I often get asked "Does Apache Phoenix support joins?" That's when I know that marketing is hard as Phoenix has supported joins for over a year! In fact, thanks to the excellent work of Maryann Xue, one of our project management committee (PMC) members, Phoenix can run many of the complex TPC queries. TPC benchmarks are used in evaluating the performance of computer systems. From wikipedia:
Transaction Processing Performance Council (TPC) is a non-profit organization founded in 1988 to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry.
Here are some of the query constructs that Phoenix support that you may not know about:
Derived Tables to create on-the-fly views:
SELECT LastName, FirstName FROM ( SELECT BusinessEntityID, LastName, FirstName FROM Employee WHERE State = 'NY' ) AS EmployeeDerivedTable WHERE LastName = "Smith" ORDER BY FirstName;
Correlated Sub-queries to correlate column values in inner and outer queries:
SELECT DISTINCT e.LastName, e.FirstName, e.BusinessEntityID, sp.Bonus FROM Employee AS e JOIN SalesPerson AS sp ON c.BusinessEntityID = sp.BusinessEntityID WHERE c.Bonus >= ( SELECT average(sp2.Bonus) FROM SalesPerson sp2 JOIN Employee AS e2 ON e2.BusinessEntityID = sp2.BusinessEntityID WHERE e.DepartmentID = e2.DepartmentID );
Semi/Anti Joins to test for the existence or nonexistence of a row:
SELECT p.Name, p.Job, p.EmployerID FROM People AS p WHERE EXISTS ( SELECT 1 FROM Employers AS e WHERE e.ID = p.EmployerID );
Union All to concatenate rows from multiple tables (available in 4.4.0 release):
SELECT * FROM Employee e
WHERE e.BusinessEntityID IN
( SELECT BusinessEntityID FROM SalesPerson WHERE Ranking >= 5.0
UNION ALL
SELECT BusinessEntityID FROM CustomerReview WHERE Score >= 8.0 )
Phoenix has support for both a broadcast hash join strategy, when one side of the join is small enough to fit into memory, as well as a sort merge join strategy for cases when it won't. Put all this functionality together, and Phoenix is capable of running many TPC queries.
Here's an example query from one of the TPC queries:
SELECT supp_nation, cust_nation, l_year, sum(volume) as revenue FROM ( SELECT n1.n_name as supp_nation, n2.n_name as cust_nation, year(l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume FROM Supplier JOIN LineItem ON s_suppkey = l_suppkey JOIN Orders ON o_orderkey = l_orderkey JOIN Customer ON c_custkey = o_custkey JOIN Nation n1 ON s_nationkey = n1.n_nationkey JOIN Nation n2 ON c_nationkey = n2.n_nationkey WHERE ( n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]' ) OR (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]') ) AND l_shipdate between date '1995-01-01' and date '1996-12-31' ) AS Shipping GROUP BY supp_nation, cust_nation, l_year ORDER BY supp_nation, cust_nation, l_year;
So, please repeat after me: "YES, Phoenix supports joins" :-)
Interested in learning more? Come hear Maryann Xue talk at the upcoming Apache Phoenix meetup on May 5th at the Hortonworks HQ in Santa Clara, or hear our joint talk at HBaseCon 2015 on May 7th where we'll discuss our plan for getting the rest of the way to complete ANSI SQL 92 support.