I often get asked "Does Apache Phoenix support joins?" That's when I know that marketing is hard as Phoenix has supported joins for over a year! In fact, thanks to the excellent work of Maryann Xue, one of our project management committee (PMC) members, Phoenix can run many of the complex TPC queries. TPC benchmarks are used in evaluating the performance of computer systems. From wikipedia:

Transaction Processing Performance Council (TPC) is a non-profit organization founded in 1988 to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry.  

Here are some of the query constructs that Phoenix support that you may not know about:

Derived Tables to create on-the-fly views:

  SELECT LastName, FirstName
      ( SELECT BusinessEntityID, LastName, FirstName FROM Employee
        WHERE State = 'NY' ) AS EmployeeDerivedTable
    WHERE LastName = "Smith"
    ORDER BY FirstName;

Correlated Sub-queries to correlate column values in inner and outer queries:

  SELECT DISTINCT e.LastName, e.FirstName, e.BusinessEntityID, sp.Bonus
  FROM Employee AS e
  JOIN SalesPerson AS sp ON c.BusinessEntityID = sp.BusinessEntityID
  WHERE c.Bonus >=
  ( SELECT average(sp2.Bonus)
    FROM SalesPerson sp2
    JOIN Employee AS e2 ON e2.BusinessEntityID = sp2.BusinessEntityID
    WHERE e.DepartmentID = e2.DepartmentID );

Semi/Anti Joins to test for the existence or nonexistence of a row:

  SELECT p.Name, p.Job, p.EmployerID
  FROM People AS p
  WHERE EXISTS ( SELECT 1 FROM Employers AS e WHERE e.ID = p.EmployerID );

Union All to concatenate rows from multiple tables (available in 4.4.0 release):

  SELECT * FROM Employee e
  WHERE e.BusinessEntityID IN
  ( SELECT BusinessEntityID FROM SalesPerson WHERE Ranking >= 5.0
    SELECT BusinessEntityID FROM CustomerReview WHERE Score >= 8.0 ) 

Phoenix has support for both a broadcast hash join strategy, when one side of the join is small enough to fit into memory, as well as a sort merge join strategy for cases when it won't. Put all this functionality together, and Phoenix is capable of running many TPC queries.

Here's an example query from one of the TPC queries:

  SELECT supp_nation, cust_nation, l_year, sum(volume) as revenue
    n1.n_name as supp_nation, n2.n_name as cust_nation,
    year(l_shipdate) as l_year, 
    l_extendedprice * (1 - l_discount) as volume
    FROM Supplier
    JOIN LineItem ON s_suppkey = l_suppkey
    JOIN Orders ON o_orderkey = l_orderkey
    JOIN Customer ON c_custkey = o_custkey
    JOIN Nation n1 ON s_nationkey = n1.n_nationkey
    JOIN Nation n2 ON c_nationkey = n2.n_nationkey
    WHERE ( n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]' )
       OR (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]') )
       AND l_shipdate between date '1995-01-01' and date '1996-12-31'
  ) AS Shipping
  GROUP BY supp_nation, cust_nation, l_year
  ORDER BY supp_nation, cust_nation, l_year;

So, please repeat after me: "YES, Phoenix supports joins" :-)

Interested in learning more? Come hear Maryann Xue talk at the upcoming Apache Phoenix meetup on May 5th at the Hortonworks HQ in Santa Clara, or hear our joint talk at HBaseCon 2015 on May 7th where we'll discuss our plan for getting the rest of the way to complete ANSI SQL 92 support.