Testing Apache Tez with Apache BigTop's new Testing Infrastructure
Apache BigTop's utilities can be consumed and reused by any Hadoop distribution, not just BigTop itself. Puppet recipes, RPM specifications, and so on can save vendors weeks or months of time if borrowed from BigTop rather than maintained in-house. However, until recently, the tests were somewhat difficult to customize and hack on.
So, after a lot of stimulating debate, we finally settled on our new test infrastructure. For those interested in building an integration test framework, especially for a Java-based application where Java/Scala/Groovy API calls need to run in an integration context, Gradle-based tests can be very powerful:
- You can organize Gradle source trees easily, without any requirement for complex package hierarchies.
- You can dynamically add source sets without a lot of boilerplate, meaning the tests can easily be extended and hacked on by new engineers, devops folks, etc.
- You can still test low-level Java functionality easily by adding Java libraries to the classpath at runtime, without needing to compile jars and manage a whole Maven-style project.
- The test interface is easy to customize with arguments. You can parse arguments however you want.
- Gradle brings the power of Groovy to a declarative build language.
- Using something like gradle-wrapper, you can make your Java-based tests easy to consume by anyone, even folks outside the Java community.
As an example of how to use Gradle for integration tests, I'll demonstrate how we retooled the BigTop tests.
You can check out the new tests by cloning BigTop and going into the bigtop-tests/smoke-tests directory.
First, let's take a look at the overall directory structure of the testing suite.
[bigtop@sandbox smoke-tests]$ tree
├── build.gradle
├── flume
│   ├── build.gradle
│   ├── conf
│   │   └── flume.conf
│   ├── log4j.properties
│   └── TestFlumeNG.groovy
├── hive
│   ├── build.gradle
│   └── log4j.properties
├── mahout
│   ├── build.gradle
│   └── log4j.properties
├── mapreduce
│   └── build.gradle
Testing Apache Tez with Apache BigTop
So, what better way to demonstrate the flexibility of the BigTop testing suite than to use it to test another tool, native to another distribution: Apache Tez on Hortonworks HDP!
The code for these tests is in this JIRA, which also has the patch that adds a simple Tez test to BigTop attached to it.
How it works
It's pretty simple. Above, we can see that each ecosystem component has a "build.gradle" file. That file contains a few dependencies and the names of the classes it will call for tests. There is also a top-level build.gradle file. Its job is to send global parameters to the sub-tests; it does no testing of its own. We do this using the "subprojects" directive. Finally, there is a settings.gradle file that parses our input arguments to decide which tests to run.
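To make the argument parsing concrete, here is a minimal sketch of what such a settings.gradle could look like. It is an illustration under assumptions, not the file shipped with BigTop: it assumes the tests are chosen with a comma-separated smoke.tests system property and that each name matches a directory containing its own build.gradle.

```groovy
// settings.gradle (illustrative sketch, not BigTop's actual file)

// Read the comma-separated list passed on the command line,
// for example: -Dsmoke.tests=tez,pig
def requested = System.getProperty('smoke.tests')

// Assumption: refuse to run when nothing was selected.
if (requested == null || requested.trim().isEmpty()) {
    throw new IllegalArgumentException(
        'Pass -Dsmoke.tests=<name>[,<name>...] to choose which tests to run')
}

requested.split(',').each { name ->
    // Each name becomes a Gradle subproject backed by the directory of the
    // same name (tez/, pig/, ...), which carries its own build.gradle.
    include(name.trim())
}
```

Because only the requested directories are ever included, a new component directory is picked up the moment its name is passed in, with no central list to maintain.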
So, how do we extend these tests? Easy!
- Create a directory, for example "tez/".
- Pick any existing test as a template (for example, pig/) and copy its files into the new directory.
- Customize the environment variables you want defined for your test in build.gradle.
- Customize the unit testing script (which uses itest and JUnit for assertions and for running bash commands).
- Run your new tests: gradle clean compileGroovy test -Dsmoke.tests=tez --info
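As a rough idea of what the per-component build file from the steps above might contain, here is a hedged sketch of a tez/build.gradle. The environment variable names (HADOOP_CONF_DIR, TEZ_HOME) and the choice to keep the test script next to the build file are assumptions for illustration, not what the attached patch necessarily does. It also assumes the top-level build.gradle has already applied the Groovy plugin to every subproject.

```groovy
// tez/build.gradle (hypothetical sketch)

sourceSets {
    test {
        groovy {
            // Assumption: the Groovy test script sits in this directory,
            // next to build.gradle, rather than under src/test/groovy.
            srcDirs = ['.']
        }
    }
}

test {
    doFirst {
        // Fail fast with a readable message if the environment the test
        // relies on is missing. These variable names are placeholders.
        ['HADOOP_CONF_DIR', 'TEZ_HOME'].each { name ->
            if (System.getenv(name) == null) {
                throw new GradleException("${name} must be set to run the Tez smoke test")
            }
        }
    }
    testLogging {
        // Print per-test results to the console.
        events 'passed', 'failed'
    }
}
```

Checking the environment in doFirst means a misconfigured machine fails before any cluster job is launched, which is usually easier to debug than a failed job.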
There it is: in slightly under 100 lines of code, by adding two simple files, we were able to add a new test to the BigTop test suite. Note that we didn't have to edit a single existing file to run this test. Rather, we just dumped some Groovy scripts into a new directory, and Gradle discovered and ran the tests for us, and created a nice little HTML report as well, which is now available in ./build/reports/tests/index.html. Gradle also injected the inherited dependencies for us, and did some basic sanity checking as well.
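The inherited dependencies come from the top-level build.gradle and its "subprojects" block. A minimal sketch of that idea follows. The dependency versions are placeholders chosen for illustration, and the real file also pulls in BigTop's itest library, which is left out here.

```groovy
// build.gradle at the top of smoke-tests/ (illustrative sketch)

subprojects {
    // Every included component gets Groovy compilation and a test task
    // without declaring anything itself.
    apply plugin: 'groovy'

    repositories {
        mavenCentral()
    }

    dependencies {
        // Placeholder versions, assumed for this sketch.
        testImplementation 'org.codehaus.groovy:groovy-all:2.4.21'
        testImplementation 'junit:junit:4.13.2'
    }

    test {
        testLogging {
            // Show the output of the shell commands the tests run.
            showStandardStreams = true
        }
    }
}
```

With this in place, a component's own build.gradle only has to describe what is specific to it, which is why a new test can be added without touching any existing file.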
We can see that the original test ran a MapReduce job, but after turning Tez on, our job UI indeed shows that we can now test Tez using BigTop's test framework.