To get a feel for the need that bigtop's packaging of hadoop components addresses, I suggest checking out Roman's puppetcon bigtop talk from a few years back.

The thrust of this talk is that we need to bring uniformity to the hadoop ecosystem and make hadoop easier for end users. To me, an important first step down this path is bringing the Java community in line with what packaging is really all about and why it makes complex systems easier to maintain.

As a Java/Maven guy, wrapping my head around "packaging" has been a little tricky... And according to Stephen R. Covey, change begins on the inside :).

How would YOU package hadoop as an RPM?

The thought of this is pretty daunting, and it's really interesting to see how this is solved in bigtop.  I've begun documenting my current adventures into the world of RPMs, packaging, and bigtop.  I've only begun to scratch the surface of all of the services, users, binaries, and security features associated with a basic RPM hadoop installation, and it will probably be a while before I fully understand how it all really works.

So in the meantime, let's learn about hadoop packaging with a simpler project... Apache Mahout.

Here are the packaging resources for mahout inside of bigtop:

common/mahout/
├── do-component-build
└── install_mahout.sh

...
bigtop-packages/rpm/mahout/SPECS/mahout.spec

Above you can see that there are three main components to the packaging of mahout.

1) The "do-component-build" file.

2) The "install_mahout.sh" file.

3) The RPM spec file "mahout.spec", which actually uses the other two components to do its work.

The do-component-build script builds the raw mahout artifact directly from source, then unpacks the resulting distribution tarball into a "build" directory.  You can see the Java-specific details of mahout compilation in there.


set -ex

. `dirname $0`/bigtop.bom

mvn clean install -Dmahout.skip.distribution=false -DskipTests -Dhadoop2.version=$HADOOP_VERSION "$@"
mkdir build
for i in distribution/target/mahout*.tar.gz ; do
  tar -C build --strip-components=1 -xzf $i
done
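
If the tar step at the end of that script looks cryptic, here is a small standalone sketch of what --strip-components=1 does. Everything in it (the scratch directory, the "mahout-distribution-x.y" name, the dummy files) is made up for illustration and is not taken from bigtop; it only assumes that the tarball has a single top-level directory, which is what the extraction above seems to expect.

```shell
#!/bin/bash
set -e

# Scratch area; every name below is hypothetical and exists only for this demo.
work=$(mktemp -d)
mkdir -p "$work/src/mahout-distribution-x.y/bin" "$work/src/mahout-distribution-x.y/lib"
echo 'echo hello' > "$work/src/mahout-distribution-x.y/bin/mahout"
touch "$work/src/mahout-distribution-x.y/lib/example.jar"

# Pack it so the archive contains one top-level directory.
tar -C "$work/src" -czf "$work/mahout-distribution-x.y.tar.gz" mahout-distribution-x.y

# Same extraction flags as in do-component-build: the top-level
# directory is dropped, so bin/ and lib/ land directly under build/.
mkdir "$work/build"
tar -C "$work/build" --strip-components=1 -xzf "$work/mahout-distribution-x.y.tar.gz"

ls "$work/build"
```

The point of the flag is that later steps can refer to build/bin and build/lib without knowing the version number baked into the tarball's directory name.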


Meanwhile, install_mahout.sh contains the actual logic of how and where mahout jars will go, and a snippet that writes out the mahout startup shell script /usr/bin/mahout.


# Copy in the /usr/bin/mahout wrapper
install -d -m 0755 $PREFIX/$BIN_DIR
cat > $PREFIX/$BIN_DIR/mahout <<EOF
#!/bin/bash

# Autodetect JAVA_HOME if not defined
. /usr/lib/bigtop-utils/bigtop-detect-javahome

# FIXME: MAHOUT-994
export HADOOP_HOME=\${HADOOP_HOME:-/usr/lib/hadoop}
export HADOOP_CONF_DIR=\${HADOOP_CONF_DIR:-/etc/hadoop/conf}

export MAHOUT_HOME=\${MAHOUT_HOME:-$INSTALLED_LIB_DIR}
export MAHOUT_CONF_DIR=\${MAHOUT_CONF_DIR:-$CONF_DIR}
# FIXME: the following line is a workaround for BIGTOP-259
export HADOOP_CLASSPATH="`echo /usr/lib/mahout/mahout-examples-*-job.jar`":\$HADOOP_CLASSPATH
exec $INSTALLED_LIB_DIR/bin/mahout "\$@"
EOF
chmod 755 $PREFIX/$BIN_DIR/mahout
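
The part of this snippet that tripped me up is the mix of \$ and plain $ inside the here-document. Here is a minimal sketch of the idea, not the real install script: the values of PREFIX, BIN_DIR and INSTALLED_LIB_DIR and the "demo-wrapper" name are assumptions I picked for the demo. Plain $VAR is expanded while the wrapper is being written, and \$VAR is left alone so it gets expanded later, when the wrapper actually runs.

```shell
#!/bin/bash
set -e

# Hypothetical values for the demo; the real script sets these itself.
PREFIX=$(mktemp -d)
BIN_DIR=/usr/bin
INSTALLED_LIB_DIR=/usr/lib/mahout

# Stage the wrapper under $PREFIX instead of the live filesystem.
install -d -m 0755 "$PREFIX/$BIN_DIR"
cat > "$PREFIX/$BIN_DIR/demo-wrapper" <<EOF
#!/bin/bash
export MAHOUT_HOME=\${MAHOUT_HOME:-$INSTALLED_LIB_DIR}
echo "\$MAHOUT_HOME" "\$@"
EOF
chmod 755 "$PREFIX/$BIN_DIR/demo-wrapper"

# Show what was generated: the default path is baked in,
# while MAHOUT_HOME and the arguments are still resolved at run time.
cat "$PREFIX/$BIN_DIR/demo-wrapper"
```

So the generated file carries the install location as a hard-coded default, but anyone running the wrapper can still override MAHOUT_HOME from their own environment.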


Anyway, I hope this quick tour helps those who are trying to get involved with the bigtop packaging process.  It took me a few days to understand how it all works, because after all, packaging software is an intrinsically complex task.  But thankfully, there are TONS of examples of how to package all the different players of the hadoop ecosystem underneath bigtop-packages/src, which can help you get started.