After downloading an MR3 release, the user can build Hive on MR3 from the source code of two additional components: Tez for MR3 and Hive for MR3. As the MR3 release is built with Java 1.8 (along with Scala 2.11), we assume that Java 1.8 is already installed.

Cloning GitHub repositories

For Tez for MR3, clone the GitHub repository (https://github.com/mr3project/tez-mr3.git) and check out the branch corresponding to the MR3 release.

$ git clone https://github.com/mr3project/tez-mr3.git -b master --single-branch tez-mr3

For Hive for MR3, clone the GitHub repository (https://github.com/mr3project/hive-mr3.git) and check out the branch corresponding to the Hive version.

$ git clone https://github.com/mr3project/hive-mr3.git -b master3 --single-branch hive-mr3

Setting environment variables

Set the following environment variables in env.sh in the MR3 release to specify the directories of the source code.

$ vi mr3-run/env.sh

TEZ_SRC=~/tez-mr3

HIVE3_SRC=~/hive-mr3
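Before compiling, it can help to confirm that both variables point to valid checkouts. The following is a minimal sketch and not part of the MR3 release: check_src_dir is a hypothetical helper, and it assumes that each repository has pom.xml at its top level because the build is driven by Maven.

```shell
# Hypothetical helper (not shipped with MR3): check that a source directory
# variable is set and that the directory looks like a Maven checkout.
# Assumption: the top level of each repository contains pom.xml.
check_src_dir() {
  csd_name="$1"   # name of the variable, used only in messages
  csd_dir="$2"    # value of the variable
  if [ -z "$csd_dir" ]; then
    echo "$csd_name is not set"
    return 1
  fi
  if [ ! -f "$csd_dir/pom.xml" ]; then
    echo "$csd_name=$csd_dir does not contain pom.xml"
    return 1
  fi
  echo "$csd_name=$csd_dir looks OK"
}
```

For example, with the two variables set in the current shell, the user can run check_src_dir TEZ_SRC "$TEZ_SRC" and check_src_dir HIVE3_SRC "$HIVE3_SRC".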

To run Hive on MR3 on Cloudera CDH or Amazon EMR, the user should set the environment variable TEZ_USE_MINIMAL to false in env.sh in the MR3 release. Tez for MR3 then reuses the Hadoop classes installed on the underlying system instead of importing Hadoop classes from Maven repositories.

$ vi mr3-run/env.sh

TEZ_USE_MINIMAL=false

Compiling

Because Hive for MR3 depends on Tez for MR3 at compile time, the user should build Tez for MR3 first and then Hive for MR3. To compile Tez for MR3, execute tez/compile-tez.sh in the MR3 release. To access Amazon S3 (on Amazon EMR or EKS), add the option -P aws.

$ mr3-run/tez/compile-tez.sh
$ mr3-run/tez/compile-tez.sh -P aws       # To access Amazon S3.

To compile Hive for MR3, execute hive/compile-hive.sh in the MR3 release with the following options:

--hivesrc3                # Choose hive3-mr3 (based on Hive 3.1.2) (default).
$ mr3-run/hive/compile-hive.sh --hivesrc3

The user can append as many Maven options as necessary to the command. These scripts invoke Maven to compile the source code, and automatically update the local Maven repository as well as the hive/hivejar and tez/tezjar directories in the MR3 release. On Hadoop, they also upload the new jar files to HDFS, so the user does not need to execute mr3/upload-hdfslib-mr3.sh and tez/upload-hdfslib-tez.sh in the MR3 release afterwards.
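The build order described in this section can be captured in a small wrapper. The following is only a sketch under stated assumptions: build_hive_on_mr3 and the MR3_RUN variable are hypothetical names that are not part of the MR3 release, and the wrapper passes no extra Maven options.

```shell
# Hypothetical wrapper (not shipped with MR3): build Tez for MR3 first and
# then Hive for MR3, stopping as soon as one step fails.
# Assumption: MR3_RUN points to the mr3-run directory (default: ./mr3-run).
build_hive_on_mr3() {
  bhm_dir="${MR3_RUN:-mr3-run}"
  # Tez for MR3 must be built first because Hive for MR3 compiles against it.
  "$bhm_dir/tez/compile-tez.sh" || return 1
  # Build Hive for MR3 only after the Tez build succeeds.
  "$bhm_dir/hive/compile-hive.sh" --hivesrc3 || return 1
}
```

Calling build_hive_on_mr3 from the directory that contains mr3-run runs both steps in order. Stopping on the first failure avoids compiling Hive for MR3 against a stale Tez build.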