After downloading an MR3 release, the user can build Hive on MR3 from the source code of two additional components: Tez for MR3 and Hive for MR3. As the MR3 release is built with Java 1.8 (along with Scala 2.11), we assume that Java 1.8 is already installed.

Cloning GitHub repositories

For Tez for MR3, clone the GitHub repository (https://github.com/mr3project/tez-mr3.git) and check out the branch corresponding to the MR3 release. For example, if the user downloads MR3 release base2.7, check out branch base2.7.

$ git clone https://github.com/mr3project/tez-mr3.git -b base2.7 --single-branch tez-mr3
$ git clone https://github.com/mr3project/tez-mr3.git -b base2.8 --single-branch tez-mr3
$ git clone https://github.com/mr3project/tez-mr3.git -b master --single-branch tez-mr3

For Hive for MR3, clone the GitHub repository (https://github.com/mr3project/hive-mr3.git) and check out the branch corresponding to the Hive version. For Hive 3, use branch master3-base2.8 for MR3 releases base2.7 and base2.8, and branch master3 for MR3 release master.

$ git clone https://github.com/mr3project/hive-mr3.git -b master2 --single-branch hive-mr3
$ git clone https://github.com/mr3project/hive-mr3.git -b master3-base2.8 --single-branch hive-mr3
$ git clone https://github.com/mr3project/hive-mr3.git -b master3 --single-branch hive-mr3
$ git clone https://github.com/mr3project/hive-mr3.git -b master4 --single-branch hive-mr3
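
The branch choice above can be sketched as a small shell helper. This is only an illustration: the function names hive_mr3_branch and hive_mr3_clone_command are hypothetical and not part of the MR3 release, and the mapping of Hive 2 to master2 and Hive 4 to master4 is an assumption inferred from the clone commands listed above. The second function prints the clone command without running it.

```shell
# Hypothetical helper (not part of the MR3 release): print the hive-mr3 branch
# for a Hive major version (2, 3, or 4) and an MR3 release name.
hive_mr3_branch() {
  hive_major="$1"
  mr3_release="$2"
  case "$hive_major" in
    2) echo "master2" ;;   # assumption: inferred from the clone commands
    3)
      case "$mr3_release" in
        base2.7|base2.8) echo "master3-base2.8" ;;
        master)          echo "master3" ;;
        *) echo "unknown MR3 release: $mr3_release" >&2; return 1 ;;
      esac
      ;;
    4) echo "master4" ;;   # assumption: inferred from the clone commands
    *) echo "unknown Hive version: $hive_major" >&2; return 1 ;;
  esac
}

# Print (do not execute) the clone command for the selected branch.
hive_mr3_clone_command() {
  branch="$(hive_mr3_branch "$1" "$2")" || return 1
  echo "git clone https://github.com/mr3project/hive-mr3.git -b $branch --single-branch hive-mr3"
}
```

For example, hive_mr3_clone_command 3 base2.7 prints the clone command for branch master3-base2.8, which the user can review before running it.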

Setting environment variables

Set the following environment variables in env.sh in the MR3 release to specify the directories of the source code.

$ vi mr3-run/env.sh

TEZ_SRC=~/tez-mr3

HIVE2_SRC=~/hive-mr3
HIVE3_SRC=~/hive-mr3
HIVE4_SRC=~/hive-mr3
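
Before compiling, it may help to confirm that each directory set in env.sh actually holds a source tree. The following is a minimal sketch, not part of the MR3 release: check_src_dir is a hypothetical function, and the test for a top-level pom.xml assumes that both repositories are Maven projects built from their root directory.

```shell
# Hypothetical helper (not part of the MR3 release).
# check_src_dir NAME DIR: succeed only if DIR is a directory with a top-level pom.xml.
check_src_dir() {
  name="$1"
  dir="$2"
  if [ -z "$dir" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "$name=$dir is not a directory" >&2
    return 1
  fi
  # Assumption: a Maven source tree has pom.xml at its root.
  if [ ! -f "$dir/pom.xml" ]; then
    echo "$name=$dir has no pom.xml" >&2
    return 1
  fi
  echo "$name=$dir looks like a Maven source tree"
}

# Example with the values shown above:
#   check_src_dir TEZ_SRC ~/tez-mr3
#   check_src_dir HIVE3_SRC ~/hive-mr3
```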

To run Hive on MR3 on Cloudera CDH or Amazon EMR, the user should set the environment variable TEZ_USE_MINIMAL to false in env.sh in the MR3 release. With this setting, Tez for MR3 reuses the Hadoop classes installed on the underlying system instead of importing them from Maven repositories.

$ vi mr3-run/env.sh

TEZ_USE_MINIMAL=false

Compiling

Because Hive for MR3 depends on Tez for MR3 at compile time, the user should build Tez for MR3 first and then Hive for MR3. To compile Tez for MR3, execute tez/compile-tez.sh in the MR3 release. To access Amazon S3 (on Amazon EMR or EKS), add the option -P aws.

$ mr3-run/tez/compile-tez.sh
$ mr3-run/tez/compile-tez.sh -P aws

To compile Hive for MR3, execute hive/compile-hive.sh in the MR3 release with the following options:

--hivesrc2                # Choose hive2-mr3 (based on Hive 2.3.6).
--hivesrc3                # Choose hive3-mr3 (based on Hive 3.1.2) (default).
--hivesrc4                # Choose hive4-mr3 (based on Hive 4.0.0-SNAPSHOT).

$ mr3-run/hive/compile-hive.sh --hivesrc2
$ mr3-run/hive/compile-hive.sh --hivesrc3
$ mr3-run/hive/compile-hive.sh --hivesrc4

The user can append as many Maven options as necessary to the command. These scripts invoke Maven to compile the source code, and automatically update the local Maven repository as well as hive/hivejar and tez/tezjar directories in the MR3 release. On Hadoop, they also upload the new jar files to HDFS, so the user does not need to execute mr3/upload-hdfslib-mr3.sh and tez/upload-hdfslib-tez.sh in the MR3 release later.
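
The build order described above (Tez for MR3 first, then Hive for MR3) can be wrapped in a small function that stops as soon as the Tez build fails. This is a sketch under stated assumptions rather than a script shipped with MR3: build_hive_on_mr3 is a hypothetical name, its first argument is the directory of the MR3 release (mr3-run in the examples above), and both compile scripts are assumed to return a non-zero exit status on failure.

```shell
# Hypothetical wrapper (not part of the MR3 release).
# build_hive_on_mr3 MR3_DIR HIVE_OPTION [TEZ_OPTIONS...]
#   MR3_DIR      directory of the MR3 release, e.g. mr3-run
#   HIVE_OPTION  one of --hivesrc2, --hivesrc3, --hivesrc4
#   TEZ_OPTIONS  extra options passed to compile-tez.sh, e.g. -P aws
build_hive_on_mr3() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_hive_on_mr3 MR3_DIR HIVE_OPTION [TEZ_OPTIONS...]" >&2
    return 2
  fi
  mr3_dir="$1"
  hive_opt="$2"
  shift 2

  # Tez for MR3 must be built first because Hive for MR3 compiles against it.
  # Assumption: the script exits with a non-zero status when the build fails.
  if ! "$mr3_dir/tez/compile-tez.sh" "$@"; then
    echo "Tez build failed; skipping Hive build" >&2
    return 1
  fi

  if ! "$mr3_dir/hive/compile-hive.sh" "$hive_opt"; then
    echo "Hive build failed" >&2
    return 1
  fi
}

# Example:
#   build_hive_on_mr3 mr3-run --hivesrc3 -P aws
```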