After downloading an MR3 release, the user can build Hive on MR3 from the source code of two additional components: Tez for MR3 and Hive for MR3. As the MR3 release is built with Java 1.8 (along with Scala 2.11), we assume that Java 1.8 is already installed.
Cloning GitHub repositories
For Tez for MR3, clone the GitHub repository (https://github.com/mr3project/tez-mr3.git) and check out the branch corresponding to the MR3 release.
$ git clone https://github.com/mr3project/tez-mr3.git -b master --single-branch tez-mr3
For Hive for MR3, clone the GitHub repository (https://github.com/mr3project/hive-mr3.git) and check out the branch corresponding to the Hive version.
$ git clone https://github.com/mr3project/hive-mr3.git -b master3 --single-branch hive-mr3
Setting environment variables
Set the following environment variables in env.sh in the MR3 release to specify the directories of the source code.
$ vi mr3-run/env.sh

TEZ_SRC=~/tez-mr3
HIVE3_SRC=~/hive-mr3
To run Hive on MR3 on Cloudera CDH or Amazon EMR, the user should set the environment variable TEZ_USE_MINIMAL to false in env.sh in the MR3 release. With this setting, Tez for MR3 reuses the Hadoop classes installed on the underlying system instead of importing them from Maven repositories.
$ vi mr3-run/env.sh

TEZ_USE_MINIMAL=false
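The source directories set in env.sh can be sanity-checked before compiling. The following is a minimal sketch, not part of the MR3 release: `check_src_dir` is a hypothetical helper, and the usage comment assumes that env.sh can be sourced from the current shell.

```shell
# Sketch: confirm that a source directory named in env.sh exists.
# check_src_dir is a hypothetical helper, not a script shipped with MR3.
#   $1 = variable name (used only in messages)
#   $2 = value of that variable
check_src_dir() {
  if [ -z "$2" ]; then
    echo "$1 is not set" >&2
    return 1
  fi
  if [ ! -d "$2" ]; then
    echo "$1=$2 is not a directory" >&2
    return 1
  fi
  echo "$1=$2"
}

# Usage (assumes env.sh is sourceable from the current shell):
#   . mr3-run/env.sh
#   check_src_dir TEZ_SRC "$TEZ_SRC" && check_src_dir HIVE3_SRC "$HIVE3_SRC"
```

Passing the name and the value separately keeps the helper free of indirect expansion, so it should also run under a plain POSIX shell.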
Because of the compilation dependency between Hive and Tez, the user should build Tez for MR3 first and then Hive for MR3.
To compile Tez for MR3, execute tez/compile-tez.sh in the MR3 release. To access Amazon S3 (on Amazon EMR or EKS), use an additional option:
$ mr3-run/tez/compile-tez.sh -P aws
To compile Hive for MR3, execute hive/compile-hive.sh in the MR3 release with the following options:
--hivesrc3 # Choose hive3-mr3 (based on Hive 3.1.2) (default).
$ mr3-run/hive/compile-hive.sh --hivesrc3
The user can append as many Maven options as necessary to the command.
These scripts invoke Maven to compile the source code and automatically update the local Maven repository as well as the tez/tezjar directories in the MR3 release. They also upload the new jar files to HDFS, so the user does not need to execute tez/upload-hdfslib-tez.sh in the MR3 release later.
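The required build order (Tez for MR3 first, then Hive for MR3) can be captured in a small wrapper. This is only a sketch under stated assumptions: `build_hive_on_mr3` is a hypothetical function, its argument is the directory of the MR3 release (mr3-run in the commands above), and it assumes that both compile scripts return a non-zero exit status on failure.

```shell
# Sketch: build Tez for MR3 first, then Hive for MR3, stopping at the first failure.
# build_hive_on_mr3 is a hypothetical wrapper, not part of the MR3 release.
#   $1 = directory of the MR3 release (e.g. mr3-run)
build_hive_on_mr3() {
  base="$1"
  # Tez for MR3 must be built before Hive for MR3.
  "$base/tez/compile-tez.sh" || {
    echo "Tez for MR3 build failed; skipping Hive for MR3" >&2
    return 1
  }
  "$base/hive/compile-hive.sh" --hivesrc3 || {
    echo "Hive for MR3 build failed" >&2
    return 1
  }
}

# Usage:
#   build_hive_on_mr3 mr3-run
```

Stopping after a failed Tez build avoids compiling Hive for MR3 against stale Tez artifacts in the local Maven repository.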