To install Spark on MR3 on Hadoop, first set up Spark on MR3. Then set the following environment variables in env.sh.

$ vi env.sh

export HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
HDFS_LIB_DIR=/user/$USER/lib
HADOOP_HOME_LOCAL=$HADOOP_HOME
HADOOP_NATIVE_LIB=$HADOOP_HOME_LOCAL/lib/native

SECURE_MODE=false

USER_PRINCIPAL=spark@HADOOP
USER_KEYTAB=/home/spark/spark.keytab

MR3_TEZ_ENABLED=false
MR3_SPARK_ENABLED=true
  • HDFS_LIB_DIR specifies the directory on HDFS to which MR3 jar files are uploaded. It is therefore used only in non-local mode.
  • HADOOP_HOME_LOCAL specifies the directory of the Hadoop installation to use in local mode, in which everything runs on a single machine and Yarn is not required.
  • SECURE_MODE specifies whether the cluster is secured with Kerberos.
  • USER_PRINCIPAL and USER_KEYTAB specify the principal and keytab file for the user executing Spark.
  • MR3_TEZ_ENABLED and MR3_SPARK_ENABLED specify which internal runtime (Tez or Spark) to use in MR3.
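
Before continuing, it can help to check that the values in env.sh are consistent with each other. The following is a minimal sketch, not part of the MR3 distribution: check_mr3_env is a hypothetical helper, and it assumes that Spark on MR3 should run with MR3_SPARK_ENABLED=true and MR3_TEZ_ENABLED=false (the values shown above) and that the keytab file matters only when SECURE_MODE=true.

```shell
# Hypothetical helper (not shipped with MR3): sanity-check values from env.sh.
# Arguments: SECURE_MODE, USER_KEYTAB, MR3_TEZ_ENABLED, MR3_SPARK_ENABLED.
# Returns 0 if the settings look consistent, 1 otherwise.
check_mr3_env() {
  secure_mode="$1"
  keytab="$2"
  tez_enabled="$3"
  spark_enabled="$4"

  # Assumption: Spark on MR3 uses the Spark runtime with the Tez runtime off,
  # matching the example settings above.
  if [ "$spark_enabled" != "true" ] || [ "$tez_enabled" != "false" ]; then
    echo "expected MR3_SPARK_ENABLED=true and MR3_TEZ_ENABLED=false" >&2
    return 1
  fi

  # Assumption: the keytab file is needed only on a Kerberos-enabled cluster.
  if [ "$secure_mode" = "true" ] && [ ! -r "$keytab" ]; then
    echo "keytab file is not readable: $keytab" >&2
    return 1
  fi

  return 0
}
```

After loading the settings with `source env.sh`, the helper can be called as `check_mr3_env "$SECURE_MODE" "$USER_KEYTAB" "$MR3_TEZ_ENABLED" "$MR3_SPARK_ENABLED"`. A non-zero exit status indicates a setting worth reviewing before uploading jar files.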

Next, copy all the jar files (for MR3, Spark-MR3, and Spark) to HDFS.

$ mr3/upload-hdfslib-mr3.sh
$ spark/upload-hdfslib-spark.sh
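
To confirm that the upload succeeded, one option is to list the target directory on HDFS and count the jar files found. The sketch below assumes that the scripts place the jar files somewhere under HDFS_LIB_DIR (/user/$USER/lib in the example above); the exact subdirectory layout is not described here, so the listing is recursive. count_jars is a hypothetical helper name.

```shell
# Hypothetical helper: count the .jar entries in `hdfs dfs -ls -R` output.
# Reads the listing from stdin and prints the number of lines ending in ".jar".
count_jars() {
  # grep -c prints 0 and exits non-zero when nothing matches,
  # so "|| true" keeps the exit status clean while still printing the count.
  grep -c '\.jar$' || true
}
```

A typical call would be `hdfs dfs -ls -R "/user/$USER/lib" | count_jars`. A result of 0 suggests that the upload scripts did not run to completion or that HDFS_LIB_DIR points to a different directory.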