If you have any questions, please visit the MR3 Google Group or join MR3 Slack.

To install Spark on MR3 on Hadoop, first set up Spark on MR3. Then set the following environment variables in env.sh.

$ vi env.sh

export HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
HDFS_LIB_DIR=/user/$USER/lib
HADOOP_HOME_LOCAL=$HADOOP_HOME
HADOOP_NATIVE_LIB=$HADOOP_HOME_LOCAL/lib/native

SECURE_MODE=false

USER_PRINCIPAL=spark@HADOOP
USER_KEYTAB=/home/spark/spark.keytab

MR3_TEZ_ENABLED=false
MR3_SPARK_ENABLED=true
  • HDFS_LIB_DIR specifies the directory on HDFS to which MR3 jar files are uploaded, so it is used only in non-local mode.
  • HADOOP_HOME_LOCAL specifies the directory of the Hadoop installation to use in local mode, in which everything runs on a single machine and Yarn is not required.
  • SECURE_MODE specifies whether the cluster is secured with Kerberos.
  • USER_PRINCIPAL and USER_KEYTAB specify the principal and keytab file for the user executing Spark.
  • MR3_TEZ_ENABLED and MR3_SPARK_ENABLED specify which internal runtime (Tez or Spark) to use in MR3.
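
The settings described above depend on each other, so a quick consistency check before going further may save time. The following is a minimal sketch and not part of the MR3 distribution: the function name check_mr3_env is hypothetical, and it assumes that MR3_TEZ_ENABLED and MR3_SPARK_ENABLED are meant to be mutually exclusive and that USER_PRINCIPAL and USER_KEYTAB matter only when SECURE_MODE is true.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with MR3.
# Usage (in the same shell): . ./env.sh && check_mr3_env
check_mr3_env() {
  local errors=0

  # Spark on MR3 needs the Spark runtime selected.
  if [ "${MR3_SPARK_ENABLED:-false}" != "true" ]; then
    echo "MR3_SPARK_ENABLED should be true for Spark on MR3" >&2
    errors=$((errors + 1))
  fi

  # Assumption: only one internal runtime is enabled at a time.
  if [ "${MR3_TEZ_ENABLED:-false}" = "true" ]; then
    echo "MR3_TEZ_ENABLED should be false when Spark is enabled" >&2
    errors=$((errors + 1))
  fi

  # Assumption: principal and keytab are required only with Kerberos.
  if [ "${SECURE_MODE:-false}" = "true" ]; then
    if [ -z "${USER_PRINCIPAL:-}" ]; then
      echo "USER_PRINCIPAL is empty but SECURE_MODE=true" >&2
      errors=$((errors + 1))
    fi
    if [ ! -r "${USER_KEYTAB:-}" ]; then
      echo "USER_KEYTAB is not a readable file: ${USER_KEYTAB:-}" >&2
      errors=$((errors + 1))
    fi
  fi

  # Succeed only if no problem was found.
  [ "$errors" -eq 0 ]
}
```

Because env.sh exports only HADOOP_HOME, the function has to run in the same shell that sourced env.sh; it would not see the other variables from a child process.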

Next, copy all the jar files (of MR3, Spark-MR3, and Spark) to HDFS.

$ mr3/upload-hdfslib-mr3.sh
$ spark/upload-hdfslib-spark.sh
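
After the two upload scripts finish, it can be useful to confirm that the jar files actually reached HDFS. The sketch below is an illustration only: verify_hdfs_lib is a hypothetical function name, and it assumes that env.sh has been sourced in the current shell and that the hdfs command is on PATH. The layout of subdirectories under HDFS_LIB_DIR is not assumed, so the listing is recursive.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with MR3.
# Usage (in the same shell): . ./env.sh && verify_hdfs_lib
verify_hdfs_lib() {
  if [ -z "${HDFS_LIB_DIR:-}" ]; then
    echo "HDFS_LIB_DIR is not set; source env.sh first" >&2
    return 1
  fi

  # `hdfs dfs -test -d` exits with 0 only if the path is a directory.
  if ! hdfs dfs -test -d "$HDFS_LIB_DIR"; then
    echo "$HDFS_LIB_DIR does not exist on HDFS" >&2
    return 1
  fi

  # List everything below the directory for a visual check.
  hdfs dfs -ls -R "$HDFS_LIB_DIR"
}
```

In secure mode the check presumably needs a valid Kerberos ticket for the user, so running kinit with the keytab first may be necessary.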