Installing a pre-built MR3 release
Download an MR3 release for Spark and uncompress it.
An MR3 release includes pre-built jar files of Spark-MR3 and MR3.
We rename the new directory to mr3-run.

$ wget https://github.com/mr3project/mr3-release/releases/download/v1.5/spark-mr3-1.5-spark3.2.2.tar.gz
$ gunzip -c spark-mr3-1.5-spark3.2.2.tar.gz | tar xvf -;
$ mv spark-mr3-1.5-spark3.2.2 mr3-run
$ cd mr3-run
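As a quick sanity check (hypothetical; the exact file list depends on the release), one can confirm that the archive extracted correctly by looking for a few known files:

```shell
# Hypothetical check after extraction: the renamed directory should
# contain the configuration script and the pre-compiled MR3 jar file
ls mr3-run/env.sh
ls mr3-run/mr3/mr3lib/
```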
The following structure shows important files and directories in the release:
# script for configuring Spark on MR3
|-- env.sh
# pre-compiled MR3
`-- mr3
    `-- mr3lib
        `-- mr3-spark-1.0-assembly.jar
# scripts for populating and cleaning directories for Docker images
|-- build-k8s-spark.sh
|-- clean-k8s-spark.sh
# scripts and resources for running Spark on MR3 on Kubernetes
`-- kubernetes
    |-- build-spark.sh
    |-- config-run.sh
    |-- run-spark-setup.sh
    |-- spark
    |   |-- Dockerfile
    |   |-- env.sh
    |   |-- conf
    |   |   |-- core-site.xml
    |   |   |-- hive-site.xml
    |   |   |-- mr3-site.xml
    |   |   |-- spark-defaults.conf
    |   |   `-- spark-env.sh
    |   `-- spark
    |       |-- master-control.sh
    |       |-- run-spark-shell.sh
    |       |-- run-spark-submit.sh
    |       |-- run-master.sh
    |       `-- run-worker.sh
    `-- spark-yaml
        |-- cluster-role.yaml
        |-- driver-service.yaml
        |-- master-role.yaml
        |-- master-service-account.yaml
        |-- mr3-service.yaml
        |-- spark-role.yaml
        |-- spark-run.yaml
        |-- spark-service-account.yaml
        |-- workdir-pv.yaml
        |-- workdir-pvc.yaml
        |-- worker-role.yaml
        `-- worker-service-account.yaml
# configuration directories for Spark on MR3 on Hadoop
`-- conf
    |-- local
    |-- cluster
    `-- tpcds
# pre-compiled Spark-MR3 and scripts for running Spark on MR3 on Hadoop
`-- spark
    |-- compile-spark.sh
    |-- upload-hdfslib-spark.sh
    |-- run-spark-shell.sh
    |-- run-spark-submit.sh
    `-- sparkjar
        `-- sparkmr3
            `-- spark-mr3-3.2.2-assembly.jar
Setting environment variables
env.sh is a self-descriptive script located in the root directory of the installation.
It contains major environment variables that should be set in every installation environment.
Running Spark on MR3 requires a Spark release, on both Kubernetes and Hadoop.
The user can download a pre-built Spark release from the Spark download page.
The following environment variables should be set in env.sh according to the configuration of the installation environment:

$ vi env.sh

SPARK_JARS_DIR=~/spark/jars
export SPARK_HOME=~/spark
SPARK_JARS_DIR specifies the directory containing the Spark jar files in the Spark installation. The jar files in this directory are copied to the Docker image for Spark on MR3.
SPARK_HOME specifies the directory of the Spark installation. Spark on MR3 needs the scripts in the Spark installation (e.g., those under bin/ such as spark-submit).
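Before proceeding, one can verify that the two settings point at a valid Spark installation. This is only a hypothetical sanity check; the exact paths depend on the environment:

```shell
# Hypothetical sanity check: SPARK_HOME should contain the Spark launcher
# scripts, and SPARK_JARS_DIR should contain the Spark jar files
ls $SPARK_HOME/bin/spark-submit
ls $SPARK_JARS_DIR/*.jar | head
```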
Building Spark on MR3 (optional)
To build Spark-MR3 from the source code, the following environment variables should be set in env.sh:

$ vi env.sh

SPARK_MR3_SRC=~/spark-mr3
SPARK_MR3_REV=3.2.2
SPARK_MR3_SRC specifies the directory containing the source code of Spark-MR3. The user can clone the GitHub repository (https://github.com/mr3project/spark-mr3.git) to obtain the source code.
SPARK_MR3_REV specifies the version of Spark-MR3 (e.g., 3.2.2 for running Spark 3.2.2 on MR3).
After setting these variables, the user can build Spark-MR3 by executing spark/compile-spark.sh in the MR3 release.
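Putting the steps together, a build might look as follows. This is a sketch: it assumes SPARK_MR3_SRC=~/spark-mr3 as in the example above and that the script is invoked from the root directory of the MR3 release:

```shell
# Obtain the Spark-MR3 source code; the target directory should match
# the SPARK_MR3_SRC setting in env.sh
git clone https://github.com/mr3project/spark-mr3.git ~/spark-mr3
# Build Spark-MR3 with the script included in the MR3 release
./spark/compile-spark.sh
```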