To install Spark on MR3, download an MR3 release for Spark and uncompress it. An MR3 release includes pre-built jar files of Spark-MR3 and MR3. We rename the new directory to mr3-run.

$ wget
$ gunzip -c spark-mr3-1.3-spark3.0.3.tar.gz | tar xvf -
$ mv spark-mr3-1.3-spark3.0.3 mr3-run
$ cd mr3-run

The following structure shows important files and directories in the release:

# script for configuring Spark on MR3

# pre-compiled MR3
`-- mr3
    `-- mr3lib
        `-- mr3-spark-1.0-assembly.jar

# scripts for populating and cleaning directories for Docker images

# scripts and resources for running Spark on MR3 on Kubernetes
`-- kubernetes
    |-- spark
    |   |-- Dockerfile
    |   |--
    |   |-- conf
    |   |   |-- mr3-site.xml
    |   |   |-- spark-defaults.conf
    |   |   `--
    |   `-- spark
    |       |--
    |       |--
    |       |--
    |       `--
    `-- spark-yaml
        |-- cluster-role.yaml
        |-- driver-service.yaml
        |-- master-role.yaml
        |-- master-service-account.yaml
        |-- mr3-service.yaml
        |-- prometheus-service.yaml
        |-- spark-role.yaml
        |-- spark-service-account.yaml
        |-- spark-submit.yaml
        |-- workdir-pv.yaml
        |-- workdir-pvc.yaml
        |-- worker-role.yaml
        `-- worker-service-account.yaml

# configuration directories for Spark on MR3 on Hadoop
`-- conf
    |-- local
    |-- cluster
    `-- tpcds

# pre-compiled Spark-MR3 and scripts for running Spark on MR3 on Hadoop
`-- spark
    `-- sparkjar
        `-- sparkmr3
            `-- spark-mr3-3.0.3-assembly.jar

The script for setting environment variables for Spark on MR3 is self-descriptive and located in the root directory of the installation. It contains the major environment variables that should be set in every installation environment.

Running Spark on MR3 requires a Spark release (on both Kubernetes and Hadoop). The user can download a pre-built release of Spark from the Spark webpage. The following environment variables should be set according to the configuration of the installation environment:

$ vi

export SPARK_HOME=~/spark
  • SPARK_HOME specifies the directory of the Spark installation. Spark on MR3 needs the scripts in the Spark installation (e.g., bin/spark-shell and bin/spark-submit).
  • SPARK_JARS_DIR specifies the directory containing the Spark jar files in the Spark installation. The jar files in this directory are copied to the Docker image for Spark on MR3.
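As an illustration, the relevant settings might look like the following sketch (all paths are assumptions for a typical installation; adjust them to your environment):

```shell
# Hypothetical example values; adjust to your installation.
export SPARK_HOME=~/spark                 # root directory of the Spark installation
export SPARK_JARS_DIR=$SPARK_HOME/jars    # Spark jar files, copied to the Docker image
```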

Setting up for Spark on MR3 on Hadoop (for Hadoop only)

To run Spark on MR3 on Hadoop, the following environment variables should be set.

$ vi

export HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}



  • HDFS_LIB_DIR specifies the directory on HDFS to which MR3 jar files are uploaded. Hence it is relevant only in non-local mode.
  • HADOOP_HOME_LOCAL specifies the directory of the Hadoop installation to use in local mode, in which everything runs on a single machine and Yarn is not required.
  • SECURE_MODE specifies whether the cluster is secure with Kerberos or not.
  • USER_PRINCIPAL and USER_KEYTAB specify the principal and keytab file for the user executing Spark.
  • MR3_TEZ_ENABLED and MR3_SPARK_ENABLED specify which internal runtime (Tez or Spark) to use in MR3.
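Put together, the Hadoop-related settings might look like the following sketch (all values are assumptions, shown for a Kerberos-secured cluster; adjust them to your environment):

```shell
# Hypothetical example values; adjust to your cluster.
export HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
export HADOOP_HOME_LOCAL=$HADOOP_HOME                  # Hadoop installation for local mode
export HDFS_LIB_DIR=/user/spark/lib                    # HDFS directory for MR3 jar files (non-local mode)
export SECURE_MODE=true                                # the cluster is secured with Kerberos
export USER_PRINCIPAL=spark@EXAMPLE.COM                # principal of the user executing Spark
export USER_KEYTAB=/etc/security/keytabs/spark.keytab  # keytab file for the principal
export MR3_TEZ_ENABLED=false                           # use the Spark runtime in MR3,
export MR3_SPARK_ENABLED=true                          #   not the Tez runtime
```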

Then the user should copy all the jar files (of MR3, Spark-MR3, and Spark) to HDFS.

$ mr3/
$ spark/
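The copy can also be sketched manually with the HDFS command-line client. In the sketch below, the target directory /user/spark/lib is an assumption standing in for HDFS_LIB_DIR, and the jar file names are taken from the release layout above:

```shell
hdfs dfs -mkdir -p /user/spark/lib
hdfs dfs -put mr3/mr3lib/mr3-spark-1.0-assembly.jar /user/spark/lib/
hdfs dfs -put spark/sparkjar/sparkmr3/spark-mr3-3.0.3-assembly.jar /user/spark/lib/
```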

Building Spark-MR3 (Optional)

To build Spark-MR3 from the source code, the following environment variables should be set.

$ vi

  • SPARK_MR3_SRC specifies the directory containing the source code of Spark-MR3. The user can clone the GitHub repository to obtain the source code.
  • SPARK_MR3_REV specifies the version of Spark-MR3 (e.g., 3.0.3 for running Spark 3.0.3 on MR3).
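For example, the two variables might be set as in the following sketch (both values are assumptions; adjust them to your checkout and target Spark version):

```shell
# Hypothetical example values.
export SPARK_MR3_SRC=~/spark-mr3   # clone of the Spark-MR3 GitHub repository
export SPARK_MR3_REV=3.0.3         # for running Spark 3.0.3 on MR3
```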

Then execute spark/ in the MR3 release.

$ spark/

Building a Docker image (for Kubernetes only)

The user can build a Docker image for running Spark on MR3 on Kubernetes. (We assume that the user can execute the docker command to build a Docker image.) The first step is to collect all necessary files in the directory kubernetes/spark by executing the script that copies the scripts and jar files from the Spark installation (specified by SPARK_HOME).

$ ls kubernetes/spark/mr3/mr3lib/       # MR3 jar file
$ ls kubernetes/spark/spark/sparkmr3/   # Spark-MR3 jar file
$ ls kubernetes/spark/spark/bin/        # Spark scripts
$ ls kubernetes/spark/spark/jars/       # Spark jar files

Next the user should set two environment variables in kubernetes/spark/ (not in the installation directory):

$ vi kubernetes/spark/
  • DOCKER_HIVE_IMG is the full name of the Docker image, including a tag. It specifies the name of the Docker image for running Spark on MR3 and may include the address of a running Docker server.
  • DOCKER_USER should match the user specified in kubernetes/spark/Dockerfile (which is root by default).
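As an illustration, the two variables might be set as in the following sketch (the server address and tag are assumptions; adjust them to your environment):

```shell
# Hypothetical example values; adjust the server address and tag.
DOCKER_HIVE_IMG=10.1.91.17:5000/spark3:latest   # full image name, including a tag
DOCKER_USER=root                                # must match the user in the Dockerfile
```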

The last step is to build a Docker image from the Dockerfile in the directory kubernetes/spark/ by executing kubernetes/

$ kubernetes/