Using a pre-built Docker image

To use a pre-built Docker image from DockerHub, it suffices to use the MR3 release from the GitHub repository (https://github.com/mr3project/mr3-run-k8s), which contains all the executable scripts. Clone the repository.

$ git clone https://github.com/mr3project/mr3-run-k8s.git

That’s all you need!

We recommend the quick start guide On Kubernetes, which demonstrates how to use a pre-built Docker image.

Building a new Docker image (Optional)

The user can build a Docker image for running Spark on MR3 on Kubernetes. We assume that the user can execute the docker command in order to build a Docker image.
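
As a quick sanity check (not part of the MR3 scripts), the user may verify that docker can be executed and that the Docker daemon is reachable before proceeding:

$ docker version        # prints client and server versions if the Docker daemon is reachable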

The first step is to set up Spark on MR3.

The next step is to collect all necessary files in the directory kubernetes/spark by executing build-k8s-spark.sh, which copies the script and jar files from the Spark installation (specified by SPARK_HOME in env.sh).

$ clean-k8s-spark.sh
$ build-k8s-spark.sh
$ ls kubernetes/spark/mr3/mr3lib/       # MR3 jar file
mr3-spark-assembly.jar
$ ls kubernetes/spark/spark/sparkmr3/   # Spark-MR3 jar file
spark-mr3-assembly.jar
$ ls kubernetes/spark/spark/bin/        # Spark scripts
...
$ ls kubernetes/spark/spark/jars/       # Spark jar files
...
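
Note that build-k8s-spark.sh reads SPARK_HOME from env.sh, so SPARK_HOME should point to the Spark installation before the script is run. Below is a minimal sketch; the path /opt/spark is only an example and should be replaced with the actual installation directory.

$ vi env.sh

SPARK_HOME=/opt/spark        # example path only; point this to the actual Spark installation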

Next, the user should set two environment variables in kubernetes/spark/env.sh (not env.sh in the installation directory):

$ vi kubernetes/spark/env.sh

DOCKER_SPARK_IMG=10.1.90.9:5000/spark3:latest
SPARK_DOCKER_USER=spark
  • DOCKER_SPARK_IMG is the full name, including a tag, of the Docker image for running Spark on MR3. It may include the address of a running Docker registry (e.g., 10.1.90.9:5000 in the example above).
  • SPARK_DOCKER_USER should match the user specified in kubernetes/spark/Dockerfile (which is spark by default).
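
To confirm that SPARK_DOCKER_USER matches the user declared in kubernetes/spark/Dockerfile, one can simply search the Dockerfile (the exact way the user is declared may differ):

$ grep -i user kubernetes/spark/Dockerfile    # shows lines mentioning the user (spark by default)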

The last step is to build a Docker image from the Dockerfile in the directory kubernetes/spark/ by executing kubernetes/build-spark.sh.

$ kubernetes/build-spark.sh
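
If DOCKER_SPARK_IMG points to a private Docker registry (as in the example above) and build-spark.sh does not already push the image, the user may push it manually so that Kubernetes nodes can pull it. The image name below is only the example value of DOCKER_SPARK_IMG:

$ docker push 10.1.90.9:5000/spark3:latest    # push to the registry specified in DOCKER_SPARK_IMG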