Using a pre-built Docker image
To use a pre-built Docker image from DockerHub, the user only needs the MR3 release, which contains all the executable scripts and is available from the GitHub repository (https://github.com/mr3project/mr3-run-k8s). Clone the repository:
$ git clone https://github.com/mr3project/mr3-run-k8s.git
That’s all you need!
We recommend the quick start guide On Kubernetes, which demonstrates how to use a pre-built Docker image.
Building a new Docker image (Optional)
The user can build a Docker image for running Spark on MR3 on Kubernetes.
We assume that the user can execute the command docker so as to build a Docker image.
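As a quick sanity check of this assumption, the following minimal sketch tests whether docker is on the PATH before any build is attempted. The helper name have_command is hypothetical and not part of the MR3 scripts.

```shell
# have_command NAME: succeed if NAME resolves to a command on PATH.
# (Hypothetical helper for illustration; not part of the MR3 release.)
have_command() {
  command -v "$1" >/dev/null 2>&1
}

if have_command docker; then
  echo "docker found: $(command -v docker)"
else
  echo "docker not found on PATH; install Docker before building an image" >&2
fi
```

Finding the binary does not guarantee that the current user may talk to the Docker daemon, so a failing build can still point to a permission problem.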
The first step is to set up Spark on MR3.
The next step is to collect all necessary files in the directory kubernetes/spark by executing build-k8s-spark.sh, which copies the script and jar files from the Spark installation (specified by SPARK_HOME in env.sh).
$ clean-k8s-spark.sh
$ build-k8s-spark.sh
$ ls kubernetes/spark/mr3/mr3lib/ # MR3 jar file
mr3-spark-assembly.jar
$ ls kubernetes/spark/spark/sparkmr3/ # Spark-MR3 jar file
spark-mr3-assembly.jar
$ ls kubernetes/spark/spark/bin/ # Spark scripts
...
$ ls kubernetes/spark/spark/jars/ # Spark jar files
...
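The listing above can also be checked mechanically. The sketch below only looks for the two assembly jars named in the listing. The function name check_spark_files is hypothetical, and the check assumes the jar file names are exactly those shown above.

```shell
# check_spark_files ROOT: report whether the two assembly jars from the
# listing exist under ROOT. Returns 0 if both exist, 1 otherwise.
# (Hypothetical helper for illustration; not part of the MR3 release.)
check_spark_files() {
  root="$1"
  missing=0
  for f in \
    "mr3/mr3lib/mr3-spark-assembly.jar" \
    "spark/sparkmr3/spark-mr3-assembly.jar"
  do
    if [ -f "$root/$f" ]; then
      echo "ok: $root/$f"
    else
      echo "missing: $root/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run from the directory that contains kubernetes/.
check_spark_files kubernetes/spark || echo "run build-k8s-spark.sh first" >&2
```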
Next the user should set two environment variables in kubernetes/spark/env.sh (not env.sh in the installation directory):
$ vi kubernetes/spark/env.sh
DOCKER_SPARK_IMG=10.1.90.9:5000/spark3:latest
SPARK_DOCKER_USER=spark
DOCKER_SPARK_IMG is the full name of the Docker image for running Spark on MR3, including a tag. It may include the address of a running Docker registry.
SPARK_DOCKER_USER should match the user specified in kubernetes/spark/Dockerfile (which is spark by default).
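To illustrate how a full image name such as the one above breaks down, here is a minimal sketch that splits it into registry address, repository, and tag using plain shell parameter expansion. The function split_image is hypothetical. It follows the usual Docker convention that the first path component is a registry only if it contains a dot or a colon or is localhost, and it does not handle digest references (name@sha256:...).

```shell
# split_image IMAGE: print the registry, repository, and tag of IMAGE.
# (Hypothetical helper for illustration; not part of the MR3 release.)
split_image() {
  img="$1"
  last="${img##*/}"                 # text after the final slash
  case "$last" in
    *:*) tag="${last##*:}"; name="${img%:*}" ;;
    *)   tag="latest";      name="$img" ;;   # Docker defaults to "latest"
  esac
  first="${name%%/*}"               # first path component, if any
  case "$name" in
    */*)
      case "$first" in
        *.*|*:*|localhost) registry="$first"; repo="${name#*/}" ;;
        *)                 registry="";       repo="$name" ;;
      esac ;;
    *) registry=""; repo="$name" ;;
  esac
  echo "registry=$registry repository=$repo tag=$tag"
}

DOCKER_SPARK_IMG=10.1.90.9:5000/spark3:latest
split_image "$DOCKER_SPARK_IMG"
# prints: registry=10.1.90.9:5000 repository=spark3 tag=latest
```

An empty registry in the output means the name carries no registry address, in which case Docker falls back to its default registry.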
The last step is to build a Docker image from Dockerfile in the directory kubernetes/spark/ by executing kubernetes/build-spark.sh.
$ kubernetes/build-spark.sh
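After the build finishes, one way to confirm the result is to ask the local Docker daemon whether the image exists. The sketch below assumes the image name is the same value set for DOCKER_SPARK_IMG in kubernetes/spark/env.sh. The helper image_exists is hypothetical, and whether the build script also pushes the image to a registry is not covered here.

```shell
# Same value as DOCKER_SPARK_IMG in kubernetes/spark/env.sh (assumed).
DOCKER_SPARK_IMG=10.1.90.9:5000/spark3:latest

# image_exists IMAGE: succeed if the local Docker daemon knows IMAGE.
# (Hypothetical helper for illustration; not part of the MR3 release.)
image_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}

if image_exists "$DOCKER_SPARK_IMG"; then
  echo "image found locally: $DOCKER_SPARK_IMG"
else
  echo "image not found locally: $DOCKER_SPARK_IMG" >&2
fi
```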