On Hadoop, Spark on MR3 can run in client mode, in which the Spark driver is executed not in a Yarn container but in an ordinary process on the node where Spark on MR3 is launched. Cluster mode, in which the Spark driver would run in a Yarn container, is not supported.
Running a Spark driver
To run a Spark driver (in client mode), set the environment variables SPARK_DRIVER_CORES and SPARK_DRIVER_MEMORY_MB to specify the values for the configuration keys spark.driver.cores and spark.driver.memory (in MB), respectively, e.g.:
$ export SPARK_DRIVER_CORES=2
$ export SPARK_DRIVER_MEMORY_MB=13107
Then the user can execute spark/run-spark-submit.sh and spark/run-spark-shell.sh, which in turn execute the scripts bin/spark-submit and bin/spark-shell, respectively, under the directory of the Spark installation.
The scripts accept the following options.
--local: Spark on MR3 reads configuration files under the directory conf/local.
--cluster (default): Spark on MR3 reads configuration files under the directory conf/cluster.
--tpcds: Spark on MR3 reads configuration files under the directory conf/tpcds.
--amprocess: DAGAppMaster runs in LocalProcess mode instead of Yarn mode. See DAGAppMaster and ContainerWorker Modes for more details on LocalProcess mode.
--driver-java-options, --jars, --driver-class-path, --conf: These options with arguments are passed on to bin/spark-submit and bin/spark-shell.
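For example, a command like the following (a hypothetical combination of the options listed above) would read configuration files under conf/local and run DAGAppMaster in LocalProcess mode; the host name gold0 is only an example, and the spark.driver.host option is explained below:
$ spark/run-spark-shell.sh --local --amprocess --conf spark.driver.host=gold0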
The user should use a --conf spark.driver.host option to specify the host name or address where the Spark driver runs. In order to run multiple Spark drivers on the same node, the user should also specify a unique port for each individual driver with a --conf spark.driver.port option.
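For example, two Spark drivers can run on the same node as follows; the port numbers here are arbitrary examples:
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.driver.port=10001
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.driver.port=10002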
In order to connect to an existing DAGAppMaster running in Yarn mode (instead of creating a new one), the user should set the configuration key spark.mr3.appid to its application ID using a --conf option, e.g.:
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.mr3.appid=application_1620113436187_0291
By default, terminating a Spark driver does not kill the running DAGAppMaster. In order to kill DAGAppMaster automatically when a Spark driver terminates, the user can set the configuration key spark.mr3.keep.am to false using the --conf option. Note that setting spark.mr3.keep.am to false is effective even when an existing DAGAppMaster is used, e.g.:
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.mr3.keep.am=false
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.mr3.keep.am=false --conf spark.mr3.appid=application_1620113436187_0291
Stopping Spark on MR3
If the configuration key spark.mr3.keep.am is set to the default value of true, the user can manually kill DAGAppMaster by executing the command yarn application -kill.
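For example, assuming the application ID used in the examples above, the following command kills the DAGAppMaster:
$ yarn application -kill application_1620113436187_0291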