On Hadoop, Spark on MR3 can run in client mode, in which the Spark driver is executed not in a Yarn container but in an ordinary process on the node where Spark on MR3 is launched. Cluster mode, in which the Spark driver would run in a Yarn container, is not supported.
Running a Spark driver
To run a Spark driver (in client mode), set the environment variables SPARK_DRIVER_CORES and SPARK_DRIVER_MEMORY_MB to specify the values for the configuration keys spark.driver.cores and spark.driver.memory (in MB), respectively, e.g.:
$ export SPARK_DRIVER_CORES=2
$ export SPARK_DRIVER_MEMORY_MB=13107
Then the user can execute spark/run-spark-submit.sh and spark/run-spark-shell.sh, which in turn execute the scripts bin/spark-submit and bin/spark-shell, respectively, under the directory of the Spark installation.
The scripts accept the following options.
--local: Spark on MR3 reads configuration files under the directory conf/local.
--cluster (default): Spark on MR3 reads configuration files under the directory conf/cluster.
--tpcds: Spark on MR3 reads configuration files under the directory conf/tpcds.
--amprocess: DAGAppMaster runs in LocalProcess mode instead of Yarn mode. See DAGAppMaster and ContainerWorker Modes for more details on LocalProcess mode.
--driver-java-options, --jars, --driver-class-path, --conf: These options with arguments are passed on to bin/spark-submit and bin/spark-shell.
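For example, a command like the following (a hypothetical combination of the options listed above) would read configuration files under conf/local and run DAGAppMaster in LocalProcess mode; the host name gold0 is only an example, and the spark.driver.host option is explained below:
$ spark/run-spark-shell.sh --local --amprocess --conf spark.driver.host=gold0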
The user should use a --conf spark.driver.host option to specify the host name or address where the Spark driver runs. In order to run multiple Spark drivers on the same node, the user should also specify a unique port for each individual driver with a --conf spark.driver.port option.
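For example, two Spark drivers can run on the same node as follows; the port numbers here are arbitrary examples:
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.driver.port=10001
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.driver.port=10002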
In order to connect to an existing DAGAppMaster running in Yarn mode (instead of creating a new one), the user should set the configuration key spark.mr3.appid to its application ID using a --conf option, e.g.:
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.mr3.appid=application_1620113436187_0291
By default, terminating a Spark driver does not kill the running DAGAppMaster. In order to kill DAGAppMaster automatically when a Spark driver terminates, the user can set the configuration key spark.mr3.keep.am to false using the --conf option. Note that setting spark.mr3.keep.am to false is effective even when an existing DAGAppMaster is used, e.g.:
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.mr3.keep.am=false
$ spark/run-spark-shell.sh --conf spark.driver.host=gold0 --conf spark.mr3.keep.am=false --conf spark.mr3.appid=application_1620113436187_0291
Stopping Spark on MR3
If the configuration key spark.mr3.keep.am is set to the default value of true, the user can manually kill DAGAppMaster by executing the command yarn application -kill.
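For example, assuming the application ID used in the examples above, the following command kills the DAGAppMaster:
$ yarn application -kill application_1620113436187_0291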