Spark on MR3 does not modify vanilla Spark, so it behaves identically in most cases. Below we describe the major differences from vanilla Spark.
Dynamic resource allocation
Spark on MR3 ignores the configuration key spark.dynamicAllocation.enabled
because MR3 decides the number of ContainerWorkers (i.e., Spark executors) on its own.
By default, MR3 tries to obtain as many compute resources as possible from the resource manager (such as Yarn or Kubernetes).
With autoscaling enabled,
MR3 dynamically adjusts the number of ContainerWorkers based on overall resource utilization.
Hence the user may think of Spark on MR3 as always running with
spark.dynamicAllocation.enabled effectively set to true.
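For instance, setting the Spark-side key explicitly has no effect (a spark-defaults.conf fragment for illustration):

```
# spark-defaults.conf
# Ignored by Spark on MR3 -- MR3 itself decides the number of ContainerWorkers
spark.dynamicAllocation.enabled=true
```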
To set a limit on the number of ContainerWorkers,
the user can use the MR3 configuration key
mr3.container.max.num.workers,
which specifies the maximum number of ContainerWorkers that a Spark application can create, as in:
```
# mr3-site.xml
<property>
  <name>mr3.container.max.num.workers</name>
  <value>10</value>
</property>

# spark-defaults.conf
spark.mr3.container.max.num.workers=10

# Spark shell
scala> sc.setLocalProperty("spark.mr3.container.max.num.workers", "10")

# Spark SQL
val sparkSession: org.apache.spark.sql.SparkSession = ...
sparkSession.sessionState.conf.setConfString("spark.mr3.container.max.num.workers", "10")
```
Speculative execution
Spark on MR3 relies on MR3 for speculative execution
and ignores the configuration key
spark.speculation of Spark.
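As a consequence, the Spark-side key may be left at any value; it is simply not consulted (a spark-defaults.conf fragment for illustration):

```
# spark-defaults.conf
# Ignored by Spark on MR3 -- speculative execution is handled by MR3
spark.speculation=true
```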
Barrier execution mode
For executing Spark jobs in barrier mode, the user is responsible for creating a sufficient number of ContainerWorkers in advance by submitting preliminary Spark jobs. If enough Spark executors are not available, Spark on MR3 immediately fails Spark jobs submitted in barrier mode.
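The pattern can be sketched in the spark-shell (assuming, for illustration, that the barrier job needs four task slots; the preliminary count() job gives MR3 a chance to create the ContainerWorkers before the barrier job is submitted):

```scala
// Warm-up: an ordinary job that makes MR3 create ContainerWorkers.
// The number of partitions is chosen to match the barrier job's parallelism.
scala> sc.parallelize(1 to 1000, numSlices = 4).count()

// Barrier job: all four tasks must be able to start simultaneously,
// or Spark on MR3 fails the job immediately.
scala> val rdd = sc.parallelize(1 to 4, 4).barrier().mapPartitions { iter =>
     |   val ctx = org.apache.spark.BarrierTaskContext.get()
     |   ctx.barrier()   // global synchronization point across all tasks
     |   iter
     | }
scala> rdd.collect()
```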
Currently Spark on MR3 supports only CPU cores and memory as compute resources.
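Accordingly, resource requests for executors are expressed with the standard CPU and memory keys only (a spark-defaults.conf fragment for illustration; other resource types, such as GPUs, are not available):

```
# spark-defaults.conf
spark.executor.cores=4
spark.executor.memory=8g
# Resource types other than CPU cores and memory are not supported
```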