Spark on MR3 does not modify vanilla Spark, so it behaves identically in most cases. Below we describe the major differences from vanilla Spark.
Dynamic resource allocation
Spark on MR3 ignores the configuration key spark.dynamicAllocation.enabled
because MR3 decides the number of ContainerWorkers (i.e., Spark executors) on its own.
By default, MR3 tries to obtain as many compute resources as possible from the resource manager (such as Yarn or Kubernetes).
With autoscaling enabled,
MR3 dynamically adjusts the number of ContainerWorkers based on overall resource utilization.
Hence the user may think of Spark on MR3 as always running with
spark.dynamicAllocation.enabled effectively set to true.
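For instance, setting the Spark-side key explicitly has no effect (a spark-defaults.conf fragment for illustration):

```
# spark-defaults.conf
# Ignored by Spark on MR3 -- MR3 itself decides the number of ContainerWorkers
spark.dynamicAllocation.enabled=true
```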
To set a limit on the number of ContainerWorkers,
the user can use the MR3 configuration key
mr3.container.max.num.workers,
which specifies the maximum number of ContainerWorkers that a Spark application can create, as in:
```
# mr3-site.xml
<property>
  <name>mr3.container.max.num.workers</name>
  <value>10</value>
</property>

# spark-defaults.conf
spark.mr3.container.max.num.workers=10

# Spark shell
scala> sc.setLocalProperty("spark.mr3.container.max.num.workers", "10")

# Spark SQL
val sparkSession: org.apache.spark.sql.SparkSession = ...
sparkSession.sessionState.conf.setConfString("spark.mr3.container.max.num.workers", "10")
```
Speculative execution
Spark on MR3 relies on MR3 for speculative execution
and ignores the configuration key
spark.speculation of Spark.
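As a consequence, the Spark-side key may be left at any value; it is simply not consulted (a spark-defaults.conf fragment for illustration):

```
# spark-defaults.conf
# Ignored by Spark on MR3 -- speculative execution is handled by MR3
spark.speculation=true
```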
Barrier execution mode
For executing Spark jobs in barrier mode, the user is responsible for creating a sufficient number of ContainerWorkers in advance by submitting preliminary Spark jobs. If enough Spark executors are not available, Spark on MR3 immediately fails Spark jobs submitted in barrier mode.
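The pattern can be sketched in the spark-shell (assuming, for illustration, that the barrier job needs four task slots; the preliminary count() job gives MR3 a chance to create the ContainerWorkers before the barrier job is submitted):

```scala
// Warm-up: an ordinary job that makes MR3 create ContainerWorkers.
// The number of partitions is chosen to match the barrier job's parallelism.
scala> sc.parallelize(1 to 1000, numSlices = 4).count()

// Barrier job: all four tasks must be able to start simultaneously,
// or Spark on MR3 fails the job immediately.
scala> val rdd = sc.parallelize(1 to 4, 4).barrier().mapPartitions { iter =>
     |   val ctx = org.apache.spark.BarrierTaskContext.get()
     |   ctx.barrier()   // global synchronization point across all tasks
     |   iter
     | }
scala> rdd.collect()
```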
Currently Spark on MR3 supports only CPU cores and memory as compute resources.
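Accordingly, resource requests for executors are expressed with the standard CPU and memory keys only (a spark-defaults.conf fragment for illustration; other resource types, such as GPUs, are not available):

```
# spark-defaults.conf
spark.executor.cores=4
spark.executor.memory=8g
# Resource types other than CPU cores and memory are not supported
```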