Skip to main content

Release 1.3 (2021-8-18)

MR3

  • Separate mr3.k8s.keytab.secret and mr3.k8s.worker.secret.
  • Introduce mr3.container.max.num.workers to limit the number of ContainerWorkers.
  • Introduce mr3.k8s.pod.worker.node.affinity.specs to specify node affinity for ContainerWorker Pods.
  • No longer use mr3.convert.container.address.host.name.
  • Support ContainerWorker recycling (which is different from ContainerWorker reuse) with mr3.container.scheduler.scheme.
  • Introduce mr3.am.task.no.retry.errors to specify the names of errors that prevent the re-execution of Tasks (e.g., OutOfMemoryError,MapJoinMemoryExhaustionError).
  • For reporting to MR3-UI, MR3 uses System.currentTimeMillis() instead of MonotonicClock.
  • DAGAppMaster correctly reports to MR3Client the time from DAG submission to DAG execution.
  • Introduce mr3.container.localize.python.working.dir.unsafe to localize Python scripts in working directories of ContainerWorkers. Localizing Python scripts is an unsafe operation: 1) Python scripts are shared by all DAGs; 2) once localized, Python scripts are not deleted.
  • The image pull policy specified in mr3.k8s.pod.image.pull.policy applies to init containers as well as ContainerWorker containers.
  • Introduce mr3.auto.scale.out.num.initial.containers which specifies the number of new ContainerWorkers to create in a scale-out operation when no ContainerWorkers are running.
  • Introduce mr3.container.runtime.auto.start.input to automatically start LogicalInputs in RuntimeTasks.
  • Speculative execution works on Vertexes with a single Task.

Hive on MR3

  • Metastore correctly uses MR3 for compaction on Kubernetes.
  • Auto parallelism is correctly enabled or disabled according to the result of compiling queries by overriding tez.shuffle-vertex-manager.enable.auto-parallel, so tez.shuffle-vertex-manager.enable.auto-parallel can be set to false.
  • Support the TRANSFORM clause with Python scripts (with mr3.container.localize.python.working.dir.unsafe set to true in mr3-site.xml).
  • Introduce hive.mr3.llap.orc.memory.per.thread.mb to specify the memory allocated to each ORC manager in low-level LLAP I/O threads.

Spark on MR3

  • Initial release

Spark on MR3 is an add-on to Spark which replaces the scheduler backend of Spark with MR3. It exploits TaskScheduler of MR3 to recycle YARN containers or Kubernetes Pods among multiple Spark drivers, and delegates the scheduling of individual tasks to MR3. Hence it can be useful when multiple Spark drivers share a common cluster. As Spark on MR3 is just an add-on, it requires no change to Spark.