Release 1.3 (2021-8-18)
MR3
- Separate mr3.k8s.keytab.secret and mr3.k8s.worker.secret.
- Introduce mr3.container.max.num.workers to limit the number of ContainerWorkers.
- Introduce mr3.k8s.pod.worker.node.affinity.specs to specify node affinity for ContainerWorker Pods.
- No longer use mr3.convert.container.address.host.name.
- Support ContainerWorker recycling (which is different from ContainerWorker reuse) with mr3.container.scheduler.scheme.
- Introduce mr3.am.task.no.retry.errors to specify the names of errors that prevent the re-execution of Tasks (e.g., OutOfMemoryError, MapJoinMemoryExhaustionError).
- For reporting to MR3-UI, MR3 uses System.currentTimeMillis() instead of MonotonicClock.
- DAGAppMaster correctly reports to MR3Client the time from DAG submission to DAG execution.
- Introduce mr3.container.localize.python.working.dir.unsafe to localize Python scripts in working directories of ContainerWorkers. Localizing Python scripts is an unsafe operation: 1) Python scripts are shared by all DAGs; 2) once localized, Python scripts are not deleted.
- The image pull policy specified in mr3.k8s.pod.image.pull.policy applies to init containers as well as ContainerWorker containers.
- Introduce mr3.auto.scale.out.num.initial.containers which specifies the number of new ContainerWorkers to create in a scale-out operation when no ContainerWorkers are running.
- Introduce mr3.container.runtime.auto.start.input to automatically start LogicalInputs in RuntimeTasks.
- Speculative execution works on Vertexes with a single Task.
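
As a rough illustration, two of the new MR3 keys might be set in mr3-site.xml as sketched below. This assumes the usual Hadoop-style property layout. The worker limit of 16 is a hypothetical value, and writing the error names as a comma-separated list is an assumption based on the examples given in the note on mr3.am.task.no.retry.errors.

```xml
<configuration>
  <!-- Hypothetical limit: create at most 16 ContainerWorkers. -->
  <property>
    <name>mr3.container.max.num.workers</name>
    <value>16</value>
  </property>
  <!-- Assumed format: comma-separated names of errors that prevent the re-execution of Tasks. -->
  <property>
    <name>mr3.am.task.no.retry.errors</name>
    <value>OutOfMemoryError,MapJoinMemoryExhaustionError</value>
  </property>
</configuration>
```
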
Hive on MR3
- Metastore correctly uses MR3 for compaction on Kubernetes.
- Auto parallelism is correctly enabled or disabled according to the result of compiling queries by overriding tez.shuffle-vertex-manager.enable.auto-parallel, so tez.shuffle-vertex-manager.enable.auto-parallel can be set to false.
- Support the TRANSFORM clause with Python scripts (with mr3.container.localize.python.working.dir.unsafe set to true in mr3-site.xml).
- Introduce hive.mr3.llap.orc.memory.per.thread.mb to specify the memory allocated to each ORC manager in low-level LLAP I/O threads.
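
As a minimal sketch, the setting required for the TRANSFORM clause with Python scripts could appear in mr3-site.xml as follows, assuming the usual Hadoop-style property layout. The key and the value are taken from the notes above. The surrounding file contents are omitted.

```xml
<configuration>
  <!-- Unsafe: localized Python scripts are shared by all DAGs and are not deleted once localized. -->
  <property>
    <name>mr3.container.localize.python.working.dir.unsafe</name>
    <value>true</value>
  </property>
</configuration>
```
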
Spark on MR3
- Initial release
Spark on MR3 is an add-on to Spark which replaces the scheduler backend of Spark with MR3. It exploits TaskScheduler of MR3 to recycle YARN containers or Kubernetes Pods among multiple Spark drivers, and delegates the scheduling of individual tasks to MR3. Hence it can be useful when multiple Spark drivers share a common cluster. As Spark on MR3 is just an add-on, it requires no change to Spark.