Release 1.3 (2021-8-18)
MR3
- Separate
mr3.k8s.keytab.secretandmr3.k8s.worker.secret. - Introduce
mr3.container.max.num.workersto limit the number of ContainerWorkers. - Introduce
mr3.k8s.pod.worker.node.affinity.specsto specify node affinity for ContainerWorker Pods. - No longer use
mr3.convert.container.address.host.name. - Support ContainerWorker recycling (which is different from ContainerWorker reuse) with
mr3.container.scheduler.scheme. - Introduce
mr3.am.task.no.retry.errorsto specify the names of errors that prevent the re-execution of Tasks (e.g.,OutOfMemoryError,MapJoinMemoryExhaustionError). - For reporting to MR3-UI, MR3 uses System.currentTimeMillis() instead of MonotonicClock.
- DAGAppMaster correctly reports to MR3Client the time from DAG submission to DAG execution.
- Introduce
mr3.container.localize.python.working.dir.unsafeto localize Python scripts in working directories of ContainerWorkers. Localizing Python scripts is an unsafe operation: 1) Python scripts are shared by all DAGs; 2) once localized, Python scripts are not deleted. - The image pull policy specified in
mr3.k8s.pod.image.pull.policyapplies to init containers as well as ContainerWorker containers. - Introduce
mr3.auto.scale.out.num.initial.containerswhich specifies the number of new ContainerWorkers to create in a scale-out operation when no ContainerWorkers are running. - Introduce
mr3.container.runtime.auto.start.inputto automatically start LogicalInputs in RuntimeTasks. - Speculative execution works on Vertexes with a single Task.
Hive on MR3
- Metastore correctly uses MR3 for compaction on Kubernetes.
- Auto parallelism is correctly enabled or disabled according to the result of compiling queries by overriding
tez.shuffle-vertex-manager.enable.auto-parallel, sotez.shuffle-vertex-manager.enable.auto-parallelcan be set to false. - Support the TRANSFORM clause with Python scripts (with
mr3.container.localize.python.working.dir.unsafeset to true inmr3-site.xml). - Introduce
hive.mr3.llap.orc.memory.per.thread.mbto specify the memory allocated to each ORC manager in low-level LLAP I/O threads.
Spark on MR3
- Initial release
Spark on MR3 is an add-on to Spark which replaces the scheduler backend of Spark with MR3. It exploits TaskScheduler of MR3 to recycle YARN containers or Kubernetes Pods among multiple Spark drivers, and delegates the scheduling of individual tasks to MR3. Hence it can be useful when multiple Spark drivers share a common cluster. As Spark on MR3 is just an add-on, it requires no change to Spark.