Release 1.8 (2023-12-9)
MR3
- Shuffle handlers can send multiple consecutive partitions at once.
- Fix a bug in TaskScheduler which can get stuck when the number of ContainerWorkers is smaller than the value of mr3.am.task.max.failed.attempts.
- Avoid unnecessary attempts to delete directories created by DAGs.
- mr3.taskattempt.queue.scheme can be set to spark to use a Spark-style TaskScheduler which schedules consumer Tasks after all producer Tasks are finished.
- mr3.dag.vertex.schedule.by.stage can be set to true to process Vertexes by stages similarly to Spark (see the configuration sketch after this list).
- YarnResourceScheduler does not use AMRMClient.getAvailableResources(), which returns incorrect values in some cases.
- Restore TEZ_USE_MINIMAL in env.sh.
- Support Celeborn as a remote shuffle service.
- mr3.dag.include.indeterminate.vertex specifies whether a DAG contains indeterminate Vertexes or not.
- Fault tolerance in the event of disk failures works much faster.
- Use Scala 2.12.
- Support Java 17 (with USE_JAVA_17 in env.sh).
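
The two scheduling options above can be enabled together. Below is a minimal sketch, assuming the keys are placed in mr3-site.xml using the standard Hadoop property format (the file name and placement are assumptions, not stated in these notes):

    <!-- schedule consumer Tasks only after all producer Tasks finish (Spark-style) -->
    <property>
      <name>mr3.taskattempt.queue.scheme</name>
      <value>spark</value>
    </property>
    <!-- process Vertexes stage by stage, similarly to Spark -->
    <property>
      <name>mr3.dag.vertex.schedule.by.stage</name>
      <value>true</value>
    </property>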
Hive on MR3
- Fix ConcurrentModificationException generated during the construction of DAGs.
- hive.mr3.application.name.prefix specifies the prefix of MR3 application names.
- Fix a bug that ignores CTRL-C in Beeline and stop requests from Hue.
- hive.mr3.config.remove.keys specifies configuration keys to remove from JobConf to be passed to Tez.
- hive.mr3.config.remove.prefixes specifies prefixes of configuration keys to remove from JobConf to be passed to Tez (see the example after this list).
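
For illustration, the two removal keys might be set in hive-site.xml as follows. This is a minimal sketch: the comma-separated format and the sample values are assumptions, only the key names come from the notes above.

    <!-- drop these exact keys from the JobConf passed to Tez (sample values) -->
    <property>
      <name>hive.mr3.config.remove.keys</name>
      <value>hive.sample.key1,hive.sample.key2</value>
    </property>
    <!-- drop every key starting with one of these prefixes (sample values) -->
    <property>
      <name>hive.mr3.config.remove.prefixes</name>
      <value>hive.sample.prefix1.,hive.sample.prefix2.</value>
    </property>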
Support for Celeborn
MR3 can use Celeborn as a remote shuffle service. Since intermediate data is stored on Celeborn worker nodes rather than on local disks, one of the key benefits of using Celeborn is that MR3 now needs only half as much space on local disks. The lower requirement on local disk storage is particularly important when running MR3 on public clouds.
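
A hypothetical configuration sketch is shown below. The property mr3.use.celeborn is an illustrative name only (these notes do not give the actual key for enabling Celeborn in MR3); celeborn.master.endpoints is the usual Celeborn client setting for locating the Celeborn master, and the address is a placeholder.

    <!-- hypothetical key: switch intermediate data shuffling to Celeborn -->
    <property>
      <name>mr3.use.celeborn</name>
      <value>true</value>
    </property>
    <!-- Celeborn client setting: address of the Celeborn master (placeholder value) -->
    <property>
      <name>celeborn.master.endpoints</name>
      <value>celeborn-master-0:9097</value>
    </property>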