Release 2.0 (2025-4-21)
MR3
Task scheduling
- Delay scheduling has been removed, and the configuration key mr3.taskattempt.queue.scheme.use.delay is no longer used.
- The default value of mr3.vertex.priority.scheme is postorder.
- The default value of mr3.taskattempt.queue.scheme is indexed. Moreover, TaskScheduler using the indexed scheme implements an optimization that minimizes the impact of stragglers on downstream execution. It uses a new configuration key mr3.vertex.high.task.priority.fraction. See DAG/Task Scheduling for details.
- The default value of tez.runtime.report.partition.stats in tez-site.xml is precise, which uses kilobytes (instead of megabytes) for measuring the size of input data to Tasks. With this finer granularity, the new configuration key mr3.vertex.high.task.priority.fraction becomes more effective.
- TaskScheduler provides a new mode strict, which is particularly useful for Hive on MR3 with LLAP I/O enabled. See DAG/Task Scheduling for details.
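
As a rough illustration, the scheduling defaults listed above correspond to configuration entries like the following. This is a sketch, not a recommended setup. It assumes that the mr3.* keys are set in mr3-site.xml (the file name is not stated in these notes), and it leaves out mr3.vertex.high.task.priority.fraction because its default value is not given here.

```xml
<!-- mr3-site.xml (assumed location for mr3.* keys) -->
<configuration>
  <property>
    <name>mr3.vertex.priority.scheme</name>
    <value>postorder</value>
  </property>
  <property>
    <name>mr3.taskattempt.queue.scheme</name>
    <value>indexed</value>
  </property>
</configuration>
```

```xml
<!-- tez-site.xml -->
<configuration>
  <property>
    <!-- precise: partition sizes are reported in kilobytes instead of megabytes -->
    <name>tez.runtime.report.partition.stats</name>
    <value>precise</value>
  </property>
</configuration>
```

Since these are the defaults in this release, spelling them out is mainly useful for documenting intent or for reverting an earlier override.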
Shuffling
- By default, pipelined shuffling is disabled and ONE_TO_ONE edges can be created.
  - tez.runtime.pipelined-shuffle.enabled is set to false.
  - tez.runtime.enable.final-merge.in.output is set to true.
- By default, memory-to-memory merging is enabled.
  - tez.runtime.shuffle.memory-to-memory.enable is set to true.
- A new configuration key tez.runtime.optimize.local.fetch.ordered is set to false. It specifies whether ordered data stored on local disks is read directly during fetching.
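
The shuffling defaults above can be written out as a tez-site.xml fragment. This is only a sketch that restates the values given in this list. The file placement is assumed from the tez.* prefix of the keys.

```xml
<!-- tez-site.xml: shuffling defaults as of this release -->
<configuration>
  <property>
    <name>tez.runtime.pipelined-shuffle.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>tez.runtime.enable.final-merge.in.output</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.runtime.shuffle.memory-to-memory.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.runtime.optimize.local.fetch.ordered</name>
    <value>false</value>
  </property>
</configuration>
```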
Backpressure and speculative fetching
MR3 implements backpressure and speculative fetching.
It introduces the following configuration keys in tez-site.xml.
- tez.runtime.shuffle.speculative.fetch.wait.millis
- tez.runtime.shuffle.stuck.fetcher.threshold.millis
- tez.runtime.shuffle.stuck.fetcher.release.millis
See Backpressure for details.
Miscellaneous
- A new configuration key tez.shuffle.skip.verify.request specifies whether or not MR3 shuffle handlers skip checking the validity of shuffle requests. The default value is false, i.e., the validity of shuffle requests is checked.
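
For reference, the default behavior described above would look like this when written out explicitly. The placement in tez-site.xml is an assumption based on the tez.* prefix of the key.

```xml
<!-- tez-site.xml (assumed location) -->
<configuration>
  <property>
    <!-- false: shuffle handlers verify each shuffle request (the default) -->
    <name>tez.shuffle.skip.verify.request</name>
    <value>false</value>
  </property>
</configuration>
```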
Hive on MR3
- LLAP I/O cache can be purged with the command llap cache -purge in Beeline.
- LLAP I/O supports proactive purging.
- Ranger 2.6.0 is supported.
- Hive on MR3 uses ORC 2.0.3.
For vectorized reading from Amazon S3, new configuration keys have been added to core-site.xml, such as:
- fs.s3a.vectored.read.min.seek.size
- fs.s3a.vectored.read.max.merged.size
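
A core-site.xml fragment using these two keys might look like the sketch below. The byte values are arbitrary placeholders chosen for illustration. They are not the defaults shipped with this release and not tuning recommendations.

```xml
<!-- core-site.xml: illustrative values only -->
<configuration>
  <property>
    <!-- placeholder value in bytes (128 KiB) -->
    <name>fs.s3a.vectored.read.min.seek.size</name>
    <value>131072</value>
  </property>
  <property>
    <!-- placeholder value in bytes (2 MiB) -->
    <name>fs.s3a.vectored.read.max.merged.size</name>
    <value>2097152</value>
  </property>
</configuration>
```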