Release 2.0 (2025-4-??)
MR3
Task scheduling
- Delay scheduling has been removed,
and the configuration key
mr3.taskattempt.queue.scheme.use.delay
is no longer used. - The default value of
mr3.vertex.priority.scheme
ispostorder
. - The default value of
mr3.taskattempt.queue.scheme
isindexed
. Moreover TaskScheduler usingindexed
scheme implements an optimization that minimizes the impact of stragglers on downstream execution. It uses a new configuration keymr3.vertex.high.task.priority.fraction
. See DAG/Task Scheduling for details. - The default value of
tez.runtime.report.partition.stats
intez-site.xml
isprecise
which uses kilobytes (instead of megabytes) for measuring the size of input data to Tasks. With this finer granularity, the new configuration keymr3.vertex.high.task.priority.fraction
becomes more effective. - TaskScheduler provides a new mode
strict
which is particularly useful for Hive on MR3 with LLAP I/O enabled. See DAG/Task Scheduling for details.
Shuffling
- By default, pipelined shuffling is disabled and
ONE_TO_ONE
edges can be created.tez.runtime.pipelined-shuffle.enabled
is set to false.tez.runtime.enable.final-merge.in.output
is set to true.
- By default, memory-to-memory merging is enabled.
tez.runtime.shuffle.memory-to-memory.enable
is set to true.- A new configuration key
tez.runtime.optimize.local.fetch.ordered
is set to false. It specifies whether fetching ordered data stored on local disks is directly read.
Backpressure handling and speculative fetching
MR3 implements backpressure handling and speculative fetching.
It introduces the following configuration keys in tez-site.xml
.
tez.runtime.shuffle.speculative.fetch.wait.millis
tez.runtime.shuffle.stuck.fetcher.threshold.millis
tez.runtime.shuffle.stuck.fetcher.release.millis
See Backpressure handling for details.
Miscellaneous
- A new configuration key
tez.shuffle.skip.verify.request
specifies whether or not MR3 shuffle handlers skips checking the validity of shuffle requests. The default value is false, i.e., the validity of shuffle requests is checked.
Hive on MR3
- LLAP I/O cache can be purged with the command
llap cache -purge
in Beeline. - LLAP I/O supports proactive purging.
- Ranger 2.6.0 is supported.
- Hive on MR3 uses ORC 2.0.3.
For vectorized reading from Amazon S3,
new configurations keys have been added to
core-site.xml
such as:fs.s3a.vectored.read.min.seek.size
fs.s3a.vectored.read.max.merged.size