Release 2.0 (2025-4-??)

MR3

Task scheduling

  • Delay scheduling has been removed, and the configuration key mr3.taskattempt.queue.scheme.use.delay is no longer used.
  • The default value of mr3.vertex.priority.scheme is postorder.
  • The default value of mr3.taskattempt.queue.scheme is indexed. Moreover, TaskScheduler with the indexed scheme implements an optimization that minimizes the impact of stragglers on downstream execution. The optimization uses a new configuration key mr3.vertex.high.task.priority.fraction. See DAG/Task Scheduling for details.
  • The default value of tez.runtime.report.partition.stats in tez-site.xml is precise, which measures the size of input data to Tasks in kilobytes (instead of megabytes). With this finer granularity, the new configuration key mr3.vertex.high.task.priority.fraction becomes more effective.
  • TaskScheduler provides a new mode, strict, which is particularly useful for Hive on MR3 with LLAP I/O enabled. See DAG/Task Scheduling for details.
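
As a rough illustration, the defaults listed above correspond to the following explicit settings. This is a sketch rather than a required configuration. Placing the mr3.* keys in mr3-site.xml is an assumption (these notes name only tez-site.xml), and mr3.vertex.high.task.priority.fraction is left out because its default value is not stated here.

```xml
<!-- mr3-site.xml (file name assumed; not stated in these notes) -->
<property>
  <name>mr3.vertex.priority.scheme</name>
  <value>postorder</value>
</property>
<property>
  <name>mr3.taskattempt.queue.scheme</name>
  <value>indexed</value>
</property>

<!-- tez-site.xml -->
<property>
  <name>tez.runtime.report.partition.stats</name>
  <value>precise</value>
</property>
```

Since these values are already the defaults in this release, spelling them out should matter only when a configuration file carried over from an earlier release sets something different.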

Shuffling

  • By default, pipelined shuffling is disabled and ONE_TO_ONE edges can be created.
    • tez.runtime.pipelined-shuffle.enabled is set to false.
    • tez.runtime.enable.final-merge.in.output is set to true.
  • By default, memory-to-memory merging is enabled.
    • tez.runtime.shuffle.memory-to-memory.enable is set to true.
    • A new configuration key tez.runtime.optimize.local.fetch.ordered is set to false. It specifies whether ordered data stored on local disks is read directly when fetching.
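
The shuffle defaults above can be written out as a tez-site.xml fragment. This is a minimal sketch that only restates the values given in the list. It assumes that all four keys are read from tez-site.xml, which these notes imply but do not state for each key.

```xml
<!-- tez-site.xml: shuffle defaults in this release, made explicit -->
<property>
  <name>tez.runtime.pipelined-shuffle.enabled</name>
  <value>false</value>
</property>
<property>
  <name>tez.runtime.enable.final-merge.in.output</name>
  <value>true</value>
</property>
<property>
  <name>tez.runtime.shuffle.memory-to-memory.enable</name>
  <value>true</value>
</property>
<property>
  <name>tez.runtime.optimize.local.fetch.ordered</name>
  <value>false</value>
</property>
```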

Backpressure handling and speculative fetching

MR3 implements backpressure handling and speculative fetching. It introduces the following configuration keys in tez-site.xml.

  • tez.runtime.shuffle.speculative.fetch.wait.millis
  • tez.runtime.shuffle.stuck.fetcher.threshold.millis
  • tez.runtime.shuffle.stuck.fetcher.release.millis

See Backpressure handling for details.

Miscellaneous

  • A new configuration key tez.shuffle.skip.verify.request specifies whether MR3 shuffle handlers skip checking the validity of shuffle requests. The default value is false, i.e., the validity of shuffle requests is checked.
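
For reference, the default behavior described above would look like the following fragment. The file placement (tez-site.xml) is an assumption based on the tez.* prefix of the key.

```xml
<!-- tez-site.xml (placement assumed): keep validation of shuffle requests on -->
<property>
  <name>tez.shuffle.skip.verify.request</name>
  <value>false</value>
</property>
```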

Hive on MR3

  • LLAP I/O cache can be purged with the command llap cache -purge in Beeline.
  • LLAP I/O supports proactive purging.
  • Ranger 2.6.0 is supported.
  • Hive on MR3 uses ORC 2.0.3. For vectorized reading from Amazon S3, new configuration keys have been added to core-site.xml, such as:
    • fs.s3a.vectored.read.min.seek.size
    • fs.s3a.vectored.read.max.merged.size
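
A core-site.xml fragment using these two keys might look like the sketch below. The values shown are illustrative placeholders only. They are neither the values shipped with Hive on MR3 nor tuning recommendations, and suitable settings depend on the workload and the Hadoop version in use.

```xml
<!-- core-site.xml: S3A vectored read settings (values are placeholders) -->
<property>
  <!-- ranges separated by less than this gap may be read as one request -->
  <name>fs.s3a.vectored.read.min.seek.size</name>
  <value>4K</value>
</property>
<property>
  <!-- upper bound on the size of a single merged range -->
  <name>fs.s3a.vectored.read.max.merged.size</name>
  <value>1M</value>
</property>
```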