Release 2.1 (2025-7-2)

MR3

Task scheduling

  • The use of speculative fetchers eliminates fetch delays in most cases, and thus speculative execution is disabled in hive-site.xml (with hive.mr3.am.task.concurrent.run.threshold.percent set to 100). Speculative execution is recommended only when stragglers are created unpredictably.
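
As an illustration, the setting mentioned above might appear in hive-site.xml as follows. This is a minimal sketch that assumes the usual Hadoop-style property layout; only the key name and the value come from the note above.

```xml
<!-- hive-site.xml (sketch): a threshold of 100 corresponds to
     speculative execution being disabled, as stated in the note above -->
<property>
  <name>hive.mr3.am.task.concurrent.run.threshold.percent</name>
  <value>100</value>
</property>
```

A value below 100 would presumably turn speculative execution back on for workloads with unpredictable stragglers; a suitable value depends on the workload and is not specified here.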

Shuffling

  • Pipelined shuffling works correctly, even with fault tolerance and speculative execution. Using speculative execution with pipelined shuffling, however, is not recommended.
  • Pipelined shuffling is enabled in tez-site.xml (with tez.runtime.pipelined-shuffle.enabled set to true).
  • When using MR3 shuffle handlers, spill records (metadata describing shuffle payload) are kept in memory and not written to local disk. As a result, the configuration keys tez.shuffle.indexcache.share and tez.shuffle.indexcache.mb are no longer used.
  • Introduce a new configuration key tez.runtime.use.free.memory.writer.output to use free memory for storing the output of tasks. If set to true, we strongly recommend setting hive.mr3.delete.vertex.local.directory (which is mapped to mr3.am.notify.destination.vertex.complete) to true in hive-site.xml so that intermediate data produced by a Vertex is deleted from memory as soon as all its consumer Vertexes are completed.
  • The keep-alive setting for shuffling is disabled in tez-site.xml.
  • On Kubernetes, running shuffle handlers in a separate process inside a ContainerWorker Pod is no longer supported. As a result, the configuration keys mr3.k8s.shuffle.process.ports and mr3.k8s.shufflehandler.process.memory.mb are no longer used.
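
The shuffling settings above can be sketched as configuration fragments. The key names come from the notes above; placing tez.runtime.use.free.memory.writer.output in tez-site.xml and setting it to true are assumptions made for illustration, since the notes do not state its file or default value.

```xml
<!-- tez-site.xml (sketch) -->
<property>
  <name>tez.runtime.pipelined-shuffle.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Assumption: set in tez-site.xml; "true" is chosen here only to
       illustrate storing task output in free memory -->
  <name>tez.runtime.use.free.memory.writer.output</name>
  <value>true</value>
</property>

<!-- hive-site.xml (sketch): recommended when the key above is true, so that
     intermediate data of a Vertex is deleted from memory once all its
     consumer Vertexes are completed -->
<property>
  <name>hive.mr3.delete.vertex.local.directory</name>
  <value>true</value>
</property>
```

Keeping the two settings paired reflects the recommendation above: task output held in free memory is released only if completed Vertexes are cleaned up promptly.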

Backpressure handling and speculative fetching

  • Speculative fetchers are created more often in order to avoid reporting fetch failures (which trigger Vertex reruns) as much as possible. For example, the configuration key tez.runtime.shuffle.connect.timeout is set to 27500 in tez-site.xml, and connection attempts are made for 30 seconds before reporting fetch failures.
  • The values for the following configuration keys are adjusted in tez-site.xml.
    • tez.runtime.shuffle.speculative.fetch.wait.millis to 12500
    • tez.runtime.shuffle.stuck.fetcher.threshold.millis to 2500
    • tez.runtime.shuffle.stuck.fetcher.release.millis to 7500
  • Introduce a new configuration key tez.runtime.shuffle.max.speculative.fetch.attempts to specify the maximum number of speculative fetchers for each fetch attempt. The default value is 2 (which means that up to 3 fetchers, the original and 2 speculative ones, can be created for each fetch attempt).
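
Taken together, the values listed in this section would look roughly like the following in tez-site.xml. This is a sketch using the standard property layout; all key names and values are the ones given above, and the last entry simply restates the default.

```xml
<!-- tez-site.xml (sketch): values as listed in the notes above -->
<property>
  <name>tez.runtime.shuffle.connect.timeout</name>
  <value>27500</value>
</property>
<property>
  <name>tez.runtime.shuffle.speculative.fetch.wait.millis</name>
  <value>12500</value>
</property>
<property>
  <name>tez.runtime.shuffle.stuck.fetcher.threshold.millis</name>
  <value>2500</value>
</property>
<property>
  <name>tez.runtime.shuffle.stuck.fetcher.release.millis</name>
  <value>7500</value>
</property>
<property>
  <!-- Default value; stated explicitly here only for illustration -->
  <name>tez.runtime.shuffle.max.speculative.fetch.attempts</name>
  <value>2</value>
</property>
```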

Miscellaneous

  • For ContainerWorkers, -Xms is set to the same value as -Xmx in the Java options. This effectively reverts HIVE-20951 in Hive on MR3.
  • A new configuration key mr3.am.rpc.protection specifies the value for hadoop.rpc.protection of Hadoop. Set it to privacy to encrypt messages to/from DAGAppMaster. By default, it is set to authentication and messages are not encrypted.
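
For example, encrypting messages to and from DAGAppMaster might be configured as sketched below. The file name mr3-site.xml is an assumption (the note above does not say where the key is set); the key and the value privacy come from the note.

```xml
<!-- mr3-site.xml (assumed location; sketch) -->
<property>
  <!-- "privacy" encrypts messages to/from DAGAppMaster;
       the default "authentication" leaves them unencrypted -->
  <name>mr3.am.rpc.protection</name>
  <value>privacy</value>
</property>
```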

Hive on MR3

  • Replace the Murmur3 hash function with xxHash32.