Release 1.8 (2023-12-09)

MR3

  • Shuffle handlers can send multiple consecutive partitions at once.
  • Fix a bug in TaskScheduler that could cause it to get stuck when the number of ContainerWorkers is smaller than the value of mr3.am.task.max.failed.attempts.
  • Avoid unnecessary attempts to delete directories created by DAGs.
  • mr3.taskattempt.queue.scheme can be set to spark to use a Spark-style TaskScheduler, which schedules consumer Tasks only after all producer Tasks have finished (see the configuration sketch after this list).
  • mr3.dag.vertex.schedule.by.stage can be set to true to process Vertexes stage by stage, similarly to Spark.
  • YarnResourceScheduler no longer uses AMRMClient.getAvailableResources(), which returns incorrect values in some cases.
  • Restore TEZ_USE_MINIMAL in env.sh.
  • Support Celeborn as a remote shuffle service.
  • mr3.dag.include.indeterminate.vertex specifies whether a DAG contains indeterminate Vertexes or not.
  • Fault tolerance in the event of disk failures works much faster.
  • Use Scala 2.12.
  • Support Java 17 (with USE_JAVA_17 in env.sh).
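
For illustration, here is a minimal mr3-site.xml sketch that enables the Spark-style scheduling described above. Only the two key names come from these notes; the surrounding setup and any interaction with other scheduling options are assumptions to check against the MR3 documentation.

    <!-- mr3-site.xml: sketch for Spark-style scheduling -->
    <property>
      <name>mr3.taskattempt.queue.scheme</name>
      <!-- schedule consumer Tasks only after all producer Tasks finish -->
      <value>spark</value>
    </property>
    <property>
      <name>mr3.dag.vertex.schedule.by.stage</name>
      <!-- process Vertexes stage by stage, as in Spark -->
      <value>true</value>
    </property>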

Hive on MR3

  • Fix ConcurrentModificationException generated during the construction of DAGs.
  • hive.mr3.application.name.prefix specifies the prefix of MR3 application names.
  • Fix a bug that ignores CTRL-C in Beeline and stop requests from Hue.
  • hive.mr3.config.remove.keys specifies configuration keys to remove from the JobConf passed to Tez (see the sketch after this list).
  • hive.mr3.config.remove.prefixes specifies prefixes of configuration keys to remove from the JobConf passed to Tez.
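
As a sketch, the three new keys might be set in hive-site.xml as follows. The values shown are hypothetical, and the comma-separated list format is an assumption; consult the Hive on MR3 documentation for the exact syntax.

    <!-- hive-site.xml: illustrative values only -->
    <property>
      <name>hive.mr3.application.name.prefix</name>
      <!-- hypothetical prefix for MR3 application names -->
      <value>hive-mr3</value>
    </property>
    <property>
      <name>hive.mr3.config.remove.keys</name>
      <!-- hypothetical keys to drop from the JobConf passed to Tez -->
      <value>fs.defaultFS,yarn.resourcemanager.address</value>
    </property>
    <property>
      <name>hive.mr3.config.remove.prefixes</name>
      <!-- hypothetical prefix to drop -->
      <value>hive.llap.</value>
    </property>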

Support for Celeborn

MR3 can use Celeborn as a remote shuffle service. Since intermediate data is stored on Celeborn worker nodes rather than on local disks, one of the key benefits of using Celeborn is that MR3 needs only half as much space on local disks. This lower local disk requirement is particularly important when running MR3 on public clouds.
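
As a minimal sketch of what the setup might look like, the snippet below points MR3 at a Celeborn master. celeborn.master.endpoints is Celeborn's standard client setting; mr3.use.celeborn is a hypothetical key used here only for illustration, so check the MR3 documentation for the actual key names.

    <!-- mr3-site.xml: illustrative sketch; mr3.use.celeborn is hypothetical -->
    <property>
      <name>mr3.use.celeborn</name>
      <!-- hypothetical: route intermediate data to Celeborn instead of local disks -->
      <value>true</value>
    </property>
    <property>
      <name>celeborn.master.endpoints</name>
      <!-- standard Celeborn client setting: master host and port -->
      <value>celeborn-master:9097</value>
    </property>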