By default, Hive on MR3 uses an external shuffle service, such as the Hadoop/MapReduce shuffle service, to send and receive intermediate data between ContainerWorkers. Alternatively, Hive on MR3 can use the shuffle handler built into the MR3 runtime system.

In order to use the MR3 shuffle handler, the user should set three configuration keys in hive-site.xml and tez-site.xml:

  • Set hive.mr3.use.daemon.shufflehandler to a number larger than zero in hive-site.xml. Hive on MR3 then attaches that many DaemonTasks for MR3 shuffle handlers to ContainerGroups.
  • Set tez.am.shuffle.auxiliary-service.id to tez_shuffle (instead of mapreduce_shuffle) in tez-site.xml.
  • Set tez.shuffle.port to a port number for the shuffle handler in tez-site.xml.
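
The three settings above can be sketched as the following configuration fragment. The values 1 (number of shuffle handlers) and 15551 (port number) are placeholders chosen only for illustration, not recommendations; pick values that suit your cluster and make sure the port is free on every node running ContainerWorkers.

```xml
<!-- hive-site.xml: any number larger than zero enables the MR3 shuffle handler.
     The value 1 is an illustrative assumption. -->
<property>
  <name>hive.mr3.use.daemon.shufflehandler</name>
  <value>1</value>
</property>

<!-- tez-site.xml: switch from mapreduce_shuffle to tez_shuffle. -->
<property>
  <name>tez.am.shuffle.auxiliary-service.id</name>
  <value>tez_shuffle</value>
</property>

<!-- tez-site.xml: port for the shuffle handler.
     The value 15551 is an illustrative assumption. -->
<property>
  <name>tez.shuffle.port</name>
  <value>15551</value>
</property>
```

Each property element goes inside the top-level configuration element of its respective file.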

The first two keys must be set consistently. If hive.mr3.use.daemon.shufflehandler is set to zero while tez.am.shuffle.auxiliary-service.id is set to tez_shuffle, ContainerWorkers fail with a NullPointerException (thrown from ShuffleUtils.deserializeShuffleProviderMetaData()).

Currently Hive on MR3 can use the MR3 shuffle handler only with the all-in-one ContainerGroup scheme.