Hive on MR3 can use an external shuffle service, such as the Hadoop/MapReduce shuffle service, to send and receive intermediate data between ContainerWorkers. Alternatively, it can use the shuffle handler built into the runtime system of MR3. On Kubernetes, Hive on MR3 should use the MR3 shuffle handler. See Using the MR3 Shuffle Handler for additional details.

Using the MR3 Shuffle Handler

In order to use the MR3 shuffle handler, the user should set three configuration keys in hive-site.xml and tez-site.xml:

  • Set hive.mr3.use.daemon.shufflehandler to a number larger than zero in hive-site.xml. Hive on MR3 then attaches that many DaemonTasks for MR3 shuffle handlers to ContainerGroups.
  • Set tez.am.shuffle.auxiliary-service.id to tez_shuffle (changing it from mapreduce_shuffle) in tez-site.xml.
  • Set tez.shuffle.port to a port number for the shuffle handler in tez-site.xml.
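
As a minimal sketch, the three settings above might look as follows. The key names and the tez_shuffle value come from the list above; the shuffle handler count (1) and the port number (15551) are placeholder values chosen for illustration only and should be adjusted for the cluster.

In hive-site.xml:

```xml
<!-- Any number larger than zero enables MR3 shuffle handlers.
     The value 1 is an illustrative assumption. -->
<property>
  <name>hive.mr3.use.daemon.shufflehandler</name>
  <value>1</value>
</property>
```

In tez-site.xml:

```xml
<!-- Use the MR3 shuffle handler instead of mapreduce_shuffle. -->
<property>
  <name>tez.am.shuffle.auxiliary-service.id</name>
  <value>tez_shuffle</value>
</property>

<!-- Port for the shuffle handler. 15551 is a placeholder;
     pick a port that is free on the nodes running ContainerWorkers. -->
<property>
  <name>tez.shuffle.port</name>
  <value>15551</value>
</property>
```

The first two settings should be changed together: the auxiliary service ID tez_shuffle is only meaningful when at least one MR3 shuffle handler is enabled.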

If hive.mr3.use.daemon.shufflehandler is set to zero but tez.am.shuffle.auxiliary-service.id is set to tez_shuffle, ContainerWorkers fail with a NullPointerException (thrown from ShuffleUtils.deserializeShuffleProviderMetaData()). Currently, Hive on MR3 can use the MR3 shuffle handler only with the all-in-one ContainerGroup scheme.