Hive on MR3 can use an external shuffle service, such as the Hadoop/MapReduce shuffle service, to send and receive intermediate data between ContainerWorkers. Hive on MR3 can also use the shuffle handler available in the runtime system of MR3. On Kubernetes, Hive on MR3 should use the MR3 shuffle handler. See Using the MR3 Shuffle Handler for additional details.
Using the MR3 Shuffle Handler
In order to use the MR3 shuffle handler, the user should set three configuration keys in hive-site.xml and tez-site.xml:
- Set hive.mr3.use.daemon.shufflehandler to a number larger than zero in hive-site.xml. Hive on MR3 then attaches that many DaemonTasks for MR3 shuffle handlers to ContainerGroups.
- Set tez.am.shuffle.auxiliary-service.id to tez_shuffle (instead of mapreduce_shuffle) in tez-site.xml.
- Set tez.shuffle.port to a port number for the shuffle handler in tez-site.xml.
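As a minimal sketch, the three settings above could look as follows in the two files. The value 1 for hive.mr3.use.daemon.shufflehandler and the port 15551 are placeholders chosen for illustration, not recommended values; any number larger than zero and any free port that suits the cluster should work.

```xml
<!-- hive-site.xml: number of shuffle handler DaemonTasks (placeholder value, must be larger than zero) -->
<property>
  <name>hive.mr3.use.daemon.shufflehandler</name>
  <value>1</value>
</property>

<!-- tez-site.xml: select the MR3 shuffle handler instead of mapreduce_shuffle -->
<property>
  <name>tez.am.shuffle.auxiliary-service.id</name>
  <value>tez_shuffle</value>
</property>

<!-- tez-site.xml: port for the shuffle handler (placeholder value, pick a free port) -->
<property>
  <name>tez.shuffle.port</name>
  <value>15551</value>
</property>
```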
If hive.mr3.use.daemon.shufflehandler is set to zero but tez.am.shuffle.auxiliary-service.id is set to tez_shuffle, ContainerWorkers fail with a NullPointerException (from ShuffleUtils.deserializeShuffleProviderMetaData()).
Currently Hive on MR3 can use the MR3 shuffle handler only with the all-in-one ContainerGroup scheme.