Using the MR3 Shuffle Handler |

This page explains how to use the MR3 shuffle handler. For introduction, see MR3 Shuffle Handler.

Basic usage

In order to use the MR3 shuffle handler, the user should set three configuration keys in mr3-site.xml and tez-site.xml:

Set mr3.use.daemon.shufflehandler to the number of shuffle handlers in each ContainerWorker which should be larger than zero. Now a ContainerWorker runs its own threads for shuffle handlers. If it is set to zero, no shuffle handlers are created and MR3 uses an external shuffle service.
Set tez.am.shuffle.auxiliary-service.id to tez_shuffle (from mapreduce_shuffle) in tez-site.xml. Now the runtime system of MR3 routes intermediate data to the MR3 shuffle handler, not to an external shuffle service.
Set tez.shuffle.port to a port number for the shuffle handler in tez-site.xml. The default value is 13563.

If a ContainerWorker fails to secure a port number specified by tez.shuffle.port for a shuffle handler, it chooses a random port number instead. Thus multiple ContainerWorkers each with multiple shuffle handlers can run on the same node without conflicts.

The following configuration keys in tez-site.xml control the behavior of shuffle handlers. In particular, the user may have to adjust the value for tez.shuffle.max.threads in order to limit the total number of threads for shuffle handlers. For example, on a node with 40 cores, setting tez.shuffle.max.threads to 0 creates 2 * 40 = 80 threads for each shuffle handler. If mr3.use.daemon.shufflehandler is set to 20, a ContainerWorker creates a total of 80 * 20 = 1600 threads for shuffle handlers, which may negatively affect the performance. Thus the user can set tez.shuffle.max.threads to:

the total number of threads for shuffle handlers / value for mr3.use.daemon.shufflehandler

Name	Default value	Description
tez.shuffle.connection-keep-alive.enable	false	true: keep connections alive for reuse. false: do not reuse
tez.shuffle.max.threads	0	Number of threads for each shuffle handler. 0 sets the number of threads to 2 * the number of cores.
tez.shuffle.listen.queue.size	128	Size of the listening queue. Can be set to the value in `/proc/sys/net/core/somaxconn`.
tez.shuffle.mapoutput-info.meta.cache.size	1000	Size of meta data of the output of mappers

Secure shuffle (using SSL mode)

The shuffle handler of MR3 supports secure shuffle using SSL (Secure Sockets Layer) mode. In comparison with Hadoop/MapReduce shuffle service, enabling secure shuffle in MR3 is much simpler because the incorporation of TEZ-4096 allows MR3 to include all SSL-related configurations in mr3-site.xml and tez-site.xml. That is, the user does not need separate configuration files such as ssl-server-mr3.xml and ssl-client-mr3.xml.

Enabling secure shuffle takes three steps:

create JKS files: a KeyStore file and a TrustStore file
update mr3-site.xml
update tez-site.xml

Step 1. Create JKS files

Before creating JKS files, the user should choose CN (Common Name) for the nodes in the cluster. On Hadoop, the user can choose CN according to the domain (e.g., * or *.datamonad.com). On Kubernetes, however, it must be *.service-worker.hivemr3.svc.cluster.local because all ContainerWorker Pods belong to a headless Service service-worker in the namespace hivemr3.

After choosing CN, the user should create JKS files. Below we illustrate the creation of a KeyStore file mr3-keystore.jks and a TrustStore file mr3-truststore.jks with passwords key_password, keystore_password, and truststore_password.

# create a KeyStore
$ keytool -genkey -alias mr3-shuffle -keyalg RSA -keysize 2048 -dname "CN=*" -keypass key_password -keystore mr3-keystore.jks -storepass keystore_password -validity 3650

# extract the CSR (Certificate Signing Request) from the KeyStore
$ keytool -keystore mr3-keystore.jks -storepass keystore_password -alias mr3-shuffle -certreq -file mr3-shuffle.csr

# create a private key
$ openssl genrsa -out mr3.key 2048

# generate a CA certificate from the private key
$ openssl req -new -x509 -key mr3.key -out mr3.crt

# sign the certificate with the CA certificate
$ openssl x509 -req -in mr3-shuffle.csr -CA mr3.crt -CAkey mr3.key -CAcreateserial -out mr3-shuffle.crt

# import the certificate into the KeyStore
$ keytool -import -alias mr3-shuffle -file mr3-shuffle.crt -keystore mr3-shuffle.jks -storepass keystore_password

# create a TrustStore
$ keytool -importcert -alias mr3-shuffle -file mr3-shuffle.crt -keystore mr3-truststore.jks -storepass truststore_password

#  check all the files
$ ls mr3*
mr3.crt  mr3.key  mr3-keystore.jks  mr3-shuffle.crt  mr3-shuffle.csr  mr3-shuffle.jks  mr3.srl  mr3-truststore.jks

On Hadoop, copy mr3-keystore.jks and mr3-truststore.jks to a directory on HDFS (e.g., /user/hive/lib/). On Kubernetes, copy mr3-keystore.jks and mr3-truststore.jks to the directory kubernetes/key/ in the MR3 release. Change the permission if necessary.

Step 2. Update `mr3-site.xml`

On Hadoop, extend the configuration key mr3.aux.uris in mr3-site.xml to include the path on HDFS where mr3-keystore.jks and mr3-truststore.jks reside.

<property>
  <name>mr3.aux.uris</name>
  <value>${auxuris},/user/hive/lib/mr3-keystore.jks,/user/hive/lib/mr3-truststore.jks</value>
</property>

On Kubernetes, set CREATE_KEYTAB_SECRET and CREATE_WORKER_SECRET to true in kubernetes/env.sh.

CREATE_KEYTAB_SECRET=true
CREATE_WORKER_SECRET=true

Step 3. Update `tez-site.xml`

For updating tez-site.xml, the user should consider whether the configuration key hadoop.security.credential.provider.path in core-site.xml is set to a JKS file or not. If it is set, all passwords are retrieved from the JKS file, so the user needs to set only the following configuration keys in tez-site.xml.

ssl.server.keystore.location to mr3-keystore.jks on Hadoop and /opt/mr3-run/key/mr3-keystore.jks on Kubernetes
ssl.server.truststore.location to mr3-truststore.jks on Hadoop and /opt/mr3-run/key/mr3-truststore.jks on Kubernetes
ssl.client.truststore.location to mr3-truststore.jks on Hadoop and /opt/mr3-run/key/mr3-truststore.jks on Kubernetes.

If it is not set, all passwords should be provided in text, so the user needs to set the following configuration keys as well.

ssl.server.keystore.password to the KeyStore password
ssl.server.truststore.password to the TrustStore password
ssl.client.truststore.password to the TrusStore password

Finally the user should set the following configuration keys to enable secure shuffle.

<property>
  <name>tez.runtime.shuffle.ssl.enable</name>
  <value>true</value>
</property>

<property>
  <name>tez.runtime.shuffle.keep-alive.enabled</name>
  <value>true</value>
</property>

On old versions of Kubernetes, the shuffle handler may occasionally cause fetch-failures after generating SSLException. In such a case, the user may observe VertexReruns due to fetch-failures.

Running shuffle handlers in a separate process on Kubernetes

In order to run shuffle handlers in a separate process inside a ContainerWorker Pod, the user should set three configuration keys in mr3-site.xml.

Set mr3.use.daemon.shufflehandler to zero. Then the process for ContainerWorker itself does not create threads for shuffle handlers, and a separate process for shuffle handlers is created.
Set mr3.k8s.shuffle.process.ports to a comma-separated list of port numbers. Then MR3 creates a shuffle handler for each port number in the separate process for shuffle handlers. For example, if it is set to 15500,15510,15520,15530,15540, MR3 creates 5 shuffle handlers with port number 15500 to 15540.
Set mr3.k8s.shufflehandler.process.memory.mb to the size of memory in MB for the process for shuffle handlers. The default value is 1024.

The port numbers specified by the configuration key mr3.k8s.shuffle.process.ports should not conflict with those already in use inside a ContainerWorker Pod. Usually port numbers in the range of 10000 and above are safe to use. Since every ContainerWorker Pod is assigned a unique IP address, no conflict arises between different ContainerWorker Pods, whether they run on the same physical node or not.