Secure Shuffle with SSL
This page explains how to use the MR3 shuffle handler with SSL enabled. For introduction, see MR3 Shuffle Handler.
In comparison with Hadoop/MapReduce shuffle service,
enabling secure shuffle in MR3 is much simpler
because the incorporation of TEZ-4096
allows MR3 to include all SSL-related configurations in mr3-site.xml
and tez-site.xml
.
That is, the user does not need separate configuration files such as ssl-server-mr3.xml
and ssl-client-mr3.xml
.
Enabling secure shuffle takes three steps:
- create JKS files: a KeyStore file and a TrustStore file
- update
mr3-site.xml
- update
tez-site.xml
Step 1. Create JKS files
Before creating JKS files,
the user should choose CN (Common Name) for the nodes in the cluster.
On Hadoop, the user can choose CN according to the domain (e.g., *
or *.foo.com
).
On Kubernetes, however, it must be *.service-worker.hivemr3.svc.cluster.local
because all ContainerWorker Pods belong to a headless Service service-worker
in the namespace hivemr3
.
After choosing CN, the user should create JKS files.
Below we illustrate the creation of a KeyStore file mr3-keystore.jks
and a TrustStore file mr3-truststore.jks
with passwords key_password
, keystore_password
, and truststore_password
.
# create a KeyStore
keytool -genkey -alias mr3-shuffle -keyalg RSA -keysize 2048 -dname "CN=*" -keypass key_password -keystore mr3-keystore.jks -storepass keystore_password -validity 3650
# extract the CSR (Certificate Signing Request) from the KeyStore
keytool -keystore mr3-keystore.jks -storepass keystore_password -alias mr3-shuffle -certreq -file mr3-shuffle.csr
# create a private key
openssl genrsa -out mr3.key 2048
# generate a CA certificate from the private key
openssl req -new -x509 -key mr3.key -out mr3.crt
# sign the certificate with the CA certificate
openssl x509 -req -in mr3-shuffle.csr -CA mr3.crt -CAkey mr3.key -CAcreateserial -out mr3-shuffle.crt
# import the certificate into the KeyStore
keytool -import -alias mr3-shuffle -file mr3-shuffle.crt -keystore mr3-shuffle.jks -storepass keystore_password
# create a TrustStore
keytool -importcert -alias mr3-shuffle -file mr3-shuffle.crt -keystore mr3-truststore.jks -storepass truststore_password
# check all the files
ls mr3*
mr3.crt mr3.key mr3-keystore.jks mr3-shuffle.crt mr3-shuffle.csr mr3-shuffle.jks mr3.srl mr3-truststore.jks
On Hadoop, copy mr3-keystore.jks
and mr3-truststore.jks
to a directory on HDFS (e.g., /user/hive/lib/
).
On Kubernetes, copy mr3-keystore.jks
and mr3-truststore.jks
to the directory kubernetes/key
in the MR3 release.
Change the permission if necessary.
Step 2. Update mr3-site.xml
On Hadoop, extend the configuration key mr3.aux.uris
in mr3-site.xml
to include the path on HDFS where mr3-keystore.jks
and mr3-truststore.jks
reside.
vi hadoop/conf/tpcds/mr3-site.xml
<property>
<name>mr3.aux.uris</name>
<value>${auxuris},/user/hive/lib/mr3-keystore.jks,/user/hive/lib/mr3-truststore.jks</value>
</property>
On Kubernetes, set CREATE_KEYTAB_SECRET
and CREATE_WORKER_SECRET
to true in kubernetes/env.sh
.
vi kubernetes/env.sh
CREATE_KEYTAB_SECRET=true
CREATE_WORKER_SECRET=true
Step 3. Update tez-site.xml
For updating tez-site.xml
, the user should consider
whether the configuration key hadoop.security.credential.provider.path
in core-site.xml
is set to a JKS file or not.
If it is set, all passwords are retrieved from the JKS file,
so the user needs to set only the following configuration keys in tez-site.xml
.
ssl.server.keystore.location
tomr3-keystore.jks
on Hadoop and/opt/mr3-run/key/mr3-keystore.jks
on Kubernetesssl.server.truststore.location
tomr3-truststore.jks
on Hadoop and/opt/mr3-run/key/mr3-truststore.jks
on Kubernetesssl.client.truststore.location
tomr3-truststore.jks
on Hadoop and/opt/mr3-run/key/mr3-truststore.jks
on Kubernetes.
If it is not set, all passwords should be provided in text, so the user needs to set the following configuration keys as well.
ssl.server.keystore.password
to the KeyStore passwordssl.server.truststore.password
to the TrustStore passwordssl.client.truststore.password
to the TrustStore password
Finally the user should set the following configuration keys to enable secure shuffle.
vi hadoop/conf/tpcds/tez-site.xml
vi kubernetes/conf/tez-site.xml
<property>
<name>tez.runtime.shuffle.ssl.enable</name>
<value>true</value>
</property>
<property>
<name>tez.runtime.shuffle.keep-alive.enabled</name>
<value>true</value>
</property>