Secure Shuffle with SSL
This page explains how to use the MR3 shuffle handler with SSL enabled. For an introduction, see MR3 Shuffle Handler.
Compared with the Hadoop/MapReduce shuffle service,
enabling secure shuffle in MR3 is much simpler
because, with TEZ-4096 incorporated,
MR3 can include all SSL-related configurations in mr3-site.xml and tez-site.xml.
That is, the user does not need separate configuration files such as ssl-server-mr3.xml and ssl-client-mr3.xml.
Enabling secure shuffle takes three steps:
- create JKS files: a KeyStore file and a TrustStore file
- update mr3-site.xml
- update tez-site.xml
Step 1. Create JKS files
Before creating JKS files,
the user should choose the CN (Common Name) for the nodes in the cluster.
On Hadoop, the user can choose the CN according to the domain (e.g., * or *.foo.com).
On Kubernetes, however, it must be *.service-worker.hivemr3.svc.cluster.local
because all ContainerWorker Pods belong to the headless Service service-worker in the namespace hivemr3.
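The CN rule above can be written as a small shell function. On Kubernetes, Pods behind a headless Service get DNS names of the form <pod-hostname>.<service>.<namespace>.svc.cluster.local, which is why a wildcard on the first label covers every ContainerWorker. This is a minimal sketch, assuming the default cluster domain cluster.local; the function name mr3_shuffle_cn and its optional second argument (a custom domain on Hadoop, a different namespace on Kubernetes) are hypothetical and not part of MR3.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MR3): print the CN for the shuffle certificate.
#   $1: platform, either "hadoop" or "kubernetes"
#   $2: on Hadoop, the wildcard domain (default "*");
#       on Kubernetes, the namespace (default "hivemr3")
mr3_shuffle_cn() {
  case "$1" in
    hadoop)
      printf '%s\n' "${2:-*}" ;;
    kubernetes)
      # ContainerWorker Pods sit behind the headless Service "service-worker",
      # so their DNS names end in .service-worker.<namespace>.svc.cluster.local
      # (assuming the default cluster domain cluster.local).
      printf '*.service-worker.%s.svc.cluster.local\n' "${2:-hivemr3}" ;;
    *)
      printf 'unknown platform: %s\n' "$1" >&2
      return 1 ;;
  esac
}

# Example: build the -dname argument for keytool -genkey.
printf 'CN=%s\n' "$(mr3_shuffle_cn kubernetes)"
# prints CN=*.service-worker.hivemr3.svc.cluster.local
```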
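After choosing the CN, the user should create JKS files.
Below we illustrate how to create a KeyStore file mr3-keystore.jks and a TrustStore file mr3-truststore.jks,
signing the certificate with a CA certificate created with openssl.
We use key_password as the key password, keystore_password as the KeyStore password, and truststore_password as the TrustStore password.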
# create a KeyStore (on Kubernetes, use -dname "CN=*.service-worker.hivemr3.svc.cluster.local")
keytool -genkey -alias mr3-shuffle -keyalg RSA -keysize 2048 -dname "CN=*" -keypass key_password -keystore mr3-keystore.jks -storepass keystore_password -validity 3650
# extract the CSR (Certificate Signing Request) from the KeyStore
keytool -keystore mr3-keystore.jks -storepass keystore_password -alias mr3-shuffle -certreq -file mr3-shuffle.csr
# create a private key for the CA
openssl genrsa -out mr3.key 2048
# generate a self-signed CA certificate from the private key (without -days, openssl defaults to 30 days)
openssl req -new -x509 -key mr3.key -out mr3.crt -days 3650
# sign the certificate with the CA certificate (again, the default validity would be 30 days)
openssl x509 -req -in mr3-shuffle.csr -CA mr3.crt -CAkey mr3.key -CAcreateserial -out mr3-shuffle.crt -days 3650
# import the CA certificate into the KeyStore so that keytool can build the certificate chain
keytool -importcert -noprompt -alias mr3-ca -file mr3.crt -keystore mr3-keystore.jks -storepass keystore_password
# import the signed certificate into the KeyStore, replacing the self-signed one under the same alias
keytool -importcert -alias mr3-shuffle -file mr3-shuffle.crt -keystore mr3-keystore.jks -storepass keystore_password
# create a TrustStore
keytool -importcert -alias mr3-shuffle -file mr3-shuffle.crt -keystore mr3-truststore.jks -storepass truststore_password
# check all the files
ls mr3*
mr3.crt mr3.key mr3-keystore.jks mr3-shuffle.crt mr3-shuffle.csr mr3.srl mr3-truststore.jks
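Before distributing the files, it may be worth checking that they are consistent with each other. The sketch below assumes the file names, the alias mr3-shuffle, and the passwords from the example above, and that openssl and keytool are on the PATH; the function name verify_mr3_jks is hypothetical. It relies only on openssl verify, openssl x509 -enddate, and keytool -list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the files created above.
# Assumes the file names, alias, and passwords used in the example above.
verify_mr3_jks() {
  dir="${1:-.}"
  # The signed shuffle certificate should verify against the CA certificate.
  openssl verify -CAfile "$dir/mr3.crt" "$dir/mr3-shuffle.crt" || return 1
  # Print the expiry date of the shuffle certificate (notAfter=...).
  openssl x509 -in "$dir/mr3-shuffle.crt" -noout -enddate || return 1
  # The KeyStore should hold the private key under the alias mr3-shuffle.
  keytool -list -alias mr3-shuffle -keystore "$dir/mr3-keystore.jks" \
      -storepass keystore_password | grep -q PrivateKeyEntry || return 1
  # The TrustStore should hold the shuffle certificate as a trusted entry.
  keytool -list -alias mr3-shuffle -keystore "$dir/mr3-truststore.jks" \
      -storepass truststore_password | grep -q trustedCertEntry || return 1
  echo "JKS files look consistent"
}
```

For example, running verify_mr3_jks in the directory holding the files prints the verification result, the expiry date, and a final confirmation, and returns a non-zero status at the first failed check.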
On Hadoop, copy mr3-keystore.jks and mr3-truststore.jks to a directory on HDFS (e.g., /user/hive/lib/).
On Kubernetes, copy mr3-keystore.jks and mr3-truststore.jks to the directory kubernetes/key in the MR3 release.
Change the file permissions if necessary, since the KeyStore contains the private key.
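The copying step can also be scripted. This is a minimal sketch, assuming it runs in a directory that holds both JKS files and, on Kubernetes, that the directory is the root of the MR3 release; the destinations default to the examples above and can be overridden with a second argument. The function names run and copy_mr3_jks and the DRY_RUN variable are hypothetical; DRY_RUN only prints the commands, which is handy for reviewing them first. File permissions are left unchanged because the right mode depends on which user reads the files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: copy the two JKS files to their destination.
# Set DRY_RUN to a non-empty value to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

#   $1: platform, either "hadoop" or "kubernetes"
#   $2: optional destination (defaults follow the examples above)
copy_mr3_jks() {
  case "$1" in
    hadoop)
      # -f overwrites existing files on HDFS.
      run hdfs dfs -put -f mr3-keystore.jks mr3-truststore.jks "${2:-/user/hive/lib/}" ;;
    kubernetes)
      run cp mr3-keystore.jks mr3-truststore.jks "${2:-kubernetes/key/}" ;;
    *)
      echo "unknown platform: $1" >&2
      return 1 ;;
  esac
}
```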
Step 2. Update mr3-site.xml
On Hadoop, extend the configuration key mr3.aux.uris in mr3-site.xml
to include the HDFS paths of mr3-keystore.jks and mr3-truststore.jks.
vi hadoop/conf/tpcds/mr3-site.xml
<property>
<name>mr3.aux.uris</name>
<value>${auxuris},/user/hive/lib/mr3-keystore.jks,/user/hive/lib/mr3-truststore.jks</value>
</property>
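Because mr3.aux.uris is a comma-separated list, a quick sanity check on the edited value is to confirm that it has an entry ending in each file name. The helper below is hypothetical and only inspects the string; it does not check that the files actually exist on HDFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a comma-separated mr3.aux.uris value
# contains an entry ending in /mr3-keystore.jks and one ending in /mr3-truststore.jks.
has_jks_uris() {
  # Surround the list with commas so that every entry is delimited on both sides.
  case ",$1," in *"/mr3-keystore.jks,"*) ;; *) return 1 ;; esac
  case ",$1," in *"/mr3-truststore.jks,"*) ;; *) return 1 ;; esac
}

if has_jks_uris '${auxuris},/user/hive/lib/mr3-keystore.jks,/user/hive/lib/mr3-truststore.jks'; then
  echo "both JKS files are listed"
fi
```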
On Kubernetes, set CREATE_KEYTAB_SECRET and CREATE_WORKER_SECRET to true in kubernetes/env.sh.
vi kubernetes/env.sh
CREATE_KEYTAB_SECRET=true
CREATE_WORKER_SECRET=true
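As a non-interactive alternative to editing env.sh with vi, the two variables can be set with sed. This sketch assumes env.sh already contains lines starting with CREATE_KEYTAB_SECRET= and CREATE_WORKER_SECRET= (lines in any other form are left untouched); sed -i.bak works with both GNU and BSD sed and keeps a backup copy next to the file. The function name enable_mr3_secrets is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set CREATE_KEYTAB_SECRET and CREATE_WORKER_SECRET to true.
#   $1: path to env.sh (default kubernetes/env.sh)
enable_mr3_secrets() {
  env_file="${1:-kubernetes/env.sh}"
  # Rewrite the existing assignments in place, keeping env.sh.bak as a backup.
  sed -i.bak \
    -e 's/^CREATE_KEYTAB_SECRET=.*/CREATE_KEYTAB_SECRET=true/' \
    -e 's/^CREATE_WORKER_SECRET=.*/CREATE_WORKER_SECRET=true/' \
    "$env_file"
  # Show the resulting settings.
  grep -E '^CREATE_(KEYTAB|WORKER)_SECRET=' "$env_file"
}
```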
Step 3. Update tez-site.xml
When updating tez-site.xml, the user should check
whether the configuration key hadoop.security.credential.provider.path in core-site.xml is set to a JKS file.
If it is set, all passwords are retrieved from the JKS file,
so the user needs to set only the following configuration keys in tez-site.xml:
- ssl.server.keystore.location to mr3-keystore.jks on Hadoop and /opt/mr3-run/key/mr3-keystore.jks on Kubernetes
- ssl.server.truststore.location to mr3-truststore.jks on Hadoop and /opt/mr3-run/key/mr3-truststore.jks on Kubernetes
- ssl.client.truststore.location to mr3-truststore.jks on Hadoop and /opt/mr3-run/key/mr3-truststore.jks on Kubernetes
If it is not set, all passwords must be provided in plain text, so the user needs to set the following configuration keys as well:
- ssl.server.keystore.password to the KeyStore password
- ssl.server.truststore.password to the TrustStore password
- ssl.client.truststore.password to the TrustStore password
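The keys and values listed above can be generated as <property> elements for tez-site.xml. This is a minimal sketch: the function names are hypothetical, and so are the environment variables KEYSTORE_PASSWORD and TRUSTSTORE_PASSWORD, which, when both are set, add the three password keys for the case where hadoop.security.credential.provider.path is not set. Passwords are written verbatim, without XML escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Hadoop-style <property> element.
print_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Hypothetical helper: print the SSL-related properties for tez-site.xml.
#   $1: platform, either "hadoop" or "kubernetes"
mr3_ssl_properties() {
  case "$1" in
    hadoop) dir="" ;;                        # files are referenced by name on Hadoop
    kubernetes) dir="/opt/mr3-run/key/" ;;   # mounted key directory on Kubernetes
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
  print_property ssl.server.keystore.location "${dir}mr3-keystore.jks"
  print_property ssl.server.truststore.location "${dir}mr3-truststore.jks"
  print_property ssl.client.truststore.location "${dir}mr3-truststore.jks"
  # Plain-text passwords, only if both variables are set (not XML-escaped).
  if [ -n "${KEYSTORE_PASSWORD:-}" ] && [ -n "${TRUSTSTORE_PASSWORD:-}" ]; then
    print_property ssl.server.keystore.password "$KEYSTORE_PASSWORD"
    print_property ssl.server.truststore.password "$TRUSTSTORE_PASSWORD"
    print_property ssl.client.truststore.password "$TRUSTSTORE_PASSWORD"
  fi
}

mr3_ssl_properties kubernetes
```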
Finally, the user should set the following configuration keys in tez-site.xml to enable secure shuffle.
vi hadoop/conf/tpcds/tez-site.xml
vi kubernetes/conf/tez-site.xml
<property>
<name>tez.runtime.shuffle.ssl.enable</name>
<value>true</value>
</property>
<property>
<name>tez.runtime.shuffle.keep-alive.enabled</name>
<value>true</value>
</property>
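After editing tez-site.xml, the two keys can be double-checked with a small script. This rough sketch assumes each <name> and <value> element sits on a single line, as in the examples above; an XML-aware tool is more robust for arbitrary files. The function names get_conf_value and check_secure_shuffle are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of the property named $2 in the XML file $1.
get_conf_value() {
  awk -v key="$2" '
    /<name>/ { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ && n == key { v = $0; gsub(/.*<value>|<\/value>.*/, "", v); print v; exit }
    /<\/property>/ { n = "" }
  ' "$1"
}

# Hypothetical helper: fail unless both secure-shuffle keys are set to true.
check_secure_shuffle() {
  for key in tez.runtime.shuffle.ssl.enable tez.runtime.shuffle.keep-alive.enabled; do
    if [ "$(get_conf_value "$1" "$key")" != "true" ]; then
      echo "$key is not set to true in $1" >&2
      return 1
    fi
  done
}
```

For example, check_secure_shuffle kubernetes/conf/tez-site.xml returns 0 silently when both keys are true and reports the first key that is missing or not true otherwise.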