In order to run Spark on MR3 with Kerberos, the user should have a principal name and a keytab file containing valid Kerberos credentials. Since token renewal is managed by the Spark driver, the keytab file is not distributed to DAGAppMaster and ContainerWorker Pods. Hence Spark on MR3 is simpler to run with Kerberos than Hive on MR3.
Spark on MR3 on Kubernetes
In our example, we make the following assumptions:
- The principal name is `spark@RED`.
- The keytab file is `spark.keytab`.
- The KDC server runs on a host `red0` with IP address 10.1.91.4.
- Spark on MR3 accesses an HDFS file system `hdfs://red0:8020`.
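The Kerberos configuration file copied in the next step should map the realm to the KDC server. Below is a minimal sketch of `krb5.conf` under the assumptions above (realm `RED`, KDC on `red0`); the `admin_server` entry and any other settings depend on your Kerberos installation.

```
[libdefaults]
  default_realm = RED

[realms]
  RED = {
    kdc = red0
    admin_server = red0
  }
```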
Before running Spark on MR3 on Kubernetes with Kerberos, copy the keytab file `spark.keytab` to the directory `kubernetes/spark/key` and the Kerberos configuration file `krb5.conf` to the directory `kubernetes/spark/conf`.
$ ls kubernetes/spark/key
spark.keytab
$ ls kubernetes/spark/conf/krb5.conf
kubernetes/spark/conf/krb5.conf
Then update the file `kubernetes/spark/conf/core-site.xml`.
$ vi kubernetes/spark/conf/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>file:///</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
- `fs.defaultFS` should be set to `file:///`, not an HDFS address like `hdfs://red0:8020`, because from the viewpoint of Spark on MR3 running in a Kubernetes cluster, the default file system is the local file system.
- `hadoop.security.authentication` should be set to `kerberos` to enable Kerberos authentication.
Optionally update the file `kubernetes/spark/conf/spark-defaults.conf`.
$ vi kubernetes/spark/conf/spark-defaults.conf
spark.kerberos.access.hadoopFileSystems=hdfs://red0:8020
`spark.kerberos.access.hadoopFileSystems` lists the HDFS file systems that Spark on MR3 accesses.
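If Spark on MR3 accesses additional Kerberized file systems, they can be listed in the same property as a comma-separated value. A hypothetical example with a second HDFS cluster `hdfs://blue0:8020` (not part of our setup):

```
spark.kerberos.access.hadoopFileSystems=hdfs://red0:8020,hdfs://blue0:8020
```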
The remaining configuration files can be updated as explained below. After updating all the configuration files, see Running Spark on MR3 on Kubernetes.
Option 1. Running the Spark driver inside Kubernetes
kubernetes/spark/env.sh
$ vi kubernetes/spark/env.sh
CREATE_KEYTAB_SECRET=true
SPARK_KERBEROS_KEYTAB=$KEYTAB_MOUNT_DIR/spark.keytab
SPARK_KERBEROS_PRINCIPAL=spark@RED
SPARK_KERBEROS_USER=spark
- `CREATE_KEYTAB_SECRET` should be set to true so that a Secret is created from the files in the directory `kubernetes/spark/key`.
- `SPARK_KERBEROS_KEYTAB` specifies the path to the keytab file inside the Spark driver Pod.
- `SPARK_KERBEROS_PRINCIPAL` specifies the principal name.
- `SPARK_KERBEROS_USER` specifies the user name derived from the principal name.
kubernetes/spark/conf/mr3-site.xml
$ vi kubernetes/spark/conf/mr3-site.xml
<property>
<name>mr3.k8s.host.aliases</name>
<value>red0=10.1.91.4</value>
</property>
`mr3.k8s.host.aliases` should include a mapping for the KDC server (as well as the HDFS NameNode).
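If the KDC server and the HDFS NameNode ran on different hosts, both mappings would be needed. Below is a hypothetical sketch with a separate KDC host `kdc0` at `10.1.91.5`; in our example both run on `red0`, so the single entry shown above suffices.

```xml
<property>
  <name>mr3.k8s.host.aliases</name>
  <value>red0=10.1.91.4,kdc0=10.1.91.5</value>
</property>
```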
kubernetes/spark-yaml/spark-run.yaml
$ vi kubernetes/spark-yaml/spark-run.yaml
spec:
hostAliases:
- ip: "10.1.91.4"
hostnames:
- "red0"
- The `spec/hostAliases` field should include aliases for the hosts of the HDFS NameNode and the KDC server.
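Likewise, if the KDC server ran on a host other than `red0`, it would need its own entry in `spec/hostAliases`. A hypothetical sketch with a separate KDC host `kdc0` at `10.1.91.5`:

```yaml
spec:
  hostAliases:
  - ip: "10.1.91.4"
    hostnames:
    - "red0"
  - ip: "10.1.91.5"
    hostnames:
    - "kdc0"
```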
Option 2. Running the Spark driver outside Kubernetes
kubernetes/spark/env.sh
$ vi kubernetes/spark/env.sh
CREATE_KEYTAB_SECRET=false
SPARK_KERBEROS_KEYTAB=/home/spark/mr3-run/kubernetes/spark/key/spark.keytab
SPARK_KERBEROS_PRINCIPAL=spark@RED
SPARK_KERBEROS_USER=spark
- `CREATE_KEYTAB_SECRET` can be set to false because the keytab file is not used by DAGAppMaster and ContainerWorker Pods.
- `SPARK_KERBEROS_KEYTAB` specifies the path to the keytab file on the node where the Spark driver runs, because the Spark driver reads the keytab file directly from the local file system.
- The other environment variables are set in the same way as in Option 1.
kubernetes/spark/conf/mr3-site.xml
$ vi kubernetes/spark/conf/mr3-site.xml
<property>
<name>mr3.k8s.host.aliases</name>
<value>red0=10.1.91.4,gold0=10.1.90.9</value>
</property>
`mr3.k8s.host.aliases` should include a mapping for the KDC server (as well as the HDFS NameNode). In the example shown above, `gold0` is the node where the Spark driver is to run.
Spark on MR3 on Hadoop
In order to run Spark on MR3 on Hadoop with Kerberos,
the user should update `spark-defaults.conf` (such as `conf/cluster/spark/spark-defaults.conf`).
$ vi conf/cluster/spark/spark-defaults.conf
spark.kerberos.principal=spark@RED
spark.kerberos.keytab=/home/spark/spark.keytab
spark.hadoop.dfs.namenode.kerberos.principal=nn/red0@RED
- `spark.kerberos.principal` specifies the principal name.
- `spark.kerberos.keytab` specifies the path to the keytab file.
- `spark.hadoop.dfs.namenode.kerberos.principal` specifies the Kerberos principal for the HDFS NameNode.
Note that the environment variables in `env.sh` (such as `SECURE_MODE`, `USER_PRINCIPAL`, `USER_KEYTAB`) are irrelevant because token renewal is managed by the Spark driver. Hence even `TOKEN_RENEWAL_HDFS_ENABLED` can be set to false in `env.sh`.
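For instance, a minimal `env.sh` fragment consistent with this scenario might look as follows; token renewal still works because the Spark driver handles it using `spark.kerberos.principal` and `spark.kerberos.keytab`.

```sh
# Token renewal is managed by the Spark driver via spark.kerberos.principal
# and spark.kerberos.keytab, so HDFS token renewal by MR3 can stay disabled.
TOKEN_RENEWAL_HDFS_ENABLED=false
```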