This page explains additional steps for using Kerberos for authentication in Hive/Spark on MR3.
basicsEnv: basics.T
In our example, the host alias for HiveServer2 is set to orange1, which serves as the instance in the Kerberos service principal for Hive.
hiveserver2IpHostname: "orange1",
We provide the details of the Kerberos server: 1) the Kerberos realm, which is PL in our example; 2) the IP address of the Kerberos admin server; 3) the IP address of the KDC (Key Distribution Center).
kerberos: {
  realm: "PL",
  adminServer: "1.1.1.1",
  kdc: "1.1.1.1"
},
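For reference, the two fragments above belong to the same basicsEnv section. Assembled, they might look as follows (a sketch only; unrelated fields of basics.T are omitted):

basicsEnv: {
  // host alias serving as the instance in Hive's service principal
  hiveserver2IpHostname: "orange1",
  // Kerberos server details
  kerberos: {
    realm: "PL",            // Kerberos realm
    adminServer: "1.1.1.1", // IP address of the Kerberos admin server
    kdc: "1.1.1.1"          // IP address of the KDC
  }
  // ... other fields omitted
},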
hiveEnv: hive.T
authentication is set to KERBEROS to use Kerberos for authentication.
authentication: "KERBEROS",
secretEnv: secret.T
With the host alias set to orange1 and the Kerberos realm set to PL, the user should provide several keytab files. The user may choose any names for the keytab files; in our example, we name each keytab file after its corresponding principal.
Hive requires service principals for 1) public HiveServer2 and 2) internal HiveServer2 (which is not exposed outside the Kubernetes cluster). The service principal for public HiveServer2 is uniquely determined by the host alias for HiveServer2 and the Kerberos realm (see the sketch after the list of principals below).
hive/orange1@PL in hive-orange1.keytab
The service principal for internal HiveServer2 is determined by the Kerberos realm alone.
hive/hiveserver2-internal.hivemr3.svc.cluster.local@PL in hive-hiveserver2-internal.hivemr3.svc.cluster.local.keytab
Ranger requires two service principals and a user principal. All the principals are uniquely determined by the Kerberos realm.
HTTP/ranger.hivemr3.svc.cluster.local@PL in HTTP-ranger.hivemr3.svc.cluster.local.keytab (Spnego service principal)
rangeradmin/ranger.hivemr3.svc.cluster.local@PL in rangeradmin-ranger.hivemr3.svc.cluster.local.keytab (admin service principal)
rangerlookup@PL in rangerlookup.keytab (lookup principal)
Spark requires a user principal. The user may choose any primary in the principal (not necessarily spark).
spark@PL in spark.keytab
Optionally the user may use a user principal for accessing Kerberized HDFS from Hive on MR3. In our example, we use hive@PL, but the user may choose any primary in the principal.
hive@PL in hive.keytab
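Every principal above follows directly from the host alias and the realm. The following sketch makes the derivation explicit (hypothetical helper code, not part of the configuration; only the values come from this page):

// Deriving the principals listed above from the two example values.
const realm = "PL";          // kerberos.realm
const hostAlias = "orange1"; // hiveserver2IpHostname

// Public HiveServer2: determined by the host alias and the realm.
const publicHs2 = `hive/${hostAlias}@${realm}`;
// Internal HiveServer2 and Ranger: determined by the realm alone.
const internalHs2 = `hive/hiveserver2-internal.hivemr3.svc.cluster.local@${realm}`;
const rangerSpnego = `HTTP/ranger.hivemr3.svc.cluster.local@${realm}`;
const rangerAdmin = `rangeradmin/ranger.hivemr3.svc.cluster.local@${realm}`;
const rangerLookup = `rangerlookup@${realm}`;
// User principals: any primary may be chosen; spark and hive are our choices.
const sparkUser = `spark@${realm}`;
const hiveUser = `hive@${realm}`;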
Then set the kerberosSecret and spark fields as follows:
kerberosSecret: {
  server: {
    keytab: "hive-orange1.keytab",
    principal: "hive/orange1@PL",
    data: fs.readFileSync("hive-orange1.keytab").toString("base64"),
    keytabInternal: "hive-hiveserver2-internal.hivemr3.svc.cluster.local.keytab",
    principalInternal: "hive/hiveserver2-internal.hivemr3.svc.cluster.local@PL",
    dataInternal: fs.readFileSync("hive-hiveserver2-internal.hivemr3.svc.cluster.local.keytab").toString("base64")
  },
  user: {
    keytab: "hive.keytab",
    principal: "hive@PL",
    data: fs.readFileSync("hive.keytab").toString("base64")
  },
  ranger: {
    spnego: {
      keytab: "HTTP-ranger.hivemr3.svc.cluster.local.keytab",
      principal: "HTTP/ranger.hivemr3.svc.cluster.local@PL",
      data: fs.readFileSync("HTTP-ranger.hivemr3.svc.cluster.local.keytab").toString("base64")
    },
    admin: {
      keytab: "rangeradmin-ranger.hivemr3.svc.cluster.local.keytab",
      principal: "rangeradmin/ranger.hivemr3.svc.cluster.local@PL",
      data: fs.readFileSync("rangeradmin-ranger.hivemr3.svc.cluster.local.keytab").toString("base64")
    },
    lookup: {
      keytab: "rangerlookup.keytab",
      principal: "rangerlookup@PL",
      data: fs.readFileSync("rangerlookup.keytab").toString("base64")
    }
  }
},
spark: {
  keytab: "spark.keytab",
  principal: "spark@PL",
  data: fs.readFileSync("spark.keytab").toString("base64")
},
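Since every data field follows the same pattern, the repetition can be factored into a small helper (a sketch; fs is Node's built-in fs module, which the configuration file must import):

import * as fs from "fs";

// Read a keytab file and return its contents encoded in base64,
// as expected by the data fields above.
function keytabToBase64(path: string): string {
  return fs.readFileSync(path).toString("base64");
}

// For example:
//   data: keytabToBase64("hive-orange1.keytab"),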
Configuring Ranger
In the Ranger service, fill the JDBC URL field with:
jdbc:hive2://hiveserver2.hivemr3.svc.cluster.local:9852/;principal=hive/orange1@PL;
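Only the principal part of this URL depends on our example values, as the following sketch illustrates (hypothetical code; the host is the in-cluster HiveServer2 service):

// Composing the JDBC URL for the Ranger service from the example values.
const realm = "PL";
const hostAlias = "orange1";
const rangerJdbcUrl =
  "jdbc:hive2://hiveserver2.hivemr3.svc.cluster.local:9852/" +
  `;principal=hive/${hostAlias}@${realm};`;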
Configuring Superset
When registering a database source, the Hive URI should be:
hive://hiveserver2-internal.hivemr3.svc.cluster.local:9852/default?auth=KERBEROS&kerberos_service_name=hive
Running queries
For sending queries to the public HiveServer2, the user should obtain a valid Kerberos ticket and use the following JDBC URL:
jdbc:hive2://orange1:9852/;principal=hive/orange1@PL;
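For example, with Beeline on a client machine (a sketch: we assume beeline is installed, the client's /etc/krb5.conf points to the realm PL, and user@PL is a hypothetical user principal):

$ kinit user@PL
$ beeline -u "jdbc:hive2://orange1:9852/;principal=hive/orange1@PL;"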
Running Spark on MR3
Inside a Spark driver Pod, the user should obtain a Kerberos ticket from spark.keytab (mounted in the directory /opt/mr3-run/key) before running Spark shell or submitting Spark jobs.
$ kubectl exec -n hivemr3 -it spark1 -- /bin/bash
spark@spark1:/opt/mr3-run/spark$ kinit -kt ../key/spark.keytab spark@PL
spark@spark1:/opt/mr3-run/spark$ ./run-spark-shell.sh
...