This page explains additional steps for using Kerberos for authentication in Hive/Spark on MR3.

basicsEnv: basics.T

In our example, the host alias for HiveServer2 is set to orange1, which serves as the instance of the Kerberos service principal for Hive (hive/orange1@PL).

  hiveserver2IpHostname: "orange1",

We provide the details of the Kerberos server: 1) Kerberos realm which is PL in our example; 2) IP address of the Kerberos admin server; 3) IP address of the KDC (Key Distribution Center).

  kerberos: {                                                                           
    realm: "PL",                                                                        
    adminServer: "1.1.1.1",                                                             
    kdc: "1.1.1.1"                                                                      
  },                                  
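These three fields correspond to the entries of a standard MIT Kerberos client configuration. The sketch below renders them as a minimal krb5.conf fragment to illustrate that mapping. The KerberosConfig interface and the renderKrb5Conf function are hypothetical names, and we do not claim that MR3 generates its own krb5.conf this way.

```typescript
// Hypothetical type mirroring the kerberos field of basicsEnv (illustration only).
interface KerberosConfig {
  realm: string;
  adminServer: string;
  kdc: string;
}

// Render a minimal MIT-style krb5.conf fragment from the three settings.
// Assumption: default ports are used, so no ":port" suffix is appended.
function renderKrb5Conf(k: KerberosConfig): string {
  return [
    "[libdefaults]",
    `  default_realm = ${k.realm}`,
    "",
    "[realms]",
    `  ${k.realm} = {`,
    `    admin_server = ${k.adminServer}`,
    `    kdc = ${k.kdc}`,
    "  }",
    "",
  ].join("\n");
}

console.log(renderKrb5Conf({ realm: "PL", adminServer: "1.1.1.1", kdc: "1.1.1.1" }));
```

In our example, the admin server and the KDC run on the same host, so both entries point to 1.1.1.1.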

hiveEnv: hive.T

Set authentication to KERBEROS to use Kerberos for authentication.

  authentication: "KERBEROS",                                                           
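In stock Hive, Kerberos authentication for HiveServer2 is controlled by the hive-site.xml properties hive.server2.authentication, hive.server2.authentication.kerberos.principal, and hive.server2.authentication.kerberos.keytab. The following sketch shows how a setting such as authentication: "KERBEROS" could translate into those properties. The hiveSiteAuthProperties function and the keytab path passed to it are assumptions for illustration, not a description of what MR3 actually writes.

```typescript
// Subset of the modes accepted by stock HiveServer2 (it also accepts LDAP, PAM, CUSTOM, ...).
type Authentication = "NONE" | "KERBEROS";

// Sketch: map an authentication setting to the stock Hive properties that enable it.
function hiveSiteAuthProperties(
  authentication: Authentication,
  principal: string,
  keytabPath: string,
): Record<string, string> {
  const props: Record<string, string> = {
    "hive.server2.authentication": authentication,
  };
  if (authentication === "KERBEROS") {
    // Only Kerberos needs the service principal and its keytab.
    props["hive.server2.authentication.kerberos.principal"] = principal;
    props["hive.server2.authentication.kerberos.keytab"] = keytabPath;
  }
  return props;
}

// The keytab path here is hypothetical, chosen only for the example.
console.log(hiveSiteAuthProperties("KERBEROS", "hive/orange1@PL", "/opt/mr3-run/key/hive-orange1.keytab"));
```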

secretEnv: secret.T

With the host alias set to orange1 and the Kerberos realm set to PL, the user should provide several keytab files. The user may choose any names for keytab files. In our example, we choose the name of a keytab file according to its corresponding principal.
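The naming convention used in our example drops the realm and replaces the slash between the primary and the instance with a hyphen. A minimal sketch of that convention follows; keytabFileName is a hypothetical helper, and since MR3 accepts any file names, following this convention is optional.

```typescript
// Derive a keytab file name from a principal, following the convention of this page:
// drop "@REALM" and replace the (first) "/" between primary and instance with "-".
function keytabFileName(principal: string): string {
  const at = principal.lastIndexOf("@");
  const base = at >= 0 ? principal.slice(0, at) : principal;
  return `${base.replace("/", "-")}.keytab`;
}

console.log(keytabFileName("hive/orange1@PL")); // hive-orange1.keytab
console.log(keytabFileName("rangerlookup@PL")); // rangerlookup.keytab
```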

Hive requires service principals for 1) public HiveServer2 and 2) internal HiveServer2, which is not exposed outside the Kubernetes cluster. The service principal for public HiveServer2 is uniquely determined by the host alias for HiveServer2 and the Kerberos realm.

  • hive/orange1@PL in hive-orange1.keytab

The service principal for internal HiveServer2 is determined by the Kerberos realm alone.

  • hive/hiveserver2-internal.hivemr3.svc.cluster.local@PL in hive-hiveserver2-internal.hivemr3.svc.cluster.local.keytab
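Both HiveServer2 principals share the primary hive and differ only in the instance. The sketch below makes the dependencies explicit: the public principal takes the host alias and the realm, while the internal one takes only the realm. The function names are hypothetical, and the internal instance is hard-coded to the Service name used in the hivemr3 namespace of our example.

```typescript
// Build a Kerberos principal of the form primary[/instance]@REALM.
function principal(primary: string, realm: string, instance?: string): string {
  return instance ? `${primary}/${instance}@${realm}` : `${primary}@${realm}`;
}

// Public HiveServer2: the instance is the host alias (hiveserver2IpHostname).
function publicHiveServer2Principal(hostAlias: string, realm: string): string {
  return principal("hive", realm, hostAlias);
}

// Internal HiveServer2: the instance is the in-cluster DNS name of the Service,
// fixed for the hivemr3 namespace in this example, so only the realm varies.
function internalHiveServer2Principal(realm: string): string {
  return principal("hive", realm, "hiveserver2-internal.hivemr3.svc.cluster.local");
}

console.log(publicHiveServer2Principal("orange1", "PL")); // hive/orange1@PL
console.log(internalHiveServer2Principal("PL"));
```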

Ranger requires two service principals and a user principal. All the principals are uniquely determined by the Kerberos realm.

  • HTTP/ranger.hivemr3.svc.cluster.local@PL in HTTP-ranger.hivemr3.svc.cluster.local.keytab (SPNEGO service principal)
  • rangeradmin/ranger.hivemr3.svc.cluster.local@PL in rangeradmin-ranger.hivemr3.svc.cluster.local.keytab (admin service principal)
  • rangerlookup@PL in rangerlookup.keytab (lookup principal)
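Because all three Ranger principals depend only on the realm, they can be generated from it. A minimal sketch, where rangerPrincipals and RANGER_HOST are hypothetical names and the host is the Ranger Service name from our example:

```typescript
// In-cluster DNS name of the Ranger Service in this example.
const RANGER_HOST = "ranger.hivemr3.svc.cluster.local";

// Ranger principals as listed above; only the realm is a parameter.
function rangerPrincipals(realm: string) {
  return {
    spnego: `HTTP/${RANGER_HOST}@${realm}`, // SPNEGO service principal
    admin: `rangeradmin/${RANGER_HOST}@${realm}`, // admin service principal
    lookup: `rangerlookup@${realm}`, // lookup principal (no instance)
  };
}

console.log(rangerPrincipals("PL"));
```

The primary HTTP of the SPNEGO principal is conventional rather than arbitrary: SPNEGO clients such as web browsers typically request a ticket for HTTP/<host>, so the instance should match the host name that clients use to reach Ranger.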

Spark requires a user principal. The user may choose any primary in the principal (not necessarily spark).

  • spark@PL in spark.keytab

Optionally, the user may provide a user principal for accessing Kerberized HDFS from Hive on MR3. In our example, we use hive@PL, but the user may choose any primary in the principal.

  • hive@PL in hive.keytab
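The terms primary, instance, and realm used above refer to the parts of a principal written as primary[/instance]@REALM. User principals such as spark@PL and hive@PL have no instance, whereas service principals such as hive/orange1@PL do. The parser below is a hypothetical sketch that makes this structure concrete; it ignores the escaped characters that Kerberos permits but that this page does not use.

```typescript
interface Principal {
  primary: string;
  instance?: string;
  realm: string;
}

// Split "primary[/instance]@REALM" into its parts.
function parsePrincipal(s: string): Principal {
  const at = s.lastIndexOf("@");
  if (at <= 0 || at === s.length - 1) {
    throw new Error(`not a fully qualified principal: ${s}`);
  }
  const base = s.slice(0, at);
  const realm = s.slice(at + 1);
  const slash = base.indexOf("/");
  return slash >= 0
    ? { primary: base.slice(0, slash), instance: base.slice(slash + 1), realm }
    : { primary: base, realm };
}

console.log(parsePrincipal("spark@PL"));
console.log(parsePrincipal("hive/orange1@PL"));
```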

Then set the kerberosSecret and spark fields as follows:

  kerberosSecret: {
    server: {
      keytab: "hive-orange1.keytab",
      principal: "hive/orange1@PL",
      data: fs.readFileSync("hive-orange1.keytab").toString("base64"),
      keytabInternal: "hive-hiveserver2-internal.hivemr3.svc.cluster.local.keytab",
      principalInternal: "hive/hiveserver2-internal.hivemr3.svc.cluster.local@PL",
      dataInternal: fs.readFileSync("hive-hiveserver2-internal.hivemr3.svc.cluster.local.keytab").toString("base64")
    },
    user: {
      keytab: "hive.keytab",
      principal: "hive@PL",
      data: fs.readFileSync("hive.keytab").toString("base64")
    },
    ranger: {
      spnego: {
        keytab: "HTTP-ranger.hivemr3.svc.cluster.local.keytab",
        principal: "HTTP/ranger.hivemr3.svc.cluster.local@PL",
        data: fs.readFileSync("HTTP-ranger.hivemr3.svc.cluster.local.keytab").toString("base64")
      },
      admin: {
        keytab: "rangeradmin-ranger.hivemr3.svc.cluster.local.keytab",
        principal: "rangeradmin/ranger.hivemr3.svc.cluster.local@PL",
        data: fs.readFileSync("rangeradmin-ranger.hivemr3.svc.cluster.local.keytab").toString("base64")
      },
      lookup: {
        keytab: "rangerlookup.keytab",
        principal: "rangerlookup@PL",
        data: fs.readFileSync("rangerlookup.keytab").toString("base64")
      }
    }
  },
  spark: {
    keytab: "spark.keytab",
    principal: "spark@PL",
    data: fs.readFileSync("spark.keytab").toString("base64")
  },
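Each entry in kerberosSecret and spark repeats the same pattern: a keytab file name, its principal, and the file contents encoded in base64. If the keytab files follow the naming convention of our example, a small helper can derive the file name from the principal and reduce the risk of mismatched names. keytabEntry is a hypothetical helper, not part of MR3, and it does not fit the server entry, whose internal fields use different names (keytabInternal, principalInternal, dataInternal).

```typescript
import * as fs from "fs";
import * as path from "path";

interface KeytabEntry {
  keytab: string;
  principal: string;
  data: string;
}

// Hypothetical helper: derive the keytab name from the principal
// (drop "@REALM", first "/" -> "-") and read the file contents as base64.
function keytabEntry(principal: string, dir = "."): KeytabEntry {
  const keytab = `${principal.split("@")[0].replace("/", "-")}.keytab`;
  const data = fs.readFileSync(path.join(dir, keytab)).toString("base64");
  return { keytab, principal, data };
}

// Usage inside the configuration, for example:
//   spark: keytabEntry("spark@PL"),
//   user: keytabEntry("hive@PL"),
```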

Configuring Ranger

In the Ranger service, fill in the JDBC URL field with:

  • jdbc:hive2://hiveserver2.hivemr3.svc.cluster.local:9852/;principal=hive/orange1@PL;
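In a Kerberized Hive JDBC URL, the principal parameter names the HiveServer2 service principal, not the client's own principal. This is why the URL above uses hive/orange1@PL even though it connects through the in-cluster host name hiveserver2.hivemr3.svc.cluster.local. A minimal sketch of assembling such a URL, where hiveJdbcUrl is a hypothetical helper:

```typescript
// Sketch: assemble a Hive JDBC URL for a Kerberized HiveServer2.
// `principal` is the HiveServer2 service principal; an empty database selects the default.
function hiveJdbcUrl(host: string, port: number, principal: string, database = ""): string {
  return `jdbc:hive2://${host}:${port}/${database};principal=${principal};`;
}

// Ranger (in-cluster host name):
console.log(hiveJdbcUrl("hiveserver2.hivemr3.svc.cluster.local", 9852, "hive/orange1@PL"));
// External clients (host alias):
console.log(hiveJdbcUrl("orange1", 9852, "hive/orange1@PL"));
```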

Configuring Superset

When registering a database source in Superset, use the following Hive URI:

  • hive://hiveserver2-internal.hivemr3.svc.cluster.local:9852/default?auth=KERBEROS&kerberos_service_name=hive
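Unlike the JDBC URL, this URI carries only the service name, that is, the primary of the internal HiveServer2 principal hive/hiveserver2-internal.hivemr3.svc.cluster.local@PL. The sketch below derives it, assuming Superset uses the PyHive dialect, which reads auth and kerberos_service_name from the query string. supersetHiveUri is a hypothetical helper.

```typescript
import { URLSearchParams } from "url";

// Sketch: build the Hive URI for Superset; kerberos_service_name is the
// primary (the part before "/") of the HiveServer2 service principal.
function supersetHiveUri(host: string, port: number, database: string, servicePrincipal: string): string {
  const serviceName = servicePrincipal.split("/")[0];
  const query = new URLSearchParams({ auth: "KERBEROS", kerberos_service_name: serviceName });
  return `hive://${host}:${port}/${database}?${query.toString()}`;
}

console.log(
  supersetHiveUri(
    "hiveserver2-internal.hivemr3.svc.cluster.local",
    9852,
    "default",
    "hive/hiveserver2-internal.hivemr3.svc.cluster.local@PL",
  ),
);
```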

Running queries

To send queries to the public HiveServer2, the user should obtain a valid Kerberos ticket and use the JDBC URL:

  • jdbc:hive2://orange1:9852/;principal=hive/orange1@PL;

Running Spark on MR3

Inside a Spark driver Pod, the user should obtain a Kerberos ticket from spark.keytab (mounted in the directory /opt/mr3-run/key) before running the Spark shell or submitting Spark jobs.

$ kubectl exec -n hivemr3 -it spark1 -- /bin/bash
spark@spark1:/opt/mr3-run/spark$ kinit -kt ../key/spark.keytab spark@PL
spark@spark1:/opt/mr3-run/spark$ ./run-spark-shell.sh 
...