This page explains additional steps for using SSL (Secure Sockets Layer) encryption in several parts of Hive on MR3.

basicsEnv: basics.T

We use a secure connection to S3-compatible storage over HTTPS.

  s3aEndpoint: "https://orange0:9000",
  s3aEnableSsl: true,

The user should have a certificate for connecting to the storage.
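
As a minimal sketch, the two settings can be checked for consistency before the configuration is applied. The helper checkS3aSsl and the interface S3aSettings are hypothetical names introduced here and are not part of MR3. The rule that the endpoint scheme should agree with s3aEnableSsl is an assumption drawn from the example above.

```typescript
// Hypothetical helper (not part of MR3): flags an endpoint whose scheme
// does not agree with the SSL flag. The rule itself is an assumption.
interface S3aSettings {
  s3aEndpoint: string;
  s3aEnableSsl: boolean;
}

function checkS3aSsl(settings: S3aSettings): string[] {
  const problems: string[] = [];
  const usesHttps = /^https:\/\//.test(settings.s3aEndpoint);
  if (settings.s3aEnableSsl && !usesHttps) {
    problems.push(
      `s3aEnableSsl is true but s3aEndpoint "${settings.s3aEndpoint}" does not start with https://`
    );
  }
  return problems;
}

// Values taken from the example above.
console.log(checkS3aSsl({ s3aEndpoint: "https://orange0:9000", s3aEnableSsl: true }));
```

An empty result means that the two fields are consistent. The check does not verify the certificate itself.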

metastoreEnv: metastore.T

We store the password of the MySQL server for Metastore in a KeyStore file to be created later. Internally, the configuration key javax.jdo.option.ConnectionPassword in kubernetes/conf/hive-site.xml is set to _.

  userName: "root",
  password: "_",

hiveEnv: hive.T

We enable secure connections to the public HiveServer2.

  enableSsl: true,

Setting enableSsl to true does not enable secure connections to the internal HiveServer2, Metastore, and Ranger, which all run only inside the Kubernetes cluster. To enable secure connections to these components as well (which is usually unnecessary, e.g., because all these components run on the same node), the user should update the source code:

$ vi typescript/src/server/api/hive.ts

export interface T {
...
  enableSslInternal: true;

$ vi typescript/src/server/validate/hive.ts

export function initial(): T {
...
    enableSslInternal: false

In the current implementation of Superset, connecting securely to the internal HiveServer2 does not work.

workerEnv: worker.T

We enable secure shuffle in MR3 using SSL mode. Then all the ContainerWorker Pods for Hive (but not for Spark) communicate securely.

  enableShuffleSsl: true

Enabling secure shuffle is usually unnecessary because ContainerWorker Pods are not reachable from outside the Kubernetes cluster. Besides, it incurs a noticeable performance overhead.

Creating certificates and secrets

Using SSL encryption requires several certificates and secrets (TrustStores and KeyStores). While it is feasible to create them manually (see Enabling SSL), the user can use the script generate-ssl.sh included in the MR3 release.

The script has the following requirements:

  • Java 1.8 or higher
  • Hadoop binary distribution for executing hadoop credential
  • Java keytool
  • openssl

The environment variables JAVA_HOME and HADOOP_HOME should be set before executing the script.

$ echo $JAVA_HOME
/usr/lib/jdk1.8.0_231/
$ echo $HADOOP_HOME
/home/hive/hadoop-3.1.2
$ which keytool
/usr/bin/keytool
$ which openssl
/usr/bin/openssl

At minimum, the user should set the following variables in the script.

  • NAMESPACE: namespace of the Kubernetes cluster, i.e., namespace in basicsEnv
  • HOST: host name of the public HiveServer2, i.e., hiveserver2IpHostname in basicsEnv
  • VALID_DAYS: period (in days) during which KeyStore and TrustStore files remain valid
  • BEELINE_KEYSTORE_PASSWORD: password for the Beeline KeyStore (beeline-ssl.jks) to be distributed to end users

$ vi generate-ssl.sh

NAMESPACE=hivemr3
HOST=orange1
VALID_DAYS=365
BEELINE_KEYSTORE_PASSWORD=beelinepassword
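
As a minimal sketch, the required variables can be checked before running the script. The functions parseAssignments and checkRequired are hypothetical and not part of the MR3 release. The sketch assumes that each variable is written as a plain NAME=value line without quotes, as in the example above.

```typescript
// Hypothetical pre-flight check (not part of MR3). Assumes unquoted
// NAME=value lines, as in the example on this page.
type Assignments = { [name: string]: string };

const REQUIRED = ["NAMESPACE", "HOST", "VALID_DAYS", "BEELINE_KEYSTORE_PASSWORD"];

// Collects simple NAME=value assignments and ignores every other line.
function parseAssignments(script: string): Assignments {
  const result: Assignments = {};
  script.split("\n").forEach((line) => {
    const match = /^([A-Z_][A-Z0-9_]*)=(.*)$/.exec(line.trim());
    if (match) {
      result[match[1]] = match[2];
    }
  });
  return result;
}

// Reports required variables that are unset or empty, and a VALID_DAYS
// value that is not a positive integer.
function checkRequired(vars: Assignments): string[] {
  const problems = REQUIRED
    .filter((name) => !vars[name])
    .map((name) => `${name} is not set`);
  const days = vars["VALID_DAYS"];
  if (days && !/^[1-9][0-9]*$/.test(days)) {
    problems.push(`VALID_DAYS must be a positive integer, got "${days}"`);
  }
  return problems;
}

// Values taken from the example above.
const exampleScript = [
  "NAMESPACE=hivemr3",
  "HOST=orange1",
  "VALID_DAYS=365",
  "BEELINE_KEYSTORE_PASSWORD=beelinepassword",
].join("\n");

console.log(checkRequired(parseAssignments(exampleScript)));
```

To check a real script, pass the contents of generate-ssl.sh to parseAssignments instead of exampleScript.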

The following optional variables are set in our example.

  • PASSWORD: password for KeyStores and TrustStores. If not set, the script uses a random string for the password.
  • METASTORE_DATABASE_PASSWORD: password for the MySQL server for Metastore
  • S3_CERTIFICATE: certificate for connecting to S3-compatible storage
  • COMMON_NAME: instance in the Kerberos service principal for HiveServer2, i.e., hiveserver2IpHostname in basicsEnv. This is required for using Python clients when connecting to the public HiveServer2.

$ vi generate-ssl.sh

PASSWORD=MySslPassword123
METASTORE_DATABASE_PASSWORD=passwd
S3_CERTIFICATE=s3-public.cert
COMMON_NAME=orange1

The following variables should be set if the connection to the Metastore database is secure. In our example, the connection is not secure. See Enabling SSL for more details.

  • METASTORE_DATABASE_HOST: host name for the database server
  • METASTORE_MYSQL_CERTIFICATE: certificate for connecting to the database server

$ vi generate-ssl.sh

METASTORE_DATABASE_HOST=
METASTORE_MYSQL_CERTIFICATE=

The following variables should be set if the connection to the database server for Ranger is secure. In our example, the connection is not secure. See Enabling SSL for more details.

  • RANGER_DATABASE_HOST: host name for the database server
  • RANGER_MYSQL_CERTIFICATE: certificate for connecting to the database server

$ vi generate-ssl.sh

RANGER_DATABASE_HOST=
RANGER_MYSQL_CERTIFICATE=

Executing the script generates several files. At the certificate prompts, the user may leave every field empty, but should answer yes when asked whether to trust a certificate.

$ ls s3-public.cert
s3-public.cert
$ ./generate-ssl.sh
...
Country Name (2 letter code) [AU]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:
Organization Name (eg, company) [Internet Widgits Pty Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:
Email Address []:
...
Trust this certificate? [no]:  yes
...
Trust this certificate? [no]:  yes
...
Country Name (2 letter code) [AU]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:
Organization Name (eg, company) [Internet Widgits Pty Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:
Email Address []:

The user can find the following output files.

  • hivemr3-ssl-certificate.jceks and hivemr3-ssl-certificate.jks are the KeyStore and TrustStore for Hive on MR3.
  • mr3-keystore.jks and mr3-truststore.jks are the KeyStore and TrustStore for secure shuffle.
  • beeline-ssl.jks is the KeyStore to be distributed to end users running Beeline to connect to the public HiveServer2. Its password is specified in BEELINE_KEYSTORE_PASSWORD in the script.
  • mr3-ssl.pem can be used to update the certificate of the Metastore and Ranger databases.
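
As a minimal sketch, the presence of the output files can be checked before moving on. The helper missingOutputs is hypothetical and not part of MR3. The file names are copied from the list above, and the sketch assumes that the files are in the current directory.

```typescript
import * as fs from "fs";

// File names copied from the list of output files on this page.
const EXPECTED_OUTPUTS = [
  "hivemr3-ssl-certificate.jceks",
  "hivemr3-ssl-certificate.jks",
  "mr3-keystore.jks",
  "mr3-truststore.jks",
  "beeline-ssl.jks",
  "mr3-ssl.pem",
];

// Hypothetical helper (not part of MR3): returns the expected files that
// are absent. The existence check is a parameter so that it can be
// replaced, e.g., in tests.
function missingOutputs(exists: (file: string) => boolean = fs.existsSync): string[] {
  return EXPECTED_OUTPUTS.filter((file) => !exists(file));
}

const missingFiles = missingOutputs();
console.log(
  missingFiles.length === 0
    ? "all output files found"
    : `missing: ${missingFiles.join(", ")}`
);
```

The check only confirms that the files exist. It does not inspect their contents or validity period.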

secretEnv: secret.T

We set the ssl and shuffleSsl fields using the output files of generate-ssl.sh and the password set in PASSWORD.

  ssl: {
    keystore: "hivemr3-ssl-certificate.jceks",
    truststore: "hivemr3-ssl-certificate.jks",
    password: "MySslPassword123",
    keystoreData: fs.readFileSync("hivemr3-ssl-certificate.jceks").toString("base64"),
    truststoreData: fs.readFileSync("hivemr3-ssl-certificate.jks").toString("base64")
  },
  shuffleSsl: {
    keystore: "mr3-keystore.jks",
    truststore: "mr3-truststore.jks",
    keystoreData: fs.readFileSync("mr3-keystore.jks").toString("base64"),
    truststoreData: fs.readFileSync("mr3-truststore.jks").toString("base64")
  },
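
Each file name above appears twice, once as a name and once as an argument to fs.readFileSync. As a minimal sketch, a small helper can derive all four fields from the two file names. The helper storePair is hypothetical and not part of MR3. It assumes that the files are in the current directory, as in the example above.

```typescript
import * as fs from "fs";

// Hypothetical helper (not part of MR3): builds the keystore/truststore
// fields from two file names so that each name is written only once.
function storePair(keystore: string, truststore: string) {
  const toBase64 = (file: string): string =>
    fs.readFileSync(file).toString("base64");
  return {
    keystore: keystore,
    truststore: truststore,
    keystoreData: toBase64(keystore),
    truststoreData: toBase64(truststore),
  };
}

// Intended use inside secretEnv (file names produced by generate-ssl.sh):
//
//   ssl: {
//     ...storePair("hivemr3-ssl-certificate.jceks", "hivemr3-ssl-certificate.jks"),
//     password: "MySslPassword123",
//   },
//   shuffleSsl: storePair("mr3-keystore.jks", "mr3-truststore.jks"),
```

storePair throws if either file cannot be read, so a missing output file is reported when the configuration is evaluated rather than later.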

Configuring Ranger

In the Ranger service, fill the JDBC URL field with:

  • jdbc:hive2://hiveserver2-internal.hivemr3.svc.cluster.local:9852/;principal=hive/hiveserver2-internal.hivemr3.svc.cluster.local@PL;

Note that we use the internal HiveServer2, which does not use a secure connection by default.

Running queries

For sending queries to the public HiveServer2, the user should use the following JDBC URL:

  • jdbc:hive2://orange1:9852/;principal=hive/orange1@PL;ssl=true;sslTrustStore=/path/to/beeline-ssl.jks;trustStorePassword=beelinepassword;
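
As a minimal sketch, the URL above can be assembled from its parts, which makes it easier to see which value comes from which setting. The helper beelineJdbcUrl and the interface BeelineSslParams are hypothetical and not part of MR3 or Hive. The sketch assumes that no value contains a semicolon.

```typescript
// Hypothetical helper (not part of MR3 or Hive): assembles the JDBC URL
// shown above. Assumes that no value contains ';'.
interface BeelineSslParams {
  host: string;               // HOST in generate-ssl.sh
  port: number;               // port of the public HiveServer2
  principal: string;          // Kerberos service principal for HiveServer2
  trustStorePath: string;     // local path to beeline-ssl.jks
  trustStorePassword: string; // BEELINE_KEYSTORE_PASSWORD in generate-ssl.sh
}

function beelineJdbcUrl(p: BeelineSslParams): string {
  const options = [
    `principal=${p.principal}`,
    "ssl=true",
    `sslTrustStore=${p.trustStorePath}`,
    `trustStorePassword=${p.trustStorePassword}`,
  ];
  return `jdbc:hive2://${p.host}:${p.port}/;${options.join(";")};`;
}

// Values taken from the example on this page.
console.log(
  beelineJdbcUrl({
    host: "orange1",
    port: 9852,
    principal: "hive/orange1@PL",
    trustStorePath: "/path/to/beeline-ssl.jks",
    trustStorePassword: "beelinepassword",
  })
);
```

With the values from this page, the helper reproduces the URL listed above.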