Running Metastore and HiveServer2

The following file and directories are relevant to running Metastore and HiveServer2 using Helm.

# resources for executing Helm
└── kubernetes
    └── helm
        └── hive
            └── values.yaml

# resources for running Metastore and HiveServer2
└── kubernetes
    ├── conf
    └── key

The file kubernetes/helm/hive/values.yaml defines the default values for the Helm chart. Typically the user creates another file (e.g., values-minikube.yaml or values-production.yaml) in order to override some of these default values. The user can update configuration files in the directory kubernetes/conf to control the behavior of Metastore and HiveServer2 (as well as DAGAppMaster and ContainerWorkers). The user should copy keytab files to the directory kubernetes/key.
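
As a rough illustration of such an override file, the sketch below writes a small values-sample.yaml with a shell here-document. The keys come from the sections of values.yaml described later on this page. The concrete values (image name, host name, IP address) are placeholders rather than recommendations, and the temporary output directory is used only for demonstration.

```shell
# Sketch only: write a minimal override file. Keys that are not listed
# keep the defaults from kubernetes/helm/hive/values.yaml.
out_dir=$(mktemp -d)    # assumption: demo location, any path works
cat > "$out_dir/values-sample.yaml" <<'EOF'
docker:
  image: myregistry:5000/hive5:latest   # placeholder image name
  imagePullPolicy: IfNotPresent

create:
  metastore: true

metastore:
  databaseHost: mydbhost                # placeholder host name
  initSchema: true

hive:
  externalIp: 192.168.10.1              # placeholder address
EOF
echo "wrote $out_dir/values-sample.yaml"
```

Only the keys that differ from the defaults need to appear in the override file.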

Assuming that a new YAML file values-sample.yaml overrides the default values in kubernetes/helm/hive/values.yaml, the user can run Metastore and HiveServer2 with namespace hivemr3 as follows:

$ ln -s $(pwd)/kubernetes/conf/ kubernetes/helm/hive/conf
$ ln -s $(pwd)/kubernetes/key/ kubernetes/helm/hive/key
$ helm install --namespace hivemr3 kubernetes/helm/hive -f values-sample.yaml

Here the first two commands create symbolic links so that Helm can access the directories kubernetes/conf and kubernetes/key directly.
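
If the ln commands are executed a second time, plain ln -s follows the existing link and creates a nested link inside kubernetes/conf or kubernetes/key. A possible way to make the step repeatable is sketched below. The helper link_chart_dirs is hypothetical (it is not part of the MR3 distribution), and the demo runs in a scratch directory that mimics the layout shown above.

```shell
# Sketch only: (re)create the conf and key links under the chart directory.
# link_chart_dirs is a hypothetical helper.
link_chart_dirs() {
  base=$1    # directory that contains kubernetes/
  for d in conf key; do
    # -f replaces an existing link, -n keeps ln from descending into it
    ln -sfn "$base/kubernetes/$d" "$base/kubernetes/helm/hive/$d"
  done
}

# Demo in a scratch directory with the same layout.
demo=$(mktemp -d)
mkdir -p "$demo/kubernetes/conf" "$demo/kubernetes/key" "$demo/kubernetes/helm/hive"
link_chart_dirs "$demo"
link_chart_dirs "$demo"    # running it twice leaves the same two links
ls -l "$demo/kubernetes/helm/hive"
```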

Enabling SSL

In order to enable SSL, the user should have a TrustStore file. Follow the instructions in Enabling SSL to perform the following:

  • Create a self-signed certificate for SSL by executing the script kubernetes/generate-hivemr3-ssl.sh, and set the environment variable MR3_SSL_KEYSTORE_PASSWORD in kubernetes/config-run.sh (see Self-signed certificate for SSL in Enabling SSL).
  • Configure Metastore to use SSL (see Metastore with SSL in Enabling SSL). Note that if the connection to the MySQL database does not use SSL, the user should not update the configuration key javax.jdo.option.ConnectionURL.
  • Configure HiveServer2 to use SSL (see HiveServer2 with SSL in Enabling SSL).

Then execute the script kubernetes/run-hive.sh with the argument --generate-truststore (which is mandatory) in order to create the KeyStore file kubernetes/key/hivemr3-ssl-certificate.jks.

$ pwd
/home/gla/mr3-run/kubernetes
$ ./run-hive.sh --generate-truststore

Creating Hive MR3 SSL certificates...

...
$ ls key/hivemr3-ssl-certificate.*
key/hivemr3-ssl-certificate.jceks  key/hivemr3-ssl-certificate.jks

Next set the following two environment variables in kubernetes/helm/hive/env-secret.sh to the password generated by kubernetes/generate-hivemr3-ssl.sh when creating a self-signed certificate for SSL.

$ vi kubernetes/helm/hive/env-secret.sh

HIVE_SERVER2_SSL_TRUSTSTOREPASS=4b41c3e6-7614-4d92-8a4b-d38b1a58831d
export HADOOP_CREDSTORE_PASSWORD=4b41c3e6-7614-4d92-8a4b-d38b1a58831d
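
Since both variables must hold the same generated password, a quick consistency check can catch copy-and-paste mistakes. The following is only a sketch: check_env_secret is a hypothetical helper, and the demo uses a dummy file in a scratch directory instead of the real kubernetes/helm/hive/env-secret.sh.

```shell
# Sketch only: verify that both variables are set and identical.
# check_env_secret is a hypothetical helper.
check_env_secret() {
  (
    # source the file in a subshell so the password does not leak
    # into the calling shell
    . "$1" || exit 1
    [ -n "$HIVE_SERVER2_SSL_TRUSTSTOREPASS" ] &&
      [ "$HIVE_SERVER2_SSL_TRUSTSTOREPASS" = "$HADOOP_CREDSTORE_PASSWORD" ]
  )
}

# Demo with a dummy password.
demo=$(mktemp -d)
cat > "$demo/env-secret.sh" <<'EOF'
HIVE_SERVER2_SSL_TRUSTSTOREPASS=dummy-password
export HADOOP_CREDSTORE_PASSWORD=dummy-password
EOF
if check_env_secret "$demo/env-secret.sh"; then
  echo "both passwords are set and identical"
fi
```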

Finally append HADOOP_CREDSTORE_PASSWORD to the values of the configuration keys mr3.am.launch.env and mr3.container.launch.env in kubernetes/conf/mr3-site.xml. Note that for security reasons, the user should NOT write the password itself. Appending the variable name alone suffices because MR3 automatically sets the environment variable by reading from the system environment.

$ vi kubernetes/conf/mr3-site.xml

<property>
  <name>mr3.am.launch.env</name>
  <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/,HADOOP_CREDSTORE_PASSWORD</value>
</property>

<property>
  <name>mr3.container.launch.env</name>
  <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mr3-run/hadoop/apache-hadoop/lib/native,HADOOP_CREDSTORE_PASSWORD</value>
</property>
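
The two rules above (the variable name is listed for both keys, and no password is written) can be checked mechanically. The sketch below is one possible approach, not an MR3 tool: check_launch_env is a hypothetical helper, it assumes that each <value> line directly follows its <name> line as in the snippet above, and it relies on grep -A, which GNU and BSD grep provide.

```shell
# Sketch only: check the two launch.env keys in an mr3-site.xml file.
# check_launch_env is a hypothetical helper.
check_launch_env() {
  file=$1
  # the password itself must never be written into the file
  if grep -q 'HADOOP_CREDSTORE_PASSWORD=' "$file"; then
    echo "error: HADOOP_CREDSTORE_PASSWORD must be listed without a value" >&2
    return 1
  fi
  for key in mr3.am.launch.env mr3.container.launch.env; do
    # inspect the <value> line that follows the <name> line of each key
    if ! grep -A1 "<name>$key</name>" "$file" | grep -q 'HADOOP_CREDSTORE_PASSWORD'; then
      echo "error: $key does not list HADOOP_CREDSTORE_PASSWORD" >&2
      return 1
    fi
  done
}

# Demo with a shortened copy of the two properties.
demo=$(mktemp -d)
cat > "$demo/mr3-site.xml" <<'EOF'
<property>
  <name>mr3.am.launch.env</name>
  <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH,HADOOP_CREDSTORE_PASSWORD</value>
</property>
<property>
  <name>mr3.container.launch.env</name>
  <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH,HADOOP_CREDSTORE_PASSWORD</value>
</property>
EOF
check_launch_env "$demo/mr3-site.xml" && echo "mr3-site.xml looks consistent"
```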

As HiveServer2 runs with SSL enabled, Beeline should use its own KeyStore file that contains the self-signed certificate (see Running Beeline in Enabling SSL). The user should also set hive/createSecret to true in kubernetes/helm/hive/values.yaml as explained below.

Updating kubernetes/helm/hive/values.yaml

After updating the YAML file values-sample.yaml and the configuration files in kubernetes/conf, the user can use Helm in the same way to run Metastore and HiveServer2. Below we describe each part of kubernetes/helm/hive/values.yaml.

dir and name

This part specifies the directories inside Pods and the names of various Kubernetes objects. It is unnecessary to override the default values.

dir:
  base: "/opt/mr3-run"
  work: "/opt/mr3-run/hive"
  conf: "/opt/mr3-run/conf"
  keytab: "/opt/mr3-run/key"
  persistentVolumeClaim: "/opt/mr3-run/work-dir"

name:
  hive:
    hiveserver2: hivemr3-hiveserver2
    service: hiveserver2
    serviceAccount: hive-service-account
    configMap: hivemr3-conf-configmap
    secret: hivemr3-keytab-secret
    workerSecret: hivemr3-worker-secret
  metastore:
    service: metastore
  envSecret: env-secret
  amConfigMap: client-am-config
  persistentVolume: workdir-pv
  persistentVolumeClaim: workdir-pvc

docker for the Docker image

docker:
  image: 10.1.91.17:5000/hive5:latest
  user: hive
  imagePullPolicy: Always
  imagePullSecrets:

  • docker/image specifies the full name of the Docker image including a tag.
  • docker/user should match the user specified in kubernetes/hive/Dockerfile when creating the Docker image.
  • docker/imagePullPolicy specifies the pull policy for the Docker image. For DAGAppMaster and ContainerWorker Pods, the user should set the configuration key mr3.k8s.pod.image.pull.policy in kubernetes/conf/mr3-site.xml.
  • docker/imagePullSecrets specifies the pull secret for the Docker image. It is set to empty if no pull secret is necessary. For DAGAppMaster and ContainerWorker Pods, the user should set the configuration key mr3.k8s.pod.image.pull.secrets in kubernetes/conf/mr3-site.xml.

create and metastore for Metastore

create:
  metastore: false

metastore:
  host: red0
  port: 9850

  databaseHost: indigo0
  databaseName: hive5mr3
  warehouseDir: hdfs://red0:8020/user/hive/warehouse
  dbType: mysql

  initSchema: false
  mountLib: false

  secureMode: true
  kerberosPrincipal: hive/red0@RED
  kerberosKeytab: "hive.service.keytab"
  
  resources:
    requests:
      cpu: 2
      memory: 16Gi
    limits:
      cpu: 2
      memory: 16Gi
  heapSize: 16384

  • create/metastore specifies whether or not to create a Metastore Pod. Set to false in order to use an external Metastore.

  • metastore/host specifies the host where the external Metastore is running. It is ignored if create/metastore is set to true, in which case it is replaced with the address (FQDN) hivemr3-metastore-0.metastore.{{namespace}}.svc.cluster.local.

  • metastore/databaseHost specifies the host where the database for Metastore is running, and corresponds to the environment variable HIVE_DATABASE_HOST in kubernetes/env.sh.
  • metastore/databaseName specifies the database name for Metastore, and corresponds to the environment variable HIVE_DATABASE_NAME in kubernetes/env.sh.
  • metastore/warehouseDir specifies the directory for the Hive warehouse, and corresponds to the environment variable HIVE_WAREHOUSE_DIR in kubernetes/env.sh.
  • metastore/dbType contains an argument to schemaTool for specifying the type of the database (not to be confused with the configuration key hive.metastore.db.type in hive-site.xml).
  • metastore/initSchema specifies whether or not to initialize schema in the database.
  • metastore/mountLib specifies whether or not to add the subdirectory /lib under the PersistentVolume to the classpath for Metastore. For example, the user can copy the MySQL connector jar file to the subdirectory /lib and set metastore/mountLib to true. Then Metastore uses the custom MySQL connector provided by the user.
  • metastore/secureMode specifies whether or not to use Kerberos for authentication in Metastore, and corresponds to the environment variable METASTORE_SECURE_MODE in kubernetes/env.sh.
  • metastore/kerberosPrincipal and metastore/kerberosKeytab specify the principal and keytab file for Metastore, and correspond to environment variables HIVE_METASTORE_KERBEROS_PRINCIPAL and HIVE_METASTORE_KERBEROS_KEYTAB in kubernetes/env.sh (or configuration keys hive.metastore.kerberos.principal and hive.metastore.kerberos.keytab.file in hive-site.xml).
  • metastore/resources specifies the resources to be allocated to a Metastore Pod.
  • metastore/heapSize specifies the Java heap size (in MB) for Metastore.
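
The address that replaces metastore/host when create/metastore is set to true follows a fixed pattern in which only the namespace varies. A small sketch that builds it for a given namespace (the function name metastore_fqdn is made up for illustration):

```shell
# Sketch only: build the Metastore address used when create/metastore is true.
# metastore_fqdn is a hypothetical helper; $1 is the Kubernetes namespace.
metastore_fqdn() {
  printf 'hivemr3-metastore-0.metastore.%s.svc.cluster.local\n' "$1"
}

metastore_fqdn hivemr3
# prints hivemr3-metastore-0.metastore.hivemr3.svc.cluster.local
```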

hive for HiveServer2

hive:
  port: 9852
  externalIp: 10.1.91.41
  replicas: 1
  amMode: kubernetes

  createSecret: true
  authentication: KERBEROS
  kerberosPrincipal: hive/red0@RED
  kerberosKeytab: "hive.service.keytab"
  tokenRenewalEnabled: false

  sslTruststore: hivemr3-ssl-certificate.jks
  sslTruststoreType: jks
  sslTruststorePass: 

  resources:
    requests:
      cpu: 2
      memory: 16Gi
    limits:
      cpu: 2
      memory: 16Gi
  heapSize: 16384

  • hive/port and hive/externalIp specify an address (port and host) for the Service for exposing HiveServer2 to the outside of the Kubernetes cluster. In particular, hive/externalIp should specify a public IP address with a valid host name. (The host name is necessary in order for Ranger to securely communicate with HiveServer2.)
  • hive/replicas specifies the number of HiveServer2 instances (all of which share a common DAGAppMaster).
  • hive/amMode specifies a DAGAppMaster mode (localthread for LocalThread mode, localprocess for LocalProcess mode, and kubernetes for Kubernetes mode).
  • hive/createSecret specifies whether or not to create a Secret from the files in the directory kubernetes/key. It should be set to true if Kerberos is used or SSL is enabled.
  • hive/authentication specifies the authentication option for HiveServer2: NONE, NOSASL, KERBEROS, LDAP, PAM, or CUSTOM. It corresponds to the environment variable HIVE_SERVER2_AUTHENTICATION in kubernetes/env.sh.
  • hive/kerberosPrincipal and hive/kerberosKeytab specify the principal and keytab file for HiveServer2, and correspond to environment variables HIVE_SERVER2_KERBEROS_PRINCIPAL and HIVE_SERVER2_KERBEROS_KEYTAB in kubernetes/env.sh.
  • hive/tokenRenewalEnabled specifies whether or not to renew Hive tokens inside DAGAppMaster and ContainerWorkers, and corresponds to the environment variable TOKEN_RENEWAL_HIVE_ENABLED in kubernetes/env.sh.
  • hive/sslTruststore and hive/sslTruststoreType specify a TrustStore file for HiveServer2.
  • hive/resources specifies the resources to be allocated to a HiveServer2 Pod.
  • hive/heapSize specifies the Java heap size (in MB) for HiveServer2.

hdfs for reading from secure HDFS

hdfs:
  userPrincipal: hive@RED
  userKeytab: hive.service.keytab
  tokenRenewalEnabled: true

  • hdfs/userPrincipal specifies the principal name to use when renewing HDFS tokens in DAGAppMaster and ContainerWorkers, and corresponds to the environment variable USER_PRINCIPAL in kubernetes/env.sh.
  • hdfs/userKeytab specifies the name of the keytab file which should be copied to the directory kubernetes/key by the user, and corresponds to the environment variable USER_KEYTAB in kubernetes/env.sh.
  • hdfs/tokenRenewalEnabled specifies whether or not to automatically renew HDFS tokens, and corresponds to the environment variable TOKEN_RENEWAL_HDFS_ENABLED in kubernetes/env.sh.

workDir for PersistentVolume

workDir:
  isNfs: true
  nfs:
    server: "10.1.91.17"
    path: "/work/nfs/mr3-run-work-dir"
  volumeSize: 10Gi
  volumeClaimSize: 10Gi
  storageClassName: ""
  volumeStr:

  • workDir/isNfs specifies whether the PersistentVolume uses NFS or not.
  • workDir/nfs/server and workDir/nfs/path specify the address of the NFS server and the path exported by the NFS server (when workDir/isNfs is set to true).
  • workDir/volumeSize and workDir/volumeClaimSize specify the size of the PersistentVolume and the PersistentVolumeClaim.
  • workDir/storageClassName specifies the StorageClass of the PersistentVolume.
  • workDir/volumeStr specifies the PersistentVolume to use when workDir/isNfs is set to false. For example, volumeStr: "hostPath:\n path: /work/nfs/mr3-run-work-dir" creates a hostPath PersistentVolume.
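
In a double-quoted YAML string, \n stands for a newline, so the volumeStr value in the example above is really a two-line fragment. The sketch below only makes that expansion visible with printf. How the chart places the fragment into the PersistentVolume definition is not shown here, and the shell variable name volume_str is just for illustration.

```shell
# Sketch only: show the two-line fragment encoded in the volumeStr example.
volume_str='hostPath:\n path: /work/nfs/mr3-run-work-dir'

# %b makes printf interpret the \n escape, as YAML does for double-quoted strings
printf '%b\n' "$volume_str"
# prints:
# hostPath:
#  path: /work/nfs/mr3-run-work-dir
```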

amConfig for environment variables

amConfig:
  key:
  timestamp:
  mr3SessionId:
  atsSecretKey:

These fields define environment variables which are read by HiveServer2 and DAGAppMaster. For an empty field, a random value is generated.

  • amConfig/key sets the environment variable CLIENT_TO_AM_TOKEN_KEY (see Using MasterControl).
  • amConfig/timestamp sets the environment variable MR3_APPLICATION_ID_TIMESTAMP which determines the name of the DAGAppMaster Pod.
  • amConfig/mr3SessionId sets the environment variable MR3_SHARED_SESSION_ID (see Multiple HiveServer2 Instances Sharing MR3).
  • amConfig/atsSecretKey sets the environment variable ATS_SECRET_KEY (see Running Timeline Server).

logLevel and hostAliases

logLevel: INFO

hostAliases:
- ip: "10.1.90.9"
  hostnames:
  - "gold0"
- ip: "10.1.91.4"
  hostnames:
  - "red0"
- ip: "10.1.91.41"
  hostnames:
  - "indigo20"

  • logLevel specifies the logging level.
  • hostAliases lists host aliases, each mapping an IP address to one or more host names.