Running Metastore and HiveServer2
The following file and directories are relevant to running Metastore and HiveServer2 using Helm.
# resources for executing Helm
└── kubernetes
    └── helm
        └── hive
            └── values.yaml
# resources for running Metastore and HiveServer2
└── kubernetes
    ├── conf
    └── key
The file kubernetes/helm/hive/values.yaml defines the default values for the Helm chart.
Typically the user creates another file (e.g., values-minikube.yaml or values-production.yaml) in order to override some of these default values.
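As a minimal sketch of such an override file, the following commands write a small values-sample.yaml that sets only three keys and leaves everything else to the defaults. The values are copied from the default values.yaml described later on this page and serve only as placeholders; a real override file would use settings for the target cluster.

```shell
# Write a hypothetical override file; every value below is a placeholder.
cat > values-sample.yaml <<'EOF'
docker:
  image: 10.1.91.17:5000/hive3:latest
metastore:
  databaseHost: indigo0
hive:
  externalIp: 10.1.91.41
EOF

# List the top-level sections that the override file touches.
grep -E '^[A-Za-z]+:' values-sample.yaml
```

Helm merges the file given with -f over the chart defaults, so keys that are not listed here keep the values from kubernetes/helm/hive/values.yaml.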
The user can update configuration files in the directory kubernetes/conf to control the behavior of Metastore and HiveServer2 (as well as DAGAppMaster and ContainerWorkers).
The user should copy keytab files to the directory kubernetes/key.
Assuming that a new YAML file values-sample.yaml overrides the default values in kubernetes/helm/hive/values.yaml, the user can run Metastore and HiveServer2 with namespace hivemr3 as follows:
$ ln -s $(pwd)/kubernetes/conf/ kubernetes/helm/hive/conf
$ ln -s $(pwd)/kubernetes/key/ kubernetes/helm/hive/key
$ helm install --namespace hivemr3 kubernetes/helm/hive -f values-sample.yaml
Here the first two commands create symbolic links so that Helm can access the directories kubernetes/conf and kubernetes/key directly.
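Before running helm install, it may be worth confirming that both symbolic links resolve to directories. The helper below is a sketch; the function name check_helm_links is hypothetical and not part of the MR3 scripts.

```shell
# check_helm_links CHART_DIR
# Succeeds only if CHART_DIR/conf and CHART_DIR/key are symbolic links
# that resolve to existing directories.
check_helm_links() {
  chart_dir="$1"
  for name in conf key; do
    link="$chart_dir/$name"
    if [ -L "$link" ] && [ -d "$link" ]; then
      echo "ok: $link -> $(readlink "$link")"
    else
      echo "missing or broken: $link" >&2
      return 1
    fi
  done
}
```

For example, `check_helm_links kubernetes/helm/hive` can be run from the parent directory of kubernetes right after the two ln commands.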
Enabling SSL
In order to enable SSL, the user should have a TrustStore file. Follow the instructions in Enabling SSL to perform the following:
- Create a self-signed certificate for SSL by executing the script kubernetes/generate-hivemr3-ssl.sh, and set the environment variable MR3_SSL_KEYSTORE_PASSWORD in kubernetes/config-run.sh (see Self-signed certificate for SSL in Enabling SSL).
- Configure Metastore to use SSL (see Metastore with SSL in Enabling SSL). Note that if connecting to the MySQL database does not use SSL, the user should not update the configuration key javax.jdo.option.ConnectionURL.
- Configure HiveServer2 to use SSL (see HiveServer2 with SSL in Enabling SSL).
Then execute the script kubernetes/run-hive.sh with an argument --generate-truststore (which is mandatory) in order to create a KeyStore file kubernetes/key/hivemr3-ssl-certificate.jks.
$ pwd
/home/gla/mr3-run/kubernetes
$ ./run-hive.sh --generate-truststore
Creating Hive MR3 SSL certificates...
...
$ ls key/hivemr3-ssl-certificate.*
key/hivemr3-ssl-certificate.jceks key/hivemr3-ssl-certificate.jks
Next set the following two environment variables in kubernetes/helm/hive/env-secret.sh to the password generated by kubernetes/generate-hivemr3-ssl.sh when creating a self-signed certificate for SSL.
$ vi kubernetes/helm/hive/env-secret.sh
HIVE_SERVER2_SSL_TRUSTSTOREPASS=4b41c3e6-7614-4d92-8a4b-d38b1a58831d
export HADOOP_CREDSTORE_PASSWORD=4b41c3e6-7614-4d92-8a4b-d38b1a58831d
Finally append HADOOP_CREDSTORE_PASSWORD to the values of the configuration keys mr3.am.launch.env and mr3.container.launch.env in kubernetes/conf/mr3-site.xml.
Note that for security reasons, the user should NOT write the password itself.
Appending the variable name alone suffices because MR3 automatically sets the environment variable by reading from the system environment.
$ vi kubernetes/conf/mr3-site.xml
<property>
  <name>mr3.am.launch.env</name>
  <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/,HADOOP_CREDSTORE_PASSWORD</value>
</property>
<property>
  <name>mr3.container.launch.env</name>
  <value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mr3-run/hadoop/apache-hadoop/lib/native,HADOOP_CREDSTORE_PASSWORD</value>
</property>
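A quick check along the following lines can catch the mistake of writing the password into mr3-site.xml. This is only a sketch: the function name check_launch_env is hypothetical, and the check assumes that each value element sits on a single line as in the example above.

```shell
# check_launch_env FILE
# Succeeds only if HADOOP_CREDSTORE_PASSWORD appears by name in at least two
# <value> lines and is never followed by '=' (which would mean a literal
# password was written into the file).
check_launch_env() {
  file="$1"
  if grep -q 'HADOOP_CREDSTORE_PASSWORD=' "$file"; then
    echo "error: a password value is written in $file" >&2
    return 1
  fi
  # grep -c prints 0 and exits non-zero when nothing matches, hence '|| true'.
  count=$(grep -c '<value>.*HADOOP_CREDSTORE_PASSWORD' "$file" || true)
  [ "$count" -ge 2 ]
}
```

For example, `check_launch_env kubernetes/conf/mr3-site.xml` returns a non-zero status if either key is missing the variable name or if a value has been attached to it.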
As HiveServer2 runs with SSL enabled, Beeline should use its own KeyStore file that contains the self-signed certificate.
See the instructions for Running Beeline in Enabling SSL.
The user should also set hive/createSecret to true in kubernetes/helm/hive/values.yaml as explained below.
Updating kubernetes/helm/hive/values.yaml
After updating the YAML file values-sample.yaml and the configuration files in kubernetes/conf, the user can use Helm in the same way to run Metastore and HiveServer2.
Below we describe each part of kubernetes/helm/hive/values.yaml.
dir and name
This part specifies the directories inside Pods and the names of various Kubernetes objects. It is unnecessary to override the default values.
dir:
  base: "/opt/mr3-run"
  work: "/opt/mr3-run/hive"
  conf: "/opt/mr3-run/conf"
  keytab: "/opt/mr3-run/key"
  persistentVolumeClaim: "/opt/mr3-run/work-dir"
name:
  hive:
    hiveserver2: hivemr3-hiveserver2
    service: hiveserver2
    serviceAccount: hive-service-account
    configMap: hivemr3-conf-configmap
    secret: hivemr3-keytab-secret
    workerSecret: hivemr3-worker-secret
  metastore:
    service: metastore
  envSecret: env-secret
  amConfigMap: client-am-config
  persistentVolume: workdir-pv
  persistentVolumeClaim: workdir-pvc
docker for the Docker image
docker:
  image: 10.1.91.17:5000/hive3:latest
  user: hive
  imagePullPolicy: Always
  imagePullSecrets:
- docker/image specifies the full name of the Docker image including a tag.
- docker/user should match the user specified in kubernetes/hive/Dockerfile when creating the Docker image.
- docker/imagePullPolicy specifies the pull policy for the Docker image. For DAGAppMaster and ContainerWorker Pods, the user should set the configuration key mr3.k8s.pod.image.pull.policy in kubernetes/conf/mr3-site.xml.
- docker/imagePullSecrets specifies the pull secret for the Docker image. It is set to empty if no pull secret is necessary. For DAGAppMaster and ContainerWorker Pods, the user should set the configuration key mr3.k8s.pod.image.pull.secrets in kubernetes/conf/mr3-site.xml.
create and metastore for Metastore
create:
  metastore: false
metastore:
  host: red0
  port: 9850
  databaseHost: indigo0
  databaseName: hive3mr3
  warehouseDir: hdfs://red0:8020/user/hive/warehouse
  dbType: mysql
  initSchema: false
  mountLib: false
  secureMode: true
  kerberosPrincipal: hive/red0@RED
  kerberosKeytab: "hive.service.keytab"
  resources:
    requests:
      cpu: 2
      memory: 16Gi
    limits:
      cpu: 2
      memory: 16Gi
  heapSize: 16384
- create/metastore specifies whether or not to create a Metastore Pod. Set to false in order to use an external Metastore.
- metastore/host specifies the host where the external Metastore is running. It is ignored if create/metastore is set to true, in which case it is replaced with the address (FQDN) hivemr3-metastore-0.metastore.{{namespace}}.svc.cluster.local.
- metastore/databaseHost specifies the host where the database for Metastore is running, and corresponds to the environment variable HIVE_DATABASE_HOST in kubernetes/env.sh.
- metastore/databaseName specifies the database name for Metastore, and corresponds to the environment variable HIVE_DATABASE_NAME in kubernetes/env.sh.
- metastore/warehouseDir specifies the directory for the Hive warehouse, and corresponds to the environment variable HIVE_WAREHOUSE_DIR in kubernetes/env.sh.
- metastore/dbType contains an argument to schemaTool for specifying the type of the database (not to be confused with the configuration key hive.metastore.db.type in hive-site.xml).
- metastore/initSchema specifies whether or not to initialize schema in the database.
- metastore/mountLib specifies whether or not to add the subdirectory /lib under the PersistentVolume to the classpath for Metastore. For example, the user can copy the MySQL connector jar file to the subdirectory /lib and set metastore/mountLib to true. Then Metastore uses the custom MySQL connector provided by the user.
- metastore/secureMode specifies whether or not to use Kerberos for authentication in Metastore, and corresponds to the environment variable METASTORE_SECURE_MODE in kubernetes/env.sh.
- metastore/kerberosPrincipal and metastore/kerberosKeytab specify the principal and keytab file for Metastore, and correspond to environment variables HIVE_METASTORE_KERBEROS_PRINCIPAL and HIVE_METASTORE_KERBEROS_KEYTAB in kubernetes/env.sh (or configuration keys hive.metastore.kerberos.principal and hive.metastore.kerberos.keytab.file in hive-site.xml).
- metastore/resources specifies the resources to be allocated to a Metastore Pod.
- metastore/heapSize specifies the Java heap size (in MB) for Metastore.
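To make the replacement address for metastore/host concrete, the following sketch substitutes a namespace into the FQDN pattern given above. The function name metastore_fqdn is hypothetical and exists only for illustration.

```shell
# metastore_fqdn NAMESPACE
# Prints the address that replaces metastore/host when create/metastore is true,
# following the pattern hivemr3-metastore-0.metastore.{{namespace}}.svc.cluster.local.
metastore_fqdn() {
  namespace="$1"
  echo "hivemr3-metastore-0.metastore.${namespace}.svc.cluster.local"
}

metastore_fqdn hivemr3
# prints hivemr3-metastore-0.metastore.hivemr3.svc.cluster.local
```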
hive for HiveServer2
hive:
  port: 9852
  externalIp: 10.1.91.41
  replicas: 1
  amMode: kubernetes
  createSecret: true
  authentication: KERBEROS
  kerberosPrincipal: hive/red0@RED
  kerberosKeytab: "hive.service.keytab"
  tokenRenewalEnabled: false
  sslTruststore: hivemr3-ssl-certificate.jks
  sslTruststoreType: jks
  sslTruststorePass:
  resources:
    requests:
      cpu: 2
      memory: 16Gi
    limits:
      cpu: 2
      memory: 16Gi
  heapSize: 16384
- hive/port and hive/externalIp specify the address (port and host) of the Service that exposes HiveServer2 outside the Kubernetes cluster. In particular, hive/externalIp should specify a public IP address with a valid host name. (The host name is necessary in order for Ranger to securely communicate with HiveServer2.)
- hive/replicas specifies the number of HiveServer2 instances (all of which share a common DAGAppMaster).
- hive/amMode specifies a DAGAppMaster mode (localthread for LocalThread mode, localprocess for LocalProcess mode, and kubernetes for Kubernetes mode).
- hive/createSecret specifies whether or not to create a Secret from the files in the directory kubernetes/key. It should be set to true if Kerberos is used or SSL is enabled.
- hive/authentication specifies the authentication option for HiveServer2: NONE, NOSASL, KERBEROS, LDAP, PAM, or CUSTOM. It corresponds to the environment variable HIVE_SERVER2_AUTHENTICATION in kubernetes/env.sh.
- hive/kerberosPrincipal and hive/kerberosKeytab specify the principal and keytab file for HiveServer2, and correspond to environment variables HIVE_SERVER2_KERBEROS_PRINCIPAL and HIVE_SERVER2_KERBEROS_KEYTAB in kubernetes/env.sh.
- hive/tokenRenewalEnabled specifies whether or not to renew Hive tokens inside DAGAppMaster and ContainerWorkers, and corresponds to the environment variable TOKEN_RENEWAL_HIVE_ENABLED in kubernetes/env.sh.
- hive/sslTruststore and hive/sslTruststoreType specify a TrustStore file for HiveServer2.
- hive/resources specifies the resources to be allocated to a HiveServer2 Pod.
- hive/heapSize specifies the Java heap size (in MB) for HiveServer2.
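The port and host name of the Service, together with the Kerberos principal, are what a client needs in order to reach HiveServer2. The sketch below assembles a standard Hive JDBC URL from them. The function name hs2_jdbc_url is hypothetical, the host name indigo20 is taken from the hostAliases example on this page as the name of 10.1.91.41, and the URL covers only Kerberos authentication without SSL parameters.

```shell
# hs2_jdbc_url HOST PORT PRINCIPAL
# Prints a Hive JDBC URL for a Kerberos-enabled HiveServer2.
hs2_jdbc_url() {
  host="$1"
  port="$2"
  principal="$3"
  echo "jdbc:hive2://${host}:${port}/;principal=${principal}"
}

hs2_jdbc_url indigo20 9852 hive/red0@RED
# prints jdbc:hive2://indigo20:9852/;principal=hive/red0@RED
```

The resulting string can be passed to Beeline with its -u option. With SSL enabled, additional URL parameters are required as described in Running Beeline in Enabling SSL.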
hdfs for reading from secure HDFS
hdfs:
  userPrincipal: hive@RED
  userKeytab: hive.service.keytab
  tokenRenewalEnabled: true
- hdfs/userPrincipal specifies the principal name to use when renewing HDFS tokens in DAGAppMaster and ContainerWorkers, and corresponds to the environment variable USER_PRINCIPAL in kubernetes/env.sh.
- hdfs/userKeytab specifies the name of the keytab file which should be copied to the directory kubernetes/key by the user, and corresponds to the environment variable USER_KEYTAB in kubernetes/env.sh.
- hdfs/tokenRenewalEnabled specifies whether or not to automatically renew HDFS tokens, and corresponds to the environment variable TOKEN_RENEWAL_HDFS_ENABLED in kubernetes/env.sh.
workDir for PersistentVolume
workDir:
  isNfs: true
  nfs:
    server: "10.1.91.17"
    path: "/work/nfs/mr3-run-work-dir"
  volumeSize: 10Gi
  volumeClaimSize: 10Gi
  storageClassName: ""
  volumeStr:
- workDir/isNfs specifies whether the PersistentVolume uses NFS or not.
- workDir/nfs/server and workDir/nfs/path specify the address of the NFS server and the path exported by the NFS server (when workDir/isNfs is set to true).
- workDir/volumeSize and workDir/volumeClaimSize specify the size of the PersistentVolume and the PersistentVolumeClaim.
- workDir/storageClassName specifies the StorageClass of the PersistentVolume.
- workDir/volumeStr specifies the PersistentVolume to use when workDir/isNfs is set to false. For example, volumeStr: "hostPath:\n path: /work/nfs/mr3-run-work-dir" creates a hostPath PersistentVolume.
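The \n inside the double-quoted volumeStr value is a YAML escape for a newline, so the string holds a two-line fragment. The sketch below prints the expanded form using the shell's printf. The function name show_volume_str is hypothetical, and how the chart places this fragment inside the PersistentVolume definition is an assumption that is not confirmed here.

```shell
# show_volume_str STRING
# Prints STRING with backslash escapes such as \n expanded, which mirrors
# how a YAML double-quoted scalar is read.
show_volume_str() {
  printf '%b\n' "$1"
}

show_volume_str 'hostPath:\n path: /work/nfs/mr3-run-work-dir'
```

The output consists of the line hostPath: followed by an indented path: line, which is the shape of a volume source in a Kubernetes PersistentVolume spec.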
amConfig for environment variables
amConfig:
  key:
  timestamp:
  mr3SessionId:
  atsSecretKey:
These fields define environment variables which are read by HiveServer2 and DAGAppMaster. For an empty field, a random value is generated.
- amConfig/key sets the environment variable CLIENT_TO_AM_TOKEN_KEY (see Using MasterControl).
- amConfig/timestamp sets the environment variable MR3_APPLICATION_ID_TIMESTAMP which determines the name of the DAGAppMaster Pod.
- amConfig/mr3SessionId sets the environment variable MR3_SHARED_SESSION_ID (see Multiple HiveServer2 Instances Sharing MR3).
- amConfig/atsSecretKey sets the environment variable ATS_SECRET_KEY (see Running Timeline Server).
logLevel and hostAliases
logLevel: INFO
hostAliases:
- ip: "10.1.90.9"
  hostnames:
  - "gold0"
- ip: "10.1.91.4"
  hostnames:
  - "red0"
- ip: "10.1.91.41"
  hostnames:
  - "indigo20"
- logLevel specifies the logging level.
- hostAliases lists host aliases.