In the previous approach, we created a PersistentVolume for storing transient data such as the results of running queries. With access to HDFS, the user can dispense with PersistentVolumes for Metastore and HiveServer2. In order to use HDFS, the user should skip or adjust those steps in the previous approach that deal with the PersistentVolume workdir-pv and the PersistentVolumeClaim workdir-pvc.
kubernetes/conf/hive-site.xml
Set the configuration key hive.exec.scratchdir in hive-site.xml to the full path (including the address and port of the NameNode) of the scratch directory of HiveServer2. The scratch directory must be writable by the user running HiveServer2 and have directory permission 733. If it does not exist, HiveServer2 automatically creates a new directory with permission 733. Do not update the configuration key hive.downloaded.resources.dir because it should point to a directory on the local file system.
$ vi kubernetes/conf/hive-site.xml
<property>
  <name>hive.exec.scratchdir</name>
  <value>hdfs://blue0:8020/tmp/hivemr3/workdir</value>
</property>
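If the user prefers to create the scratch directory in advance, its permission must be set to 733 manually. A sketch using the hdfs CLI (assuming the path and NameNode address shown above, and sufficient HDFS privileges) might look like:

```
# create the scratch directory and give it permission 733 (assumption: run with HDFS superuser rights)
hdfs dfs -mkdir -p hdfs://blue0:8020/tmp/hivemr3/workdir
hdfs dfs -chmod 733 hdfs://blue0:8020/tmp/hivemr3/workdir
```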
If the query results cache is enabled with the configuration key hive.query.results.cache.enabled set to true, the directory specified by the configuration key hive.query.results.cache.directory should reside on the same file system that contains the scratch directory, namely HDFS. For example, the user can use a directory under /tmp on HDFS for the query results cache.
$ vi kubernetes/conf/hive-site.xml
<property>
  <name>hive.query.results.cache.directory</name>
  <value>hdfs://blue0:8020/tmp/_resultscache_</value>
</property>
If the file systems do not match, the query results cache is never used and HiveServer2 throws an IllegalArgumentException:
java.lang.IllegalArgumentException: Wrong FS: hdfs://blue0:8020/tmp/hivemr3/workdir/root/fdc9e095-e47e-4146-97c7-03314933f8bf/hive_2020-10-31_10-11-19_665_8775207411585383706-1/-mr-10001/.hive-staging_hive_2020-10-31_10-11-19_665_8775207411585383706-1/-ext-10002, expected: file:///
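This error arises whenever the two directories resolve to different file systems. As a quick sanity check before starting HiveServer2 (a minimal sketch, not part of Hive itself), one can compare the scheme and authority of the two configured URIs:

```python
from urllib.parse import urlparse

def same_filesystem(uri_a: str, uri_b: str) -> bool:
    """Return True when both URIs point to the same file system,
    i.e. they share the same scheme (hdfs, file, ...) and authority."""
    a, b = urlparse(uri_a), urlparse(uri_b)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)

# hive.exec.scratchdir and hive.query.results.cache.directory must agree
print(same_filesystem("hdfs://blue0:8020/tmp/hivemr3/workdir",
                      "hdfs://blue0:8020/tmp/_resultscache_"))   # True
print(same_filesystem("hdfs://blue0:8020/tmp/hivemr3/workdir",
                      "file:///tmp/_resultscache_"))             # False: triggers Wrong FS
```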
Removing PersistentVolume workdir-pv and PersistentVolumeClaim workdir-pvc
Open kubernetes/env.sh and set the following two environment variables to empty values.
$ vi kubernetes/env.sh
WORK_DIR_PERSISTENT_VOLUME_CLAIM=
WORK_DIR_PERSISTENT_VOLUME_CLAIM_MOUNT_DIR=
Set METASTORE_USE_PERSISTENT_VOLUME to false in env.sh.
$ vi kubernetes/env.sh
METASTORE_USE_PERSISTENT_VOLUME=false
Open kubernetes/yaml/metastore.yaml and comment out the following lines:
$ vi kubernetes/yaml/metastore.yaml
# - name: work-dir-volume
# mountPath: /opt/mr3-run/work-dir/
# - name: work-dir-volume
# persistentVolumeClaim:
# claimName: workdir-pvc
Open kubernetes/yaml/hive.yaml and comment out the following lines:
$ vi kubernetes/yaml/hive.yaml
# - name: work-dir-volume
# mountPath: /opt/mr3-run/work-dir
# - name: work-dir-volume
# persistentVolumeClaim:
# claimName: workdir-pvc
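After editing, it may help to confirm that no uncommented reference to the PersistentVolumeClaim remains in either file. A sketch using grep (run from the installation directory):

```
# print any line that still references workdir-pvc and is not commented out
grep -n 'workdir-pvc' kubernetes/yaml/metastore.yaml kubernetes/yaml/hive.yaml | grep -v '#'
```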
Now the user can run Hive on MR3 on Kubernetes without using PersistentVolumes. If, however, the Docker image does not contain a MySQL connector jar file and Metastore/Ranger do not automatically download such a jar file, the user should use a hostPath volume to mount such a jar file in the directory /opt/mr3-run/host-lib inside the Metastore Pod. See Downloading a MySQL connector in Creating an EKS cluster for an example.
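For reference, such a hostPath volume could be declared along these lines in metastore.yaml. This is a sketch only; the volume name host-lib-volume and the host directory /home/hive/lib are illustrative assumptions, not values prescribed by this guide:

```yaml
# hypothetical fragment for metastore.yaml; adjust the host path to your environment
volumes:
- name: host-lib-volume
  hostPath:
    path: /home/hive/lib        # directory on the host containing the MySQL connector jar (assumption)
    type: Directory
# and under the container spec:
volumeMounts:
- name: host-lib-volume
  mountPath: /opt/mr3-run/host-lib
```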
Timeline Server and Apache Ranger
For Timeline Server and Apache Ranger, the user can skip or adjust those steps that deal with the PersistentVolumes workdir-pv-ats and workdir-pv-ranger, and the PersistentVolumeClaims workdir-pvc-ats and workdir-pvc-ranger.
Open kubernetes/yaml/ats.yaml and comment out the following lines:
$ vi kubernetes/yaml/ats.yaml
# - name: work-dir-volume
# mountPath: /opt/mr3-run/ats/work-dir/
# - name: work-dir-volume
# persistentVolumeClaim:
# claimName: workdir-pvc-ats
Open kubernetes/yaml/ranger.yaml and comment out the following lines:
$ vi kubernetes/yaml/ranger.yaml
# - name: work-dir-volume
# mountPath: /opt/mr3-run/ranger/work-dir/
# - name: work-dir-volume
# persistentVolumeClaim:
# claimName: workdir-pvc-ranger
Then Timeline Server and Apache Ranger use local directories inside Docker containers. Alternatively, the user can create emptyDir volumes and mount them under the name work-dir-volume.
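For instance, replacing the persistentVolumeClaim entry in ranger.yaml with an emptyDir volume might look like the following sketch, where the volume name matches the existing work-dir-volume mount; note that data in an emptyDir volume is lost when the Pod terminates:

```yaml
# hypothetical replacement for the persistentVolumeClaim entry in ranger.yaml
- name: work-dir-volume
  emptyDir: {}
```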