In the previous approach, we create a PersistentVolume for storing transient data such as the results of running queries. With access to HDFS, the user can dispense with PersistentVolumes for Metastore and HiveServer2. To use HDFS, the user should skip or adjust those steps in the previous approach that deal with the PersistentVolume workdir-pv and the PersistentVolumeClaim workdir-pvc.

kubernetes/conf/hive-site.xml

Set the configuration key hive.exec.scratchdir in hive-site.xml to the full path (including the address and port of the NameNode) of the scratch directory of HiveServer2. The scratch directory must be writable by the user running HiveServer2 and have permission 733. If it does not exist, HiveServer2 automatically creates it with permission 733. Do not update the configuration key hive.downloaded.resources.dir because it should point to a directory on the local file system.

<property>
  <name>hive.exec.scratchdir</name>
  <value>hdfs://blue0:8020/tmp/hivemr3/workdir</value>
</property>
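For example, the scratch directory can be created in advance as follows, assuming the NameNode address blue0:8020 from the configuration above (HiveServer2 also creates the directory automatically if it is missing):

```shell
# Create the scratch directory on HDFS and set permission 733.
# The NameNode address blue0:8020 matches the example configuration above.
hdfs dfs -mkdir -p hdfs://blue0:8020/tmp/hivemr3/workdir
hdfs dfs -chmod 733 hdfs://blue0:8020/tmp/hivemr3/workdir
```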

If the query results cache is enabled with the configuration key hive.query.results.cache.enabled set to true, the directory specified by the configuration key hive.query.results.cache.directory should reside on the same file system as the scratch directory, namely HDFS. For example, the user can use a directory under /tmp on HDFS for the query results cache.

<property>
  <name>hive.query.results.cache.directory</name>
  <value>hdfs://blue0:8020/tmp/_resultscache_</value>
</property>

If the file systems do not match, the query results cache is never used and HiveServer2 throws IllegalArgumentException:

java.lang.IllegalArgumentException: Wrong FS: hdfs://blue0:8020/tmp/hivemr3/workdir/root/fdc9e095-e47e-4146-97c7-03314933f8bf/hive_2020-10-31_10-11-19_665_8775207411585383706-1/-mr-10001/.hive-staging_hive_2020-10-31_10-11-19_665_8775207411585383706-1/-ext-10002, expected: file:///

Remove PersistentVolume workdir-pv and PersistentVolumeClaim workdir-pvc

Open kubernetes/env.sh and set the following two environment variables to empty values.

WORK_DIR_PERSISTENT_VOLUME_CLAIM=
WORK_DIR_PERSISTENT_VOLUME_CLAIM_MOUNT_DIR=

Open kubernetes/yaml/metastore.yaml and comment out the following lines:

# - name: work-dir-volume
#   mountPath: /opt/mr3-run/work-dir/
# - name: work-dir-volume
#   persistentVolumeClaim:
#     claimName: workdir-pvc

Open kubernetes/yaml/hive.yaml and comment out the following lines:

# - name: work-dir-volume
#   mountPath: /opt/mr3-run/work-dir
# - name: work-dir-volume
#   persistentVolumeClaim:
#     claimName: workdir-pvc

Now the user can run Hive on MR3 on Kubernetes without using PersistentVolumes. If, however, the Docker image does not contain a MySQL connector jar file, the user should use a hostPath volume to mount such a jar file in the directory /opt/mr3-run/host-lib inside the Metastore Pod. See Downloading a MySQL connector in Creating an EKS cluster for an example.
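As an illustration (not part of the shipped YAML files), a hostPath volume for the MySQL connector jar could be declared in kubernetes/yaml/metastore.yaml along the following lines; the host directory /home/hive/lib and the container name metastore are hypothetical and should be adapted to the actual Pod spec:

```yaml
# Hypothetical fragment of the Metastore Pod spec:
# mount a host directory containing the MySQL connector jar.
spec:
  volumes:
  - name: host-lib-volume
    hostPath:
      path: /home/hive/lib      # assumption: host directory holding the connector jar
      type: Directory
  containers:
  - name: metastore             # assumption: container name in metastore.yaml
    volumeMounts:
    - name: host-lib-volume
      mountPath: /opt/mr3-run/host-lib
```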

Timeline Server and Apache Ranger

For Timeline Server and Apache Ranger, the user can skip or adjust those steps that deal with PersistentVolumes workdir-pv-ats and workdir-pv-ranger, and PersistentVolumeClaims workdir-pvc-ats and workdir-pvc-ranger.

Open kubernetes/yaml/ats.yaml and comment out the following lines:

# - name: work-dir-volume
#   mountPath: /opt/mr3-run/ats/work-dir/
# - name: work-dir-volume
#   persistentVolumeClaim:
#     claimName: workdir-pvc-ats

Open kubernetes/yaml/ranger.yaml and comment out the following lines:

# - name: work-dir-volume
#   mountPath: /opt/mr3-run/ranger/work-dir/
# - name: work-dir-volume
#   persistentVolumeClaim:
#     claimName: workdir-pvc-ranger

Then Timeline Server and Apache Ranger use local directories inside their Docker containers. Alternatively, the user can create emptyDir volumes and mount them as work-dir-volume.
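For instance, an emptyDir volume for Timeline Server might be declared as in the following sketch, to be adapted to kubernetes/yaml/ats.yaml; the container name timeline-server is an assumption, while the mount path matches the lines commented out above:

```yaml
# Hypothetical fragment: replace the PersistentVolumeClaim with an emptyDir volume.
spec:
  volumes:
  - name: work-dir-volume
    emptyDir: {}
  containers:
  - name: timeline-server       # assumption: container name in ats.yaml
    volumeMounts:
    - name: work-dir-volume
      mountPath: /opt/mr3-run/ats/work-dir/
```

An emptyDir volume survives container restarts within the same Pod but is deleted when the Pod is removed, which is acceptable here because the directory holds only transient data.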