This page shows how to use Helm and a pre-built Docker image available at DockerHub in order to operate Hive on MR3 with Minikube. All components (Metastore, HiveServer2, MR3 DAGAppMaster) will be running inside Minikube. For Metastore, we will run a MySQL database as a Pod inside Minikube. By following the instructions, the user will learn:

  1. how to start Metastore using Helm
  2. how to use Helm to run Hive on MR3 with Minikube
  3. how to create Beeline connections and send queries to HiveServer2 running inside Minikube

This scenario has the following prerequisites:

  • A running Minikube cluster should be available.
  • The user should be able to execute the commands kubectl and helm.
  • A MySQL connector should be available.

This scenario should take less than 30 minutes to complete, not including the time for downloading a pre-built Docker image. This page has been tested with MR3 release 1.0 on CentOS 7.5 running Minikube v1.2.0 using user ngd.

Installation

Download an MR3 release containing the executable scripts.

$ wget https://github.com/mr3project/mr3-release/releases/download/v1.0/mr3-1.0-run.tar.gz
$ gunzip -c mr3-1.0-run.tar.gz | tar xvf -
$ cd mr3-1.0-run

Starting a MySQL database

For simplicity, we will run a MySQL database for Metastore as a Pod inside Minikube.

$ helm install --name mysql --namespace hivemr3 stable/mysql
NAME:   mysql
LAST DEPLOYED: Sun Feb  9 23:22:00 2020
NAMESPACE: hivemr3
STATUS: DEPLOYED
...
NOTES:
MySQL can be accessed via port 3306 on the following DNS name from within your cluster:
mysql.hivemr3.svc.cluster.local
...

mysql.hivemr3.svc.cluster.local is the address (FQDN) of the MySQL database. Retrieve the root password as follows:

$ kubectl get secret --namespace hivemr3 mysql -o jsonpath="{.data.mysql-root-password}" | base64 --decode; echo
H6MMSy46Cx
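Note that the Secret stores the password base64-encoded; the trailing base64 --decode recovers the plain text. A minimal sketch of this round trip, reusing the sample password from above (no cluster required):

```shell
# Kubernetes Secrets store values base64-encoded, as returned by the
# jsonpath query above. Encode the sample password, then decode it back.
encoded=$(printf 'H6MMSy46Cx' | base64)   # what appears in .data.mysql-root-password
printf '%s' "$encoded" | base64 --decode; echo
# prints: H6MMSy46Cx
```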

Linking configuration files

We will reuse the configuration files in kubernetes/conf/ (and keys in kubernetes/key if Kerberos is used for authentication). Create symbolic links.

$ ln -s $(pwd)/kubernetes/conf/ kubernetes/helm/hive/conf
$ ln -s $(pwd)/kubernetes/key/ kubernetes/helm/hive/key

Now any change to the configuration files in kubernetes/conf/ is honored when running Hive on MR3.
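This works because a symbolic link to a directory always resolves to the directory's current contents. A minimal sketch in a scratch directory (hypothetical paths) illustrates the behavior:

```shell
# Sketch: a symlinked directory sees edits made through the original path.
tmp=$(mktemp -d)
mkdir "$tmp/conf"
echo "old-value" > "$tmp/conf/site.xml"
ln -s "$tmp/conf" "$tmp/helm-conf"        # like kubernetes/helm/hive/conf
echo "new-value" > "$tmp/conf/site.xml"   # edit via the original path
cat "$tmp/helm-conf/site.xml"             # prints: new-value
rm -rf "$tmp"
```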

Creating local directories

We need to create two new local directories:

  1. for a PersistentVolume to be shared by Pods;
  2. for a hostPath volume for ContainerWorker Pods.

Create a local directory for the PersistentVolume.

$ mkdir /home/ngd/workdir
$ chmod 777 /home/ngd/workdir 

Hive on MR3 uses local disks for writing intermediate data. When running on Kubernetes, we use hostPath volumes to expose local directories of the machine to Pods. For our example, we create a local directory for the hostPath volume used by ContainerWorker Pods.

$ mkdir -p /data1/ngd/k8s
$ chmod 777 /data1/ngd/k8s
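The chmod 777 matters because processes inside ContainerWorker Pods may run under a different UID than the host user, so the directory must be world-writable. A quick sketch (using a hypothetical scratch directory) to confirm the mode after creation:

```shell
# Sketch: verify a directory is world-writable (mode 777), as required
# for hostPath volumes written to by container UIDs.
d=$(mktemp -d)
chmod 777 "$d"
stat -c '%a' "$d"    # prints 777 (GNU coreutils stat)
rm -rf "$d"
```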

Preparing a MySQL connector

The user should have a MySQL connector compatible with the MySQL database for Metastore. One can download the official JDBC driver for MySQL at https://dev.mysql.com/downloads/connector/j/.

Copy the MySQL connector to the directory /lib under the local directory for the PersistentVolume.

$ mkdir -p /home/ngd/workdir/lib
$ cp mysql-connector-java-8.0.17.jar /home/ngd/workdir/lib/
$ chmod 777 /home/ngd/workdir/lib/mysql-connector-java-8.0.17.jar

Configuring Pods

Create kubernetes/helm/hive/values-minikube.yaml which is a collection of values to override those in kubernetes/helm/hive/values.yaml.

$ vi kubernetes/helm/hive/values-minikube.yaml

docker:
  image: mr3project/hive3:1.0
  imagePullPolicy: IfNotPresent

create:
  metastore: true
  
metastore:
  databaseHost: mysql.hivemr3.svc.cluster.local
  warehouseDir: file:///opt/mr3-run/work-dir/warehouse
  initSchema: true
  mountLib: true
  secureMode: false
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      cpu: 1
      memory: 4Gi
  heapSize: 4096

hive:
  externalIp: 12.34.56.78   # use your IP address
  authentication: NONE
  resources:
    requests:
      cpu: 1
      memory: 8Gi
    limits:
      cpu: 1
      memory: 8Gi
  heapSize: 8192

workDir:
  isNfs: false
  volumeStr: "hostPath:\n  path: /home/ngd/workdir"

  • docker.image is set to a pre-built Docker image mr3project/hive3:1.0 available at DockerHub.
  • docker.imagePullPolicy is set to IfNotPresent because we download the Docker image from DockerHub.
  • create.metastore is set to true because we will create a Metastore Pod.
  • metastore.databaseHost is set to the address (FQDN) of the MySQL database.
  • metastore.initSchema is set to true because this is the first run of Metastore. For subsequent runs, the user should set it to false.
  • metastore.mountLib is set to true because we mount a MySQL connector inside the Metastore Pod.
  • hive.externalIp is set to the public IP address of the local machine.
  • workDir.volumeStr is set to a hostPath volume specification pointing to the local directory for the PersistentVolume.

Update kubernetes/conf/mr3-site.xml.

$ vi kubernetes/conf/mr3-site.xml

<property>
  <name>mr3.k8s.pod.image.pull.policy</name>
  <value>IfNotPresent</value>
</property>

<property>
  <name>mr3.k8s.pod.worker.hostpaths</name>
  <value>/data1/ngd/k8s/</value>
</property>

  • mr3.k8s.pod.image.pull.policy is set to IfNotPresent because we download the Docker image from DockerHub.
  • mr3.k8s.pod.worker.hostpaths is set to the path to the local directory for the hostPath volume.
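After editing a Hadoop-style XML configuration file, a quick grep can confirm that a property has the intended value. A sketch against a minimal hypothetical mr3-site.xml written to a scratch file:

```shell
# Sketch: confirm a property value in a Hadoop-style XML config file.
f=$(mktemp)
cat > "$f" <<'EOF'
<property>
  <name>mr3.k8s.pod.image.pull.policy</name>
  <value>IfNotPresent</value>
</property>
EOF
# Print the <value> line that follows the matching <name> line.
grep -A1 'mr3.k8s.pod.image.pull.policy' "$f" | grep -o '<value>.*</value>'
# prints: <value>IfNotPresent</value>
rm -f "$f"
```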

Update kubernetes/conf/hive-site.xml.

$ vi kubernetes/conf/hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>H6MMSy46Cx</value>
</property>

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value></value>
</property>
<property>
  <name>metastore.pre.event.listeners</name>
  <value></value>
</property>

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value> 
</property>

  • javax.jdo.option.ConnectionPassword is set to the password of the MySQL database.
  • hive.metastore.pre.event.listeners and metastore.pre.event.listeners are set to empty because we do not enable security on the Metastore side.

Update kubernetes/conf/core-site.xml.

$ vi kubernetes/conf/core-site.xml

<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>

  • hadoop.security.authentication is set to simple in order to disable Kerberos for authentication.

Starting Hive on MR3

Before running HiveServer2, the user should remove the label node-role.kubernetes.io/master from the minikube node. This is because Hive on MR3 does not count the resources of master nodes when estimating the resources available for ContainerWorker Pods. Since the minikube node, the only node in a Minikube cluster, is a master node, we should demote it to an ordinary node so that ContainerWorker Pods can be created on it. Execute the following command:

$ kubectl label node minikube node-role.kubernetes.io/master-

Before running HiveServer2, the user should also make sure that no ConfigMaps exist in the namespace hivemr3. For example, the user may see ConfigMaps left over from a previous run.

$ kubectl get configmaps -n hivemr3
NAME                       DATA   AGE
mr3conf-configmap-master   1      14m
mr3conf-configmap-worker   1      14m

In such a case, manually delete these ConfigMaps.

$ kubectl delete configmap -n hivemr3 mr3conf-configmap-master mr3conf-configmap-worker

Install Helm chart for Hive on MR3 with values-minikube.yaml. We use hivemr3 for the namespace.

$ helm install --namespace hivemr3 kubernetes/helm/hive -f kubernetes/helm/hive/values-minikube.yaml
NAME:   bold-worm
LAST DEPLOYED: Sun Feb  9 23:34:17 2020
NAMESPACE: hivemr3
STATUS: DEPLOYED
...

==> v1/ConfigMap
NAME                    DATA  AGE
client-am-config        4     1s
env-configmap           1     1s
hivemr3-conf-configmap  18    1s
...

Check if all ConfigMaps are non-empty. If the DATA column for hivemr3-conf-configmap is 0, try to remove unnecessary files in the directory kubernetes/conf or kubernetes/helm/hive/conf.

The user can find four Pods running in the Minikube cluster:

  1. MySQL database
  2. Metastore
  3. HiveServer2
  4. MR3 DAGAppMaster

$ kubectl get pods -n hivemr3
NAME                        READY   STATUS    RESTARTS   AGE
hivemr3-hiveserver2-66b62   1/1     Running   0          39s
hivemr3-metastore-0         1/1     Running   0          39s
mr3master-8873-0            1/1     Running   0          10s
mysql-8569cdf6fc-dd6kq      1/1     Running   0          12m

Running Beeline

Download a sample dataset and copy it to the directory for the PersistentVolume.

$ wget https://github.com/mr3project/mr3-release/releases/download/v1.0/pokemon.csv
$ cp pokemon.csv /home/ngd/workdir
$ chmod 777 /home/ngd/workdir/pokemon.csv

The user can verify that the sample dataset is accessible inside the HiveServer2 Pod.

$ kubectl exec -n hivemr3 -it hivemr3-hiveserver2-66b62 -- /bin/bash 
bash-4.2$ ls /opt/mr3-run/work-dir/       
bce2a0bf-0143-4bb6-aebb-a3c067d6fbd3_resources	hive  pokemon.csv
c66dcd2d-e51f-4972-86b8-7281e92fadca_resources	lib
bash-4.2$ exit

The user may use any client program (such as beeline) to connect to HiveServer2, which listens on port 9852.

$ beeline -u "jdbc:hive2://12.34.56.78:9852/;;;" -n hive -p hive 

Alternatively the user can run Beeline inside the HiveServer2 Pod.

$ kubectl exec -n hivemr3 -it hivemr3-hiveserver2-66b62 -- /bin/bash 
bash-4.2$ export USER=hive
bash-4.2$ /opt/mr3-run/hive/run-beeline.sh

Execute queries.

0: jdbc:hive2://hivemr3-hiveserver2-66b62:983> show databases;
...
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (0.119 seconds)
0: jdbc:hive2://hivemr3-hiveserver2-66b62:983> use default;
...
No rows affected (0.031 seconds)
0: jdbc:hive2://hivemr3-hiveserver2-66b62:983> CREATE TABLE pokemon (Number Int,Name String,Type1 String,Type2 String,Total Int,HP Int,Attack Int,Defense Int,Sp_Atk Int,Sp_Def Int,Speed Int) row format delimited fields terminated BY ',' lines terminated BY '\n' tblproperties("skip.header.line.count"="1");
...
No rows affected (0.598 seconds)
0: jdbc:hive2://hivemr3-hiveserver2-66b62:983> load data local inpath '/opt/mr3-run/work-dir/pokemon.csv' INTO table pokemon;
...
No rows affected (0.575 seconds)
0: jdbc:hive2://hivemr3-hiveserver2-66b62:983> select avg(HP) from pokemon;
...
+---------------------+
|         _c0         |
+---------------------+
| 144.84882280049567  |
+---------------------+
1 row selected (11.747 seconds)
0: jdbc:hive2://hivemr3-hiveserver2-66b62:983> create table pokemon1 as select *, IF(HP>160.0,'strong',IF(HP>140.0,'moderate','weak')) AS power_rate from pokemon;
...
No rows affected (2.044 seconds)
0: jdbc:hive2://hivemr3-hiveserver2-66b62:983> select COUNT(name), power_rate from pokemon1 group by power_rate;
...
+------+-------------+
| _c0  | power_rate  |
+------+-------------+
| 363  | strong      |
| 336  | weak        |
| 108  | moderate    |
+------+-------------+
3 rows selected (2.19 seconds)

The user can see that ContainerWorker Pods have been created.

$ kubectl get pods -n hivemr3                                        
NAME                        READY   STATUS    RESTARTS   AGE
hivemr3-hiveserver2-66b62   1/1     Running   0          15m
hivemr3-metastore-0         1/1     Running   0          15m
mr3master-8873-0            1/1     Running   0          14m
mr3worker-af49-1            1/1     Running   0          2m14s
mr3worker-af49-2            1/1     Running   0          77s
mr3worker-af49-3            1/1     Running   0          77s
mr3worker-af49-4            0/1     Pending   0          77s
mysql-8569cdf6fc-dd6kq      1/1     Running   0          27m

The user can find the warehouse directory /home/ngd/workdir/warehouse/.

$ ls /home/ngd/workdir/warehouse
pokemon  pokemon1

Terminating Hive on MR3

In order to terminate Hive on MR3, the user should first delete the DAGAppMaster Pod and then delete the Helm chart, not the other way around. This is because deleting the Helm chart revokes the ServiceAccount resource that DAGAppMaster uses to delete ContainerWorker Pods. Hence, if the user deletes the Helm chart first, all remaining Pods must be deleted manually.

Delete the DAGAppMaster Pod which in turn deletes all ContainerWorker Pods automatically.

$ kubectl -n hivemr3 delete pod mr3master-8873-0

Delete the Helm chart (using the release name printed by helm install, bold-worm in our example).

$ helm delete bold-worm         
release "bold-worm" deleted

After a while, the user can see that only the MySQL database Pod remains.

$ kubectl get pods -n hivemr3
NAME                        READY   STATUS        RESTARTS   AGE
mysql-8569cdf6fc-dd6kq      1/1     Running       0          28m

Then stop the MySQL database.

$ helm delete --purge mysql

Lastly, the user may find that the ConfigMaps mr3conf-configmap-master and mr3conf-configmap-worker are still alive. These ConfigMaps are created not by Helm but by HiveServer2 and DAGAppMaster, and thus are not deleted by the command helm delete.

$ kubectl get configmaps -n hivemr3
NAME                       DATA   AGE
mr3conf-configmap-master   1      36m
mr3conf-configmap-worker   1      36m

In such a case, manually delete these ConfigMaps.

$ kubectl delete configmap -n hivemr3 mr3conf-configmap-master mr3conf-configmap-worker