In order to install Hive on MR3 on Kubernetes, the user can download an MR3 release and build all necessary components from the source code. Alternatively the user can download a pre-built MR3 release and uncompress it in a directory of his or her choice (e.g., under the home directory). A pre-built MR3 release contains everything necessary for running Hive on MR3 on Kubernetes, including scripts, preset configuration files, and jar files. Download the release that matches the desired version of Hive. (Hive 3 with MR3 master and Hive 4 are built with access to Amazon S3.)

$ wget https://github.com/mr3project/mr3-release/releases/download/v1.1/hivemr3-1.1-hive2.3.6.tar.gz
$ gunzip -c hivemr3-1.1-hive2.3.6.tar.gz | tar xvf -
$ mv hivemr3-1.1-hive2.3.6 mr3-run

$ wget https://github.com/mr3project/mr3-release/releases/download/v1.1/hivemr3-1.1-hive3.1.2.tar.gz
$ gunzip -c hivemr3-1.1-hive3.1.2.tar.gz | tar xvf -
$ mv hivemr3-1.1-hive3.1.2 mr3-run

$ wget https://github.com/mr3project/mr3-release/releases/download/v1.1/hivemr3-1.1-hive3.1.2-k8s.tar.gz
$ gunzip -c hivemr3-1.1-hive3.1.2-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.1-hive3.1.2-k8s mr3-run

$ wget https://github.com/mr3project/mr3-release/releases/download/v1.1/hivemr3-1.1-hive4.0.0-k8s.tar.gz
$ gunzip -c hivemr3-1.1-hive4.0.0-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.1-hive4.0.0-k8s mr3-run

Then update the following environment variables in env.sh:

HADOOP_HOME_LOCAL=$HADOOP_HOME
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar
  • HADOOP_HOME_LOCAL should point to an installation directory of Hadoop. Note that the user needs only a Hadoop installation and does not have to run a working Hadoop cluster; for example, the binary distribution of Hadoop downloaded from the Apache Hadoop webpage works without further configuration. The Hadoop installation should match the base version used in the MR3 release. For example, with MR3 release master, the user should install Hadoop 3.1.

    $ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
    $ gunzip -c hadoop-2.7.7.tar.gz | tar xvf -
    
    $ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
    $ gunzip -c hadoop-3.1.2.tar.gz | tar xvf -
    

  • HIVE_MYSQL_DRIVER should point to a MySQL connector jar file, which is necessary when connecting to a MySQL database. The MySQL connector jar file should be compatible with the MySQL databases for Metastore and Ranger. If HIVE_MYSQL_DRIVER is set to empty, the Docker image (to be built later) does not include a MySQL connector, and the user should mount a MySQL connector manually using a PersistentVolume.
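Before building the Docker image, a quick sanity check along the following lines can confirm that HIVE_MYSQL_DRIVER points to an existing jar file (the path below is only the example value shown above; adjust it to the actual system):

```shell
# Sketch: check that HIVE_MYSQL_DRIVER points to an existing jar file.
# The path is the example value from env.sh above, not a guaranteed location.
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar
if [ -f "$HIVE_MYSQL_DRIVER" ]; then
  echo "MySQL connector found: $HIVE_MYSQL_DRIVER"
else
  echo "MySQL connector not found; leave HIVE_MYSQL_DRIVER empty and mount one via PersistentVolume"
fi
```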

The following structure shows all files and directories relevant to Hive on Kubernetes:

# scripts for populating and cleaning directories for Docker images
├── build-k8s.sh
├── build-k8s-ats.sh
├── build-k8s-ranger.sh
├── clean-k8s.sh
├── clean-k8s-ats.sh
└── clean-k8s-ranger.sh

# scripts for building Docker images
└── kubernetes
    ├── build-hive.sh
    ├── build-ats.sh
    └── build-ranger.sh

# scripts for running HiveServer2, Timeline Server, Ranger
└── kubernetes
    ├── config-run.sh
    ├── generate-hivemr3-ssl.sh
    ├── run-hive.sh
    ├── run-metastore.sh
    ├── run-ats.sh
    └── run-ranger.sh

# resources for running Metastore and HiveServer2
└── kubernetes
    ├── env.sh
    ├── conf
    │   ├── core-site.xml
    │   ├── hive-log4j2.properties
    │   ├── hive-log4j.properties
    │   ├── hive-site.xml
    │   ├── jgss.conf
    │   ├── krb5.conf
    │   ├── mapred-site.xml
    │   ├── mr3-site.xml
    │   ├── ranger-hive-audit.xml
    │   ├── ranger-hive-security.xml
    │   ├── ranger-policymgr-ssl.xml
    │   ├── tez-site.xml
    │   └── yarn-site.xml
    ├── key
    └── hive
        ├── common-setup.sh
        ├── Dockerfile
        ├── Dockerfile.release
        ├── hadoop
        │   └── hadoop-setup.sh
        ├── hive
        │   ├── hiveserver2-service.sh
        │   ├── hive-setup.sh
        │   ├── master-control.sh
        │   ├── metastore-service.sh
        │   ├── run-beeline.sh
        │   ├── run-hive-cli.sh
        │   ├── run-hplsql.sh
        │   ├── run-master.sh
        │   └── run-worker.sh
        ├── mr3
        │   └── mr3-setup.sh
        └── tez
            └── tez-setup.sh

# resources for running Timeline Server
└── kubernetes
    ├── ats-conf
    │   ├── core-site.xml
    │   ├── krb5.conf
    │   ├── log4j.properties
    │   ├── ssl-server.xml
    │   └── yarn-site.xml
    ├── ats-key
    └── ats
        ├── Dockerfile
        └── timeline-service.sh

# resources for running Ranger
└── kubernetes
    ├── ranger-conf
    │   ├── core-site.xml
    │   ├── krb5.conf
    │   ├── ranger-admin-site.xml
    │   ├── ranger-log4j.properties
    │   ├── solr-core.properties
    │   ├── solr-elevate.xml
    │   ├── solr.in.sh
    │   ├── solr-log4j2.xml
    │   ├── solr-managed-schema
    │   ├── solr-security.json
    │   ├── solr-solrconfig.xml
    │   └── solr-solr.xml
    ├── ranger-key
    │   └── install.properties
    └── ranger
        ├── Dockerfile
        ├── start-ranger.sh
        └── start-solr.sh

# YAML files
└── kubernetes
    └── yaml
        ├── ats-service.yaml
        ├── ats.yaml
        ├── cluster-role.yaml
        ├── hive-role.yaml
        ├── hiveserver2-service.yaml
        ├── hive-service-account.yaml
        ├── hive.yaml
        ├── metastore-role.yaml
        ├── metastore-service-account.yaml
        ├── metastore-service.yaml
        ├── metastore.yaml
        ├── namespace.yaml
        ├── ranger-service.yaml
        ├── ranger.yaml
        ├── workdir-pvc-ats.yaml
        ├── workdir-pv-ats.yaml
        ├── workdir-pvc-ranger.yaml
        ├── workdir-pv-ranger.yaml
        ├── workdir-pvc.yaml
        └── workdir-pv.yaml

Prerequisites for running Hive on Kubernetes

In order to run Hive on Kubernetes, the following requirements should be met.

  • A running Kubernetes cluster should be available. In particular, the user should be able to execute: 1) the command docker in order to build Docker images; 2) the command kubectl in order to start Pods.
  • A PersistentVolume should be available, e.g., in order to be able to copy the result of running a query from ContainerWorkers to HiveServer2.
  • The user should have access to Metastore. If Metastore runs in a secure mode, its service keytab file should be copied to the directory kubernetes/key. The user can also run Metastore as a Pod if the MySQL database for Metastore is accessible.
  • If HiveServer2 uses Kerberos-based authentication, its service keytab file should be copied to the directory kubernetes/key.
  • In order to renew HDFS and Hive tokens in DAGAppMaster (for mr3.keytab in mr3-site.xml) and ContainerWorkers (for mr3.k8s.keytab.mount.file in mr3-site.xml), a keytab file should be copied to the directory kubernetes/key. The keytab file is unnecessary if HDFS is not used.
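A minimal sketch for checking the first prerequisite, namely that the docker and kubectl client commands are available on the node where the user works:

```shell
# Sketch: confirm that the docker and kubectl client commands are on PATH.
for cmd in docker kubectl; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: available"
  else
    echo "$cmd: not found"
  fi
done
```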

In general, we need two service keytab files and an ordinary keytab file to be specified by three environment variables in kubernetes/env.sh:

kerberos.keytab.file

In practice, it is okay to use a common service keytab file for both Metastore and HiveServer2. Furthermore, it is also okay to use the same service keytab file for renewing HDFS and Hive tokens. Thus the user can use a single service keytab file for running Hive on Kubernetes. (The default configuration files in an MR3 release use a single service keytab file hive.service.keytab for all three purposes.)

Building a Docker Image

The user can build a Docker image for running Hive on MR3 on Kubernetes. The first step is to collect all necessary files in the directory kubernetes/hive by executing build-k8s.sh with one of the following options:

--hivesrc2                # Choose hive2-mr3 (based on Hive 2.3.6).
--hivesrc3                # Choose hive3-mr3 (based on Hive 3.1.2) (default).
--hivesrc4                # Choose hive4-mr3 (based on Hive 4.0.0-SNAPSHOT).

Note that before executing build-k8s.sh, HADOOP_HOME_LOCAL in env.sh should point to the installation directory of Hadoop. build-k8s.sh copies some files from the Hadoop installation as well as jar files from MR3, Tez, and Hive. Here is an example:

$ ./clean-k8s.sh
$ ./build-k8s.sh --hivesrc3
$ ls kubernetes/hive/hadoop/apache-hadoop/
bin  etc  lib  libexec  share
$ ls kubernetes/hive/hive/apache-hive/
bin  conf  hcatalog  lib

Next the user should set two environment variables in kubernetes/env.sh (not env.sh in the installation directory):

DOCKER_HIVE_IMG=10.1.91.17:5000/hive3:latest
DOCKER_USER=hive
  • DOCKER_HIVE_IMG is the full name of the Docker image including a tag. It specifies the name of the Docker image for running HiveServer2, and may include the address of a running Docker server.
  • DOCKER_USER should match the user specified in kubernetes/hive/Dockerfile (which is hive by default).

By default, ContainerWorker Pods use the same Docker image specified by the environment variable DOCKER_HIVE_IMG. Alternatively the user can choose to create a separate Docker image which uses kubernetes/hive/Dockerfile-worker and is smaller than the main Docker image. In order to create a Docker image for ContainerWorker Pods, either set an environment variable DOCKER_HIVE_WORKER_IMG to an appropriate name (e.g., 10.1.91.17:5000/hive3worker:latest) or directly update kubernetes/env.sh:

DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
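The ${VAR:-default} expansion in this line falls back to DOCKER_HIVE_IMG whenever DOCKER_HIVE_WORKER_IMG is unset or empty. A small self-contained demonstration, using image names that are only examples:

```shell
# Demonstrates the fallback used in kubernetes/env.sh (example image names).
DOCKER_HIVE_IMG=10.1.91.17:5000/hive3:latest

# Case 1: DOCKER_HIVE_WORKER_IMG is unset, so the main image is used.
unset DOCKER_HIVE_WORKER_IMG
DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
echo "$DOCKER_HIVE_WORKER_IMG"    # prints 10.1.91.17:5000/hive3:latest

# Case 2: DOCKER_HIVE_WORKER_IMG is already set, so it is kept as-is.
DOCKER_HIVE_WORKER_IMG=10.1.91.17:5000/hive3worker:latest
DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
echo "$DOCKER_HIVE_WORKER_IMG"    # prints 10.1.91.17:5000/hive3worker:latest
```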

The last step is to build a Docker image from Dockerfile in the directory kubernetes/hive by executing kubernetes/build-hive.sh. The script builds a Docker image (which contains everything for running HiveServer2, DAGAppMaster, and ContainerWorker) and registers it to the Docker server specified in kubernetes/env.sh. If successful, the user can pull the Docker image on another node:

$ kubernetes/build-hive.sh
$ docker pull 10.1.91.17:5000/hive3
Using default tag: latest
Trying to pull repository 10.1.91.17:5000/hive3 ... 
latest: Pulling from 10.1.91.17:5000/hive3
581e78aaf612: Already exists 
...
14c45e1e7f1d: Pull complete 
Digest: sha256:ce1959eea28f53cb4d075466b5c5b613801b2b8d033f7469949505766f9af621
Status: Downloaded newer image for 10.1.91.17:5000/hive3:latest

Dockerfile included in an MR3 release is based on the Docker image centos:7.6.1810. The user may use his or her own Dockerfile as long as the final Docker image contains: 1) Java 8; 2) glibc.

  • The popular Alpine Linux does not include glibc.
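To check whether a custom base image meets these two requirements, one can run commands along the following lines inside the image (the output varies by image; on musl-based images such as Alpine, the glibc check fails):

```shell
# Sketch: probe for the two requirements inside a candidate image.
# On a glibc-based image, ldd reports a GNU libc version string.
if ldd --version >/dev/null 2>&1; then
  ldd --version | head -n 1
else
  echo "no glibc-style ldd (e.g. musl-based Alpine)"
fi
# Java 8 reports version 1.8.x (java -version writes to stderr).
if java -version >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not installed"
fi
```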