There are three ways to install Hive on MR3 on Kubernetes.

  1. Use a pre-built Docker image from DockerHub and an MR3 release containing the executable scripts from GitHub.
  2. Download an MR3 release and build all necessary components from the source code, and build a Docker image.
  3. Download a pre-built MR3 release and build a Docker image.

For the first option, we refer the user to the quick start guide (e.g., On Minikube with a Pre-built Docker Image). The second option is for the user who wants to customize Hive for MR3 (e.g., by applying patches from Apache Hive). Below we explain the third option.

Installing Hive on MR3

Download a pre-built MR3 release and uncompress it in a directory of your choice (e.g., under the user’s home directory). A pre-built MR3 release contains everything for running Hive on MR3 on Kubernetes, including scripts, preset configuration files, and jar files. (Hive 3 and Hive 4 with MR3 master are built with access to Amazon S3.)

# Choose one of the following releases. Each group unpacks a release into mr3-run.

# MR3 1.1 with Hive 2.3.6
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.1/hivemr3-1.1-hive2.3.6.tar.gz
$ gunzip -c hivemr3-1.1-hive2.3.6.tar.gz | tar xvf -
$ mv hivemr3-1.1-hive2.3.6 mr3-run

# MR3 1.1 with Hive 3.1.2
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.1/hivemr3-1.1-hive3.1.2.tar.gz
$ gunzip -c hivemr3-1.1-hive3.1.2.tar.gz | tar xvf -
$ mv hivemr3-1.1-hive3.1.2 mr3-run

# MR3 1.1 with Hive 3.1.2 (k8s release)
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.1/hivemr3-1.1-hive3.1.2-k8s.tar.gz
$ gunzip -c hivemr3-1.1-hive3.1.2-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.1-hive3.1.2-k8s mr3-run

# MR3 1.1 with Hive 4.0.0 (k8s release)
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.1/hivemr3-1.1-hive4.0.0-k8s.tar.gz
$ gunzip -c hivemr3-1.1-hive4.0.0-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.1-hive4.0.0-k8s mr3-run
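
The download commands above all follow the same naming pattern, so the release name and URL can be derived from the MR3 version and the Hive variant. The following is a minimal sketch of that pattern; the function names mr3_release_name and mr3_release_url are hypothetical helpers and are not part of an MR3 release.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with MR3): build the release name and
# download URL from an MR3 version (e.g. 1.1) and a Hive variant
# (e.g. hive3.1.2 or hive4.0.0-k8s), following the file names listed above.

mr3_release_name() {
  # $1 = MR3 version, $2 = Hive variant
  echo "hivemr3-$1-$2"
}

mr3_release_url() {
  # The URL layout is copied from the wget commands in this section.
  echo "https://github.com/mr3project/mr3-release/releases/download/v$1/$(mr3_release_name "$1" "$2").tar.gz"
}

# Example usage (commented out so that nothing is downloaded by accident):
# name=$(mr3_release_name 1.1 hive3.1.2)
# wget "$(mr3_release_url 1.1 hive3.1.2)"
# gunzip -c "$name.tar.gz" | tar xvf -
# mv "$name" mr3-run
```

Only the version and variant strings that appear in the commands above are known to correspond to actual release files.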

Then update the following environment variables in env.sh:

HADOOP_HOME_LOCAL=$HADOOP_HOME
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar
  • HADOOP_HOME_LOCAL should point to an installation directory of Hadoop. Note that the user needs only a Hadoop installation and does not have to run a working Hadoop cluster. For example, it is okay to use the binary distribution of Hadoop downloaded from the Apache Hadoop webpage without further configuration. The Hadoop installation should match the base version used in the MR3 release. For example, with MR3 release master, the user should install Hadoop 3.1.

    $ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
    $ gunzip -c hadoop-2.7.7.tar.gz | tar xvf -
    
    $ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
    $ gunzip -c hadoop-3.1.2.tar.gz | tar xvf -
    

  • HIVE_MYSQL_DRIVER should point to a MySQL connector jar file which is necessary when connecting to a MySQL database. The MySQL connector jar file should be compatible with the MySQL databases for Metastore and Ranger. If HIVE_MYSQL_DRIVER is set to empty, the Docker image (to be built later) does not include a MySQL connector and the user should mount a MySQL connector manually using a hostPath volume or PersistentVolume.
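
Before building a Docker image, it can help to confirm that the two variables in env.sh point to existing paths. The following is a minimal sketch of such a check; check_mr3_env is a hypothetical helper that is not part of an MR3 release, and it only tests that the paths exist, not that the Hadoop version or the connector is compatible.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MR3): report whether the two paths
# configured in env.sh exist on the local machine.
#   $1 = value of HADOOP_HOME_LOCAL
#   $2 = value of HIVE_MYSQL_DRIVER (may be empty)
check_mr3_env() {
  hadoop_home="$1"
  mysql_driver="$2"
  rc=0

  if [ -d "$hadoop_home" ]; then
    echo "HADOOP_HOME_LOCAL ok: $hadoop_home"
  else
    echo "HADOOP_HOME_LOCAL is not a directory: $hadoop_home"
    rc=1
  fi

  if [ -z "$mysql_driver" ]; then
    # An empty value is allowed: the connector must then be mounted manually.
    echo "HIVE_MYSQL_DRIVER is empty: the image will not include a MySQL connector"
  elif [ -f "$mysql_driver" ]; then
    echo "HIVE_MYSQL_DRIVER ok: $mysql_driver"
  else
    echo "HIVE_MYSQL_DRIVER is not a file: $mysql_driver"
    rc=1
  fi

  return "$rc"
}

# Example usage after sourcing env.sh:
# check_mr3_env "$HADOOP_HOME_LOCAL" "$HIVE_MYSQL_DRIVER"
```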

The following structure shows all files and directories relevant to Hive on Kubernetes:

# scripts for populating and cleaning directories for Docker images
├── build-k8s.sh
├── build-k8s-ats.sh
├── build-k8s-ranger.sh
├── clean-k8s.sh
├── clean-k8s-ats.sh
└── clean-k8s-ranger.sh

# scripts for building Docker images
└── kubernetes
    ├── build-hive.sh
    ├── build-ats.sh
    └── build-ranger.sh

# scripts for running HiveServer2, Timeline Server, Ranger
└── kubernetes
    ├── config-run.sh
    ├── generate-hivemr3-ssl.sh
    ├── run-hive.sh
    ├── run-metastore.sh
    ├── run-ats.sh
    └── run-ranger.sh

# resources for running Metastore and HiveServer2
└── kubernetes
    ├── env.sh
    ├── conf
       ├── core-site.xml
       ├── hive-log4j2.properties
       ├── hive-log4j.properties
       ├── hive-site.xml
       ├── jgss.conf
       ├── krb5.conf
       ├── mapred-site.xml
       ├── mr3-site.xml
       ├── ranger-hive-audit.xml
       ├── ranger-hive-security.xml
       ├── ranger-policymgr-ssl.xml
       ├── tez-site.xml
       └── yarn-site.xml
    ├── key
    └── hive
        ├── common-setup.sh
        ├── Dockerfile
        ├── Dockerfile.release
        ├── hadoop
           └── hadoop-setup.sh
        ├── hive
           ├── hiveserver2-service.sh
           ├── hive-setup.sh
           ├── master-control.sh
           ├── metastore-service.sh
           ├── run-beeline.sh
           ├── run-hive-cli.sh
           ├── run-hplsql.sh
           ├── run-master.sh
           └── run-worker.sh
        ├── mr3
           └── mr3-setup.sh
        └── tez
            └── tez-setup.sh

# resources for running Timeline Server
└── kubernetes
    ├── ats-conf
       ├── core-site.xml
       ├── krb5.conf
       ├── log4j.properties
       ├── ssl-server.xml
       └── yarn-site.xml
    ├── ats-key
    └── ats
        ├── Dockerfile
        └── timeline-service.sh

# resources for running Ranger
└── kubernetes
    ├── ranger-conf
       ├── core-site.xml
       ├── krb5.conf
       ├── ranger-admin-site.xml
       ├── ranger-log4j.properties
       ├── solr-core.properties
       ├── solr-elevate.xml
       ├── solr.in.sh
       ├── solr-log4j2.xml
       ├── solr-managed-schema
       ├── solr-security.json
       ├── solr-solrconfig.xml
       └── solr-solr.xml
    ├── ranger-key
       └── install.properties
    └── ranger
        ├── Dockerfile
        ├── start-ranger.sh
        └── start-solr.sh

# YAML files
└── kubernetes
    └── yaml
        ├── ats-service.yaml
        ├── ats.yaml
        ├── cluster-role.yaml
        ├── hive-role.yaml
        ├── hiveserver2-service.yaml
        ├── hive-service-account.yaml
        ├── hive.yaml
        ├── master-role.yaml
        ├── master-service-account.yaml
        ├── metastore-role.yaml
        ├── metastore-service.yaml
        ├── metastore.yaml
        ├── namespace.yaml
        ├── prometheus-service.yaml
        ├── ranger-service.yaml
        ├── ranger.yaml
        ├── workdir-pvc-ats.yaml
        ├── workdir-pv-ats.yaml
        ├── workdir-pvc-ranger.yaml
        ├── workdir-pv-ranger.yaml
        ├── workdir-pvc.yaml
        ├── workdir-pv.yaml
        ├── worker-role.yaml
        └── worker-service-account.yaml

Building a Docker Image

The user can build a Docker image for running Hive on MR3 on Kubernetes. (We assume that the user can execute the command docker so as to build a Docker image.) The first step is to collect all necessary files in the directory kubernetes/hive by executing build-k8s.sh, which accepts the following options:

--hivesrc2                # Choose hive2-mr3 (based on Hive 2.3.6).
--hivesrc3                # Choose hive3-mr3 (based on Hive 3.1.2) (default).
--hivesrc4                # Choose hive4-mr3 (based on Hive 4.0.0-SNAPSHOT).

Note that before executing build-k8s.sh, HADOOP_HOME_LOCAL in env.sh should point to the installation directory of Hadoop. build-k8s.sh copies some files from the Hadoop installation as well as jar files from MR3, Tez, and Hive. Here is an example:

$ ./clean-k8s.sh
$ ./build-k8s.sh --hivesrc3
$ ls kubernetes/hive/hadoop/apache-hadoop/
bin  etc  lib  libexec  share
$ ls kubernetes/hive/hive/apache-hive/
bin  conf  hcatalog  lib
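
The result of build-k8s.sh can also be checked with a script instead of by eye. The following is a minimal sketch that looks for the subdirectories shown in the listing above; check_build_output is a hypothetical helper, and the list of directories is taken only from that listing, so it may not cover everything that build-k8s.sh copies (e.g., files for MR3 and Tez).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MR3): confirm that build-k8s.sh
# populated the directories shown in the example listing.
#   $1 = MR3 installation directory (the one that contains kubernetes/)
check_build_output() {
  base="$1"
  missing=0
  for d in \
    hadoop/apache-hadoop/bin \
    hadoop/apache-hadoop/etc \
    hadoop/apache-hadoop/lib \
    hadoop/apache-hadoop/libexec \
    hadoop/apache-hadoop/share \
    hive/apache-hive/bin \
    hive/apache-hive/conf \
    hive/apache-hive/hcatalog \
    hive/apache-hive/lib
  do
    if [ ! -d "$base/kubernetes/hive/$d" ]; then
      echo "missing: kubernetes/hive/$d"
      missing=1
    fi
  done
  # Returns 0 only if every expected directory exists.
  return "$missing"
}

# Example usage from the installation directory:
# check_build_output .
```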

Next the user should set two environment variables in kubernetes/env.sh (not env.sh in the installation directory):

DOCKER_HIVE_IMG=10.1.91.17:5000/hive3:latest
DOCKER_USER=root
  • DOCKER_HIVE_IMG is the full name of the Docker image including a tag. It names the Docker image for running HiveServer2, and may include the address of a running Docker server.
  • DOCKER_USER should match the user specified in kubernetes/hive/Dockerfile (which is root by default).
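
To illustrate how a full image name such as 10.1.91.17:5000/hive3:latest is put together, the following sketch splits it into a server address, a repository, and a tag using shell parameter expansion. The function split_image_name is a hypothetical helper. It assumes that any part before the first slash is the server address and that a missing tag means latest, which is a simplification of how Docker itself interprets image names.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MR3): split a Docker image name
# into server address, repository, and tag.
# Assumption: everything before the first "/" is the server address.
split_image_name() {
  img="$1"
  last="${img##*/}"              # part after the last slash, e.g. hive3:latest
  case "$last" in
    *:*) tag="${last##*:}"       # text after the final colon
         name="${img%:*}" ;;     # image name without the tag
    *)   tag="latest"            # no tag given; assume the default tag
         name="$img" ;;
  esac
  case "$name" in
    */*) registry="${name%%/*}"  # text before the first slash
         repo="${name#*/}" ;;    # text after the first slash
    *)   registry=""
         repo="$name" ;;
  esac
  echo "registry=$registry repository=$repo tag=$tag"
}

# Example usage:
# split_image_name "$DOCKER_HIVE_IMG"
```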

By default, ContainerWorker Pods use the Docker image specified by the environment variable DOCKER_HIVE_IMG. Alternatively the user can choose to create a separate Docker image which uses kubernetes/hive/Dockerfile-worker and is smaller than the main Docker image. In order to create a Docker image for ContainerWorker Pods, either set an environment variable DOCKER_HIVE_WORKER_IMG to an appropriate name (e.g., 10.1.91.17:5000/hive3worker:latest) or directly update kubernetes/env.sh.

DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
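
The assignment above relies on the shell expansion ${VAR:-default}, which yields the default when VAR is unset or empty. The following sketch reproduces that line in isolation to show both outcomes; resolve_worker_img is a hypothetical helper that exists only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): return the worker image name
# that the assignment in kubernetes/env.sh would produce.
#   $1 = value of DOCKER_HIVE_IMG
#   $2 = value of DOCKER_HIVE_WORKER_IMG set by the user (may be empty)
# The body runs in a subshell, so the caller's variables are not modified.
resolve_worker_img() (
  DOCKER_HIVE_IMG="$1"
  DOCKER_HIVE_WORKER_IMG="$2"
  # Same expansion as in kubernetes/env.sh: fall back to the main image
  # when no worker image has been specified.
  DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
  echo "$DOCKER_HIVE_WORKER_IMG"
)

# No worker image set: the main image is reused.
resolve_worker_img "10.1.91.17:5000/hive3:latest" ""
# Worker image set: it takes precedence.
resolve_worker_img "10.1.91.17:5000/hive3:latest" "10.1.91.17:5000/hive3worker:latest"
```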

The last step is to build a Docker image from Dockerfile in the directory kubernetes/hive by executing kubernetes/build-hive.sh. The script builds a Docker image (which contains everything for running HiveServer2, DAGAppMaster, and ContainerWorker) and registers it to the Docker server specified in kubernetes/env.sh. If successful, the user can pull the Docker image on another node:

$ kubernetes/build-hive.sh
$ docker pull 10.1.91.17:5000/hive3
Using default tag: latest
Trying to pull repository 10.1.91.17:5000/hive3 ... 
latest: Pulling from 10.1.91.17:5000/hive3
581e78aaf612: Already exists 
...
14c45e1e7f1d: Pull complete 
Digest: sha256:ce1959eea28f53cb4d075466b5c5b613801b2b8d033f7469949505766f9af621
Status: Downloaded newer image for 10.1.91.17:5000/hive3:latest

The Dockerfile included in an MR3 release is based on the Docker image openjdk:8-jre-slim. The user may use a custom Dockerfile as long as the final Docker image contains: 1) Java 8; 2) glibc.

  • The popular Alpine Linux does not include glibc (it uses musl libc instead).

After building a Docker image, proceed to Hive on Kubernetes or User Guide.