Using a pre-built Docker image

To use a pre-built Docker image from DockerHub, it suffices to obtain the executable scripts of the MR3 release from the GitHub repository (https://github.com/mr3project/mr3-run-k8s). Clone the repository.

$ git clone https://github.com/mr3project/mr3-run-k8s.git

We recommend the quick start guide On Kubernetes, which demonstrates how to use a pre-built Docker image.

Building a new Docker image

If the user wants to manually build a new Docker image, there are two approaches.

  1. Download the source code of MR3, build all necessary components, and then build a Docker image. This option is for users who want to customize Hive for MR3 (e.g., by applying patches from Apache Hive).
  2. Install a pre-built MR3 release and build a Docker image.

Below we explain the second approach.

Installing a pre-built MR3 release

Download a pre-built MR3 release and uncompress it in a directory of your choice (e.g., under the user’s home directory). A pre-built MR3 release contains everything for running Hive on MR3 on Kubernetes, including scripts, preset configuration files, and jar files. (Hive 3 is built with support for accessing Amazon S3.)

$ wget https://github.com/mr3project/mr3-release/releases/download/v1.5/hivemr3-1.5-hive3.1.3-k8s.tar.gz
$ gunzip -c hivemr3-1.5-hive3.1.3-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.5-hive3.1.3-k8s mr3-run
$ cd mr3-run
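
The download URL and the directory name above follow a visible pattern built from the MR3 version and the Hive version. As a minimal sketch, assuming that the same naming pattern holds for other releases (this is not confirmed here), the hypothetical shell functions below compose both names and print, rather than run, the installation commands:

```shell
#!/bin/sh
# Hypothetical helpers (not part of MR3). The naming pattern is inferred
# from the v1.5 example above and may not hold for other releases.

# Release name, e.g. hivemr3-1.5-hive3.1.3-k8s
mr3_release_name() {
  # $1 = MR3 version, $2 = Hive version
  printf 'hivemr3-%s-hive%s-k8s\n' "$1" "$2"
}

# Download URL of the release tarball on GitHub
mr3_release_url() {
  printf 'https://github.com/mr3project/mr3-release/releases/download/v%s/%s.tar.gz\n' \
    "$1" "$(mr3_release_name "$1" "$2")"
}

# Print (do not execute) the installation commands
mr3_install_commands() {
  local name
  name=$(mr3_release_name "$1" "$2")
  printf 'wget %s\n' "$(mr3_release_url "$1" "$2")"
  printf 'gunzip -c %s.tar.gz | tar xvf -\n' "$name"
  printf 'mv %s mr3-run\n' "$name"
}
```

For example, mr3_install_commands 1.5 3.1.3 prints the wget, gunzip/tar, and mv commands shown above. Printing instead of executing keeps the sketch safe to try before downloading anything.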

Then update the following environment variables in env.sh:

$ vi env.sh

HADOOP_HOME_LOCAL=$HADOOP_HOME
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar

  • HADOOP_HOME_LOCAL should point to an installation directory of Hadoop. Note that the user needs only a Hadoop installation and does not have to run a working Hadoop cluster. For example, it is okay to use the binary distribution of Hadoop downloaded from the Apache Hadoop webpage without further configuration. The Hadoop installation should match the base version used in the MR3 release. For Hive 3, the user should install Hadoop 3.1.

    $ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
    $ gunzip -c hadoop-3.1.2.tar.gz | tar xvf -
    

  • HIVE_MYSQL_DRIVER should point to a database connector jar file compatible with the database for Metastore (MySQL, Postgres, MS SQL, or Oracle). For MySQL, Postgres, and MS SQL, HIVE_MYSQL_DRIVER may be left empty because Metastore in Hive on MR3 either already includes or automatically downloads a compatible database connector. If HIVE_MYSQL_DRIVER is left empty when using Oracle for Metastore, the Docker image (to be built later) does not include a database connector, and the user should mount one manually using a hostPath volume or a PersistentVolume.
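
Both settings can be checked before running build-k8s.sh. The following is a hypothetical sketch: the function names check_hadoop_home and check_db_driver are illustrative and not part of MR3, and the Hadoop check assumes the layout of the Apache binary distribution (bin/hadoop and share/hadoop/common/hadoop-common-<version>.jar):

```shell
#!/bin/sh
# Hypothetical pre-flight checks for env.sh (function names are illustrative).

# Succeeds if $1 looks like a Hadoop installation whose version starts with $2.
# Assumes the Apache binary layout: bin/hadoop and
# share/hadoop/common/hadoop-common-<version>.jar
check_hadoop_home() {
  local dir="$1" prefix="$2" jar version
  if [ ! -x "$dir/bin/hadoop" ]; then
    echo "missing $dir/bin/hadoop" >&2
    return 1
  fi
  for jar in "$dir"/share/hadoop/common/hadoop-common-*.jar; do
    [ -e "$jar" ] || continue          # the glob matched nothing
    version=${jar##*/hadoop-common-}   # e.g. 3.1.2.jar
    version=${version%.jar}            # e.g. 3.1.2
    case $version in
      "$prefix".*) return 0 ;;
    esac
  done
  echo "no hadoop-common-$prefix.x jar under $dir" >&2
  return 1
}

# $1 = HIVE_MYSQL_DRIVER, $2 = Metastore database: mysql, postgres, mssql, or oracle
check_db_driver() {
  local driver="$1" db="$2"
  if [ -z "$driver" ]; then
    # Empty is fine except for Oracle, which needs a manually mounted connector.
    [ "$db" != oracle ] && return 0
    echo "oracle: mount a connector via hostPath or PersistentVolume" >&2
    return 1
  fi
  case $driver in
    *.jar) [ -f "$driver" ] ;;
    *) echo "not a jar file: $driver" >&2; return 1 ;;
  esac
}
```

For example, with the values of env.sh loaded into the current shell, check_hadoop_home "$HADOOP_HOME_LOCAL" 3.1 && check_db_driver "$HIVE_MYSQL_DRIVER" mysql succeeds only if both settings look usable. Failing on an empty value with Oracle is a deliberate choice of the sketch, as a reminder that the connector must be mounted separately.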

The following structure shows all files and directories relevant to Hive on Kubernetes:

# scripts for populating and cleaning directories for Docker images
├── build-k8s.sh
├── build-k8s-ats.sh
├── build-k8s-ranger.sh
├── clean-k8s.sh
├── clean-k8s-ats.sh
└── clean-k8s-ranger.sh

# scripts for building Docker images
└── kubernetes
    ├── build-hive.sh
    ├── build-ats.sh
    └── build-ranger.sh

# scripts for running HiveServer2, Timeline Server, Ranger
└── kubernetes
    ├── config-run.sh
    ├── generate-hivemr3-ssl.sh
    ├── run-hive.sh
    ├── run-metastore.sh
    ├── run-ats.sh
    └── run-ranger.sh

# resources for running Metastore and HiveServer2
└── kubernetes
    ├── env.sh
    ├── conf
    │   ├── core-site.xml
    │   ├── hive-log4j2.properties
    │   ├── hive-log4j.properties
    │   ├── hive-site.xml
    │   ├── jgss.conf
    │   ├── krb5.conf
    │   ├── mapred-site.xml
    │   ├── mr3-site.xml
    │   ├── ranger-hive-audit.xml
    │   ├── ranger-hive-security.xml
    │   ├── ranger-policymgr-ssl.xml
    │   ├── tez-site.xml
    │   └── yarn-site.xml
    ├── key
    └── hive
        ├── common-setup.sh
        ├── Dockerfile
        ├── Dockerfile-worker
        ├── hadoop
        │   └── hadoop-setup.sh
        ├── hive
        │   ├── hiveserver2-service.sh
        │   ├── hive-setup.sh
        │   ├── master-control.sh
        │   ├── metastore-service.sh
        │   ├── run-beeline.sh
        │   ├── run-hive-cli.sh
        │   ├── run-hplsql.sh
        │   ├── run-master.sh
        │   └── run-worker.sh
        ├── mr3
        │   └── mr3-setup.sh
        └── tez
            └── tez-setup.sh

# resources for running Timeline Server
└── kubernetes
    ├── ats-conf
    │   ├── core-site.xml
    │   ├── krb5.conf
    │   ├── log4j.properties
    │   ├── ssl-server.xml
    │   └── yarn-site.xml
    ├── ats-key
    └── ats
        ├── Dockerfile
        └── timeline-service.sh

# resources for running Ranger
└── kubernetes
    ├── ranger-conf
    │   ├── core-site.xml
    │   ├── krb5.conf
    │   ├── ranger-log4j.properties
    │   ├── solr-core.properties
    │   ├── solr-elevate.xml
    │   ├── solr-log4j2.xml
    │   ├── solr-managed-schema
    │   ├── solr-security.json
    │   ├── solr-solrconfig.xml
    │   └── solr-solr.xml
    ├── ranger-key
    │   ├── install.properties
    │   └── solr.in.sh
    └── ranger
        ├── Dockerfile
        ├── start-ranger.sh
        └── start-solr.sh

# YAML files
└── kubernetes
    └── yaml
        ├── ats-service.yaml
        ├── ats.yaml
        ├── cluster-role.yaml
        ├── hive-role.yaml
        ├── hiveserver2-service.yaml
        ├── hive-service-account.yaml
        ├── hive.yaml
        ├── master-role.yaml
        ├── master-service-account.yaml
        ├── metastore-role.yaml
        ├── metastore-service.yaml
        ├── metastore.yaml
        ├── namespace.yaml
        ├── prometheus-service.yaml
        ├── ranger-service.yaml
        ├── ranger.yaml
        ├── workdir-pvc-ats.yaml
        ├── workdir-pv-ats.yaml
        ├── workdir-pvc-ranger.yaml
        ├── workdir-pv-ranger.yaml
        ├── workdir-pvc.yaml
        ├── workdir-pv.yaml
        ├── worker-role.yaml
        └── worker-service-account.yaml
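
Since the following steps rely on only a few of these files, a quick check of the installation directory can catch an incomplete download or extraction. This is a hypothetical sketch: the list REQUIRED_FILES is a small subset taken from the tree above (plus the top-level env.sh), not the full layout, and check_layout is an illustrative name:

```shell
#!/bin/sh
# Hypothetical sanity check (not part of MR3). Paths come from the tree above;
# the list is a small subset, not the full layout.
REQUIRED_FILES="build-k8s.sh
clean-k8s.sh
env.sh
kubernetes/env.sh
kubernetes/build-hive.sh
kubernetes/hive/Dockerfile
kubernetes/hive/Dockerfile-worker"

# $1 = installation directory; prints each missing path, fails if any is missing
check_layout() {
  local root="$1" missing=0 f
  for f in $REQUIRED_FILES; do
    if [ ! -e "$root/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, check_layout mr3-run prints nothing and succeeds on a complete installation.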

Building a Docker image

The user can build a Docker image for running Hive on MR3 on Kubernetes. (We assume that the user can run the docker command to build a Docker image.) The first step is to collect all necessary files in the directory kubernetes/hive by executing build-k8s.sh, which accepts the following option:

--hivesrc3                # Choose hive3-mr3 (based on Hive 3.1.3) (default).

Note that before executing build-k8s.sh, HADOOP_HOME_LOCAL in env.sh should point to the installation directory of Hadoop. build-k8s.sh copies some files from the Hadoop installation as well as jar files from MR3, Tez, and Hive. Here is an example:

$ ./clean-k8s.sh
$ ./build-k8s.sh --hivesrc3
$ ls kubernetes/hive/hadoop/apache-hadoop/
bin  etc  lib  libexec  share
$ ls kubernetes/hive/hive/apache-hive/
bin  conf  hcatalog  lib
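
The listing above can also be checked automatically. As a minimal sketch, assuming that build-k8s.sh always produces the subdirectories shown (bin, etc, lib, libexec, and share for Hadoop; bin, conf, hcatalog, and lib for Hive), the hypothetical function check_build_output reports any that are missing:

```shell
#!/bin/sh
# Hypothetical check (not part of MR3) that build-k8s.sh populated
# kubernetes/hive as in the example listing.
# $1 = installation directory; prints each missing directory
check_build_output() {
  local root="$1" d status=0
  for d in bin etc lib libexec share; do
    if [ ! -d "$root/kubernetes/hive/hadoop/apache-hadoop/$d" ]; then
      echo "missing hadoop/$d"; status=1
    fi
  done
  for d in bin conf hcatalog lib; do
    if [ ! -d "$root/kubernetes/hive/hive/apache-hive/$d" ]; then
      echo "missing hive/$d"; status=1
    fi
  done
  return "$status"
}
```

Running check_build_output . in the installation directory right after ./build-k8s.sh --hivesrc3 should print nothing and succeed.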

Next, the user should set two environment variables in kubernetes/env.sh (not env.sh in the installation directory):

$ vi kubernetes/env.sh

DOCKER_HIVE_IMG=10.1.91.17:5000/hive3:latest
DOCKER_USER=root

  • DOCKER_HIVE_IMG is the full name of the Docker image for running HiveServer2, including a tag. It may include the address of a Docker registry (e.g., 10.1.91.17:5000).
  • DOCKER_USER should match the user specified in kubernetes/hive/Dockerfile (which is root by default).
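
To see how a value of DOCKER_HIVE_IMG decomposes and which user a Dockerfile selects, the hypothetical helpers below (split_image_name and dockerfile_user, not part of MR3 or Docker) give a rough approximation. The image-name rules are a simplification of Docker's reference format, and dockerfile_user relies on the fact that Docker runs containers as root when a Dockerfile has no USER instruction:

```shell
#!/bin/sh
# Hypothetical helpers (not part of MR3 or Docker).

# Split an image reference into registry, repository, and tag.
# Simplified rules: a tag is the text after the last ':' if it has no '/';
# the first path component is a registry if it contains '.' or ':' or is
# 'localhost'. Digests (@sha256:...) are not handled.
split_image_name() {
  local ref="$1" registry="" repo tag first
  case ${ref##*:} in
    */*|"$ref") repo=$ref; tag=latest ;;   # no explicit tag: Docker uses latest
    *) repo=${ref%:*}; tag=${ref##*:} ;;
  esac
  case $repo in
    */*)
      first=${repo%%/*}
      case $first in
        *.*|*:*|localhost) registry=$first; repo=${repo#*/} ;;
      esac ;;
  esac
  printf 'registry=%s repository=%s tag=%s\n' "$registry" "$repo" "$tag"
}

# Print the user of the last USER instruction in a Dockerfile ($1),
# dropping any :group suffix; print root if there is no USER instruction.
# Variables such as USER $NAME are printed literally, not resolved.
dockerfile_user() {
  local user
  user=$(awk 'toupper($1) == "USER" { u = $2 } END { print u }' "$1")
  user=${user%%:*}
  printf '%s\n' "${user:-root}"
}
```

For instance, split_image_name 10.1.91.17:5000/hive3:latest prints registry=10.1.91.17:5000 repository=hive3 tag=latest, and dockerfile_user kubernetes/hive/Dockerfile prints the value that DOCKER_USER should match.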

By default, ContainerWorker Pods use the same Docker image specified by the environment variable DOCKER_HIVE_IMG. Alternatively, the user can create a separate Docker image for ContainerWorker Pods, which is built from kubernetes/hive/Dockerfile-worker and is smaller than the main Docker image. To do so, either set the environment variable DOCKER_HIVE_WORKER_IMG to an appropriate name (e.g., 10.1.91.17:5000/hive3worker:latest) or directly update kubernetes/env.sh.

$ vi kubernetes/env.sh

DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
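
The line above uses the shell expansion ${VAR:-default}, which substitutes $DOCKER_HIVE_IMG whenever DOCKER_HIVE_WORKER_IMG is unset or empty. The sketch below (the function names are hypothetical) reproduces that resolution so that the effect of setting DOCKER_HIVE_WORKER_IMG can be checked before building:

```shell
#!/bin/sh
# Sketch of how the default in kubernetes/env.sh resolves:
#   DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
# ':-' substitutes the default when the variable is unset OR empty.

# $1 = DOCKER_HIVE_IMG, $2 = DOCKER_HIVE_WORKER_IMG from the environment (may be empty)
resolve_worker_img() {
  local DOCKER_HIVE_IMG="$1" DOCKER_HIVE_WORKER_IMG="$2"
  DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
  printf '%s\n' "$DOCKER_HIVE_WORKER_IMG"
}

# Succeeds only if ContainerWorker Pods would use a separate image
uses_separate_worker_img() {
  [ "$(resolve_worker_img "$1" "$2")" != "$1" ]
}
```

Because :- also treats an empty value as unset, setting DOCKER_HIVE_WORKER_IMG to an empty string still makes ContainerWorker Pods use the main image.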

The last step is to build a Docker image from the Dockerfile in the directory kubernetes/hive by executing kubernetes/build-hive.sh. The script builds a Docker image (which contains everything for running HiveServer2, DAGAppMaster, and ContainerWorker) and pushes it to the Docker registry specified in kubernetes/env.sh. If successful, the user can pull the Docker image on another node:

$ kubernetes/build-hive.sh
$ docker pull 10.1.91.17:5000/hive3
Using default tag: latest
Trying to pull repository 10.1.91.17:5000/hive3 ... 
latest: Pulling from 10.1.91.17:5000/hive3
581e78aaf612: Already exists 
...
14c45e1e7f1d: Pull complete 
Digest: sha256:ce1959eea28f53cb4d075466b5c5b613801b2b8d033f7469949505766f9af621
Status: Downloaded newer image for 10.1.91.17:5000/hive3:latest
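
When a separate worker image is used, another node needs both images. As a hypothetical sketch (pull_commands is an illustrative name), the function below prints the docker pull command for the main image and, only if it differs, for the worker image:

```shell
#!/bin/sh
# Hypothetical helper (not part of MR3): print the docker pull commands
# needed on another node. An empty worker image means "same as main".
# $1 = DOCKER_HIVE_IMG, $2 = DOCKER_HIVE_WORKER_IMG
pull_commands() {
  local main="$1" worker="${2:-$1}"
  printf 'docker pull %s\n' "$main"
  if [ "$worker" != "$main" ]; then
    printf 'docker pull %s\n' "$worker"
  fi
}
```

Piping the output to sh runs the commands, e.g., pull_commands "$DOCKER_HIVE_IMG" "$DOCKER_HIVE_WORKER_IMG" | sh. Printing first makes it easy to review what will be pulled.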

The Dockerfile included in an MR3 release is based on the Docker image openjdk:8-jre-slim. The user may use a custom Dockerfile as long as the final Docker image contains 1) Java 8 and 2) glibc.

  • Note that the popular Alpine Linux does not include glibc (it uses musl libc instead).
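
With a custom Dockerfile, both requirements can be checked inside the resulting image. The hypothetical sketch below parses the output of java -version to obtain the Java major version, and uses getconf GNU_LIBC_VERSION, which succeeds on glibc systems and is expected to fail on musl-based images such as Alpine:

```shell
#!/bin/sh
# Hypothetical checks for a custom base image (function names are illustrative).

# $1 = text printed by `java -version` (which writes to stderr).
# Prints the major version: "1.8.0_342" -> 8, "11.0.16" -> 11.
java_major() {
  local v
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case $v in
    1.*) v=${v#1.}; printf '%s\n' "${v%%.*}" ;;   # old scheme: 1.<major>.x
    ?*)  printf '%s\n' "${v%%[.+-]*}" ;;           # new scheme: <major>.x
    *)   return 1 ;;                               # no version string found
  esac
}

# Succeeds on glibc; expected to fail on musl (e.g. Alpine).
has_glibc() {
  getconf GNU_LIBC_VERSION >/dev/null 2>&1
}
```

Inside a container, java_major "$(java -version 2>&1)" should print 8. To gather the raw information for an image IMAGE, assuming it contains /bin/sh, one could run docker run --rm --entrypoint /bin/sh IMAGE -c 'java -version; getconf GNU_LIBC_VERSION'.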

After building a Docker image, proceed to Hive on Kubernetes or Basic Guide.