There are three ways to install Hive on MR3 on Kubernetes.
- Use a pre-built Docker image from DockerHub and an MR3 release containing the executable scripts from GitHub.
- Download an MR3 release and build all necessary components from the source code, and build a Docker image.
- Download a pre-built MR3 release and build a Docker image.
For the first option, we refer the user to the quick start guide (e.g., On Minikube with a Pre-built Docker Image and On EKS with Autoscaling). The second option is for the user who wants to customize Hive for MR3 (e.g., by applying patches from Apache Hive). Below we explain the third option.
Installing Hive on MR3
Download a pre-built MR3 release and uncompress it in a directory of your choice (e.g., under the user’s home directory). A pre-built MR3 release contains everything for running Hive on MR3 on Kubernetes, including scripts, preset configuration files, and jar files. (Hive 3 and Hive 4 with MR3 master are built with access to Amazon S3.)
# For Hive 2 (based on Hive 2.3.6):
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.2/hivemr3-1.2-hive2.3.6.tar.gz
$ gunzip -c hivemr3-1.2-hive2.3.6.tar.gz | tar xvf -
$ mv hivemr3-1.2-hive2.3.6 mr3-run
# For Hive 3 (based on Hive 3.1.2):
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.2/hivemr3-1.2-hive3.1.2-k8s.tar.gz
$ gunzip -c hivemr3-1.2-hive3.1.2-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.2-hive3.1.2-k8s mr3-run
# For Hive 4 (based on Hive 4.0.0):
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.2/hivemr3-1.2-hive4.0.0-k8s.tar.gz
$ gunzip -c hivemr3-1.2-hive4.0.0-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.2-hive4.0.0-k8s mr3-run
Then update the following environment variables in env.sh:
$ vi env.sh
HADOOP_HOME_LOCAL=$HADOOP_HOME
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar
- HADOOP_HOME_LOCAL should point to an installation directory of Hadoop. Note that the user needs only a Hadoop installation and does not have to run a working Hadoop cluster. For example, it is okay to use the binary distribution of Hadoop downloaded from the Apache Hadoop webpage without further configuration. The Hadoop installation should match the base version used in the MR3 release. For example, with MR3 release master, the user should install Hadoop 3.1.
# For Hive 2 (Hadoop 2.7.7):
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
$ gunzip -c hadoop-2.7.7.tar.gz | tar xvf -
# For Hive 3 and Hive 4 (Hadoop 3.1.2):
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
$ gunzip -c hadoop-3.1.2.tar.gz | tar xvf -
- HIVE_MYSQL_DRIVER should point to a MySQL connector jar file, which is necessary when connecting to a MySQL database. The MySQL connector jar file should be compatible with the MySQL databases for Metastore and Ranger. If HIVE_MYSQL_DRIVER is set to empty, the Docker image (to be built later) does not include a MySQL connector, and the user should mount a MySQL connector manually using a hostPath volume or PersistentVolume.
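As an illustration only (not part of the MR3 release), mounting a MySQL connector jar with a hostPath volume might look like the following sketch. The volume name, host path, and mount path below are hypothetical and should be adapted to the Pod specification actually in use (e.g., in kubernetes/yaml/metastore.yaml):

```yaml
# Hypothetical sketch: mounting a MySQL connector jar into a Pod with a
# hostPath volume. All names and paths below are illustrative only.
spec:
  containers:
  - name: metastore
    volumeMounts:
    - name: mysql-connector          # hypothetical volume name
      mountPath: /opt/mr3-run/lib    # hypothetical mount path inside the container
  volumes:
  - name: mysql-connector
    hostPath:
      path: /usr/share/java          # host directory containing mysql-connector-java.jar
      type: Directory
```

A PersistentVolume can be used in the same way by replacing the hostPath entry with a persistentVolumeClaim.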
The following structure shows all files and directories relevant to Hive on Kubernetes:
# scripts for populating and cleaning directories for Docker images
├── build-k8s.sh
├── build-k8s-ats.sh
├── build-k8s-ranger.sh
├── clean-k8s.sh
├── clean-k8s-ats.sh
└── clean-k8s-ranger.sh
# scripts for building Docker images
└── kubernetes
├── build-hive.sh
├── build-ats.sh
└── build-ranger.sh
# scripts for running HiveServer2, Timeline Server, Ranger
└── kubernetes
├── config-run.sh
├── generate-hivemr3-ssl.sh
├── run-hive.sh
├── run-metastore.sh
├── run-ats.sh
└── run-ranger.sh
# resources for running Metastore and HiveServer2
└── kubernetes
├── env.sh
├── conf
│ ├── core-site.xml
│ ├── hive-log4j2.properties
│ ├── hive-log4j.properties
│ ├── hive-site.xml
│ ├── jgss.conf
│ ├── krb5.conf
│ ├── mapred-site.xml
│ ├── mr3-site.xml
│ ├── ranger-hive-audit.xml
│ ├── ranger-hive-security.xml
│ ├── ranger-policymgr-ssl.xml
│ ├── tez-site.xml
│ └── yarn-site.xml
├── key
└── hive
├── common-setup.sh
├── Dockerfile
├── Dockerfile-worker
├── hadoop
│ └── hadoop-setup.sh
├── hive
│ ├── hiveserver2-service.sh
│ ├── hive-setup.sh
│ ├── master-control.sh
│ ├── metastore-service.sh
│ ├── run-beeline.sh
│ ├── run-hive-cli.sh
│ ├── run-hplsql.sh
│ ├── run-master.sh
│ └── run-worker.sh
├── mr3
│ └── mr3-setup.sh
└── tez
└── tez-setup.sh
# resources for running Timeline Server
└── kubernetes
├── ats-conf
│ ├── core-site.xml
│ ├── krb5.conf
│ ├── log4j.properties
│ ├── ssl-server.xml
│ └── yarn-site.xml
├── ats-key
└── ats
├── Dockerfile
└── timeline-service.sh
# resources for running Ranger
└── kubernetes
├── ranger-conf
│ ├── core-site.xml
│ ├── krb5.conf
│ ├── ranger-admin-site.xml
│ ├── ranger-log4j.properties
│ ├── solr-core.properties
│ ├── solr-elevate.xml
│ ├── solr.in.sh
│ ├── solr-log4j2.xml
│ ├── solr-managed-schema
│ ├── solr-security.json
│ ├── solr-solrconfig.xml
│ └── solr-solr.xml
├── ranger-key
│ └── install.properties
└── ranger
├── Dockerfile
├── start-ranger.sh
└── start-solr.sh
# YAML files
└── kubernetes
└── yaml
├── ats-service.yaml
├── ats.yaml
├── cluster-role.yaml
├── hive-role.yaml
├── hiveserver2-service.yaml
├── hive-service-account.yaml
├── hive.yaml
├── master-role.yaml
├── master-service-account.yaml
├── metastore-role.yaml
├── metastore-service.yaml
├── metastore.yaml
├── namespace.yaml
├── prometheus-service.yaml
├── ranger-service.yaml
├── ranger.yaml
├── workdir-pvc-ats.yaml
├── workdir-pv-ats.yaml
├── workdir-pvc-ranger.yaml
├── workdir-pv-ranger.yaml
├── workdir-pvc.yaml
├── workdir-pv.yaml
├── worker-role.yaml
└── worker-service-account.yaml
Building a Docker Image
The user can build a Docker image for running Hive on MR3 on Kubernetes. (We assume that the user can execute the command docker in order to build a Docker image.) The first step is to collect all necessary files in the directory kubernetes/hive by executing build-k8s.sh, which accepts one of the following options:
--hivesrc2 # Choose hive2-mr3 (based on Hive 2.3.6).
--hivesrc3 # Choose hive3-mr3 (based on Hive 3.1.2) (default).
--hivesrc4 # Choose hive4-mr3 (based on Hive 4.0.0-SNAPSHOT).
Note that before executing build-k8s.sh, HADOOP_HOME_LOCAL in env.sh should point to the installation directory of Hadoop. build-k8s.sh copies some files from the Hadoop installation as well as jar files from MR3, Tez, and Hive. Here is an example:
$ ./clean-k8s.sh
$ ./build-k8s.sh --hivesrc3
$ ls kubernetes/hive/hadoop/apache-hadoop/
bin etc lib libexec share
$ ls kubernetes/hive/hive/apache-hive/
bin conf hcatalog lib
Next the user should set two environment variables in kubernetes/env.sh (not env.sh in the installation directory):
$ vi kubernetes/env.sh
DOCKER_HIVE_IMG=10.1.91.17:5000/hive3:latest
DOCKER_USER=root
DOCKER_HIVE_IMG is the full name, including a tag, of the Docker image for running HiveServer2; it may include the address of a running Docker server. DOCKER_USER should match the user specified in kubernetes/hive/Dockerfile (which is root by default).
By default, ContainerWorker Pods use the same Docker image specified by the environment variable DOCKER_HIVE_IMG. Alternatively the user can create a separate Docker image for ContainerWorker Pods which uses kubernetes/hive/Dockerfile-worker and is smaller than the main Docker image. To do so, either set an environment variable DOCKER_HIVE_WORKER_IMG to an appropriate name (e.g., 10.1.91.17:5000/hive3worker:latest) or directly update kubernetes/env.sh.
$ vi kubernetes/env.sh
DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
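The line in kubernetes/env.sh uses shell default expansion: ${VAR:-default} evaluates to the default when VAR is unset or empty, so exporting DOCKER_HIVE_WORKER_IMG before the script runs overrides the fallback. A small self-contained illustration (image names hypothetical):

```shell
# ${VAR:-default} expands to the default when VAR is unset or empty,
# so setting DOCKER_HIVE_WORKER_IMG beforehand overrides the fallback.
DOCKER_HIVE_IMG=10.1.91.17:5000/hive3:latest        # example value

unset DOCKER_HIVE_WORKER_IMG
echo "${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}"  # prints 10.1.91.17:5000/hive3:latest

DOCKER_HIVE_WORKER_IMG=10.1.91.17:5000/hive3worker:latest
echo "${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}"  # prints 10.1.91.17:5000/hive3worker:latest
```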
The last step is to build a Docker image from Dockerfile in the directory kubernetes/hive by executing kubernetes/build-hive.sh. The script builds a Docker image (which contains everything for running HiveServer2, DAGAppMaster, and ContainerWorker) and registers it to the Docker server specified in kubernetes/env.sh. If successful, the user can pull the Docker image on another node:
$ kubernetes/build-hive.sh
$ docker pull 10.1.91.17:5000/hive3
Using default tag: latest
Trying to pull repository 10.1.91.17:5000/hive3 ...
latest: Pulling from 10.1.91.17:5000/hive3
581e78aaf612: Already exists
...
14c45e1e7f1d: Pull complete
Digest: sha256:ce1959eea28f53cb4d075466b5c5b613801b2b8d033f7469949505766f9af621
Status: Downloaded newer image for 10.1.91.17:5000/hive3:latest
The Dockerfile included in an MR3 release is based on the Docker image openjdk:8-jre-slim. The user may use his or her own Dockerfile as long as the final Docker image contains: 1) Java 8; 2) glibc.
- Note that the popular Alpine Linux does not include glibc.
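For reference, a minimal custom base satisfying these requirements might look like the following sketch; the extra packages are illustrative assumptions, and any Debian-based image (which ships glibc) with Java 8 would do:

```dockerfile
# Hypothetical sketch of a custom Dockerfile base: a Debian-based image
# (glibc included) with a Java 8 runtime, as required by Hive on MR3.
FROM openjdk:8-jre-slim

# Illustrative only: install extra utilities if the user's setup needs them.
RUN apt-get update && \
    apt-get install -y --no-install-recommends procps curl && \
    rm -rf /var/lib/apt/lists/*

# The user here should match DOCKER_USER in kubernetes/env.sh (root by default).
USER root
```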
After building a Docker image, proceed to Hive on Kubernetes or Basic Guide.