Installing with a pre-built Docker image
To use a pre-built Docker image from DockerHub, it suffices to use the MR3 release from the GitHub repository (https://github.com/mr3project/mr3-run-k8s), which contains all the executable scripts. Clone the repository and check out the release branch.
$ git clone https://github.com/mr3project/mr3-run-k8s.git
$ cd mr3-run-k8s/
$ git checkout release-1.11-hive3
That’s all you need!
We recommend the quick start guide On Kubernetes, which demonstrates how to run Hive on MR3 on Kubernetes using a pre-built Docker image. Alternatively, proceed to Basic Guide.
Building a new Docker image (Optional)
1. Installing a pre-built MR3 release
Download a pre-built MR3 release and uncompress it in a directory of your choice (e.g., under the user’s home directory). A pre-built MR3 release contains everything for running Hive on MR3 on Kubernetes, including scripts, preset configuration files, and jar files. (Hive 3 is built with access to Amazon S3.)
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.11/hivemr3-1.11-java17-hive3.1.3-k8s.tar.gz
$ gunzip -c hivemr3-1.11-java17-hive3.1.3-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.11-java17-hive3.1.3-k8s mr3-run
$ cd mr3-run/
Then update the following environment variables in env.sh:
$ vi env.sh
HADOOP_HOME_LOCAL=$HADOOP_HOME
HIVE_MYSQL_DRIVER=
- HADOOP_HOME_LOCAL should point to an installation directory of Hadoop. Note that the user needs only a Hadoop installation and does not have to run a working Hadoop cluster. For example, it is okay to use the binary distribution of Hadoop downloaded from the Apache Hadoop webpage without further configuration. The Hadoop installation should match the base version used in the MR3 release. For Hive 3, the user should install Hadoop 3.3:
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
$ gunzip -c hadoop-3.3.1.tar.gz | tar xvf -
- HIVE_MYSQL_DRIVER should point to a database connector jar file which should be compatible with the database for Metastore (MySQL, Postgres, MS SQL, or Oracle). For MySQL, Postgres, and MS SQL, HIVE_MYSQL_DRIVER may be set to empty because Metastore in Hive on MR3 either already includes or automatically downloads a compatible database connector. If HIVE_MYSQL_DRIVER is set to empty when using Oracle for Metastore, the Docker image (to be built later) does not include a database connector and the user should mount a database connector manually using a hostPath volume or PersistentVolume.
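As a sketch of the manual-mount option for Oracle, the connector jar could be exposed to the Metastore Pod through a hostPath volume. All names and paths below are hypothetical illustrations, not values taken from the MR3 release; the actual Pod spec to edit is kubernetes/yaml/metastore.yaml.

```yaml
# Hypothetical fragment of a Metastore Pod spec:
# mounts a host directory containing the Oracle JDBC connector jar.
spec:
  containers:
  - name: metastore
    volumeMounts:
    - name: oracle-connector        # hypothetical volume name
      mountPath: /opt/mr3-run/lib   # hypothetical target directory inside the container
  volumes:
  - name: oracle-connector
    hostPath:
      path: /home/hive/lib          # hypothetical host directory holding ojdbc8.jar
      type: Directory
```

A PersistentVolume with a matching PersistentVolumeClaim would work the same way for clusters where hostPath is not appropriate.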
The following structure shows all files and directories relevant to Hive on Kubernetes:
# scripts for populating and cleaning directories for Docker images
├── build-k8s.sh
├── build-k8s-ats.sh
├── build-k8s-ranger.sh
├── clean-k8s.sh
├── clean-k8s-ats.sh
└── clean-k8s-ranger.sh
# scripts for building Docker images
└── kubernetes
├── build-hive.sh
├── build-ats.sh
└── build-ranger.sh
# scripts for running HiveServer2, Timeline Server, Ranger
└── kubernetes
├── config-run.sh
├── generate-hivemr3-ssl.sh
├── run-hive.sh
├── run-metastore.sh
├── run-ats.sh
└── run-ranger.sh
# resources for running Metastore and HiveServer2
└── kubernetes
├── env.sh
├── conf
│ ├── core-site.xml
│ ├── hive-log4j2.properties
│ ├── hive-log4j.properties
│ ├── hive-site.xml
│ ├── jgss.conf
│ ├── krb5.conf
│ ├── mapred-site.xml
│ ├── mr3-site.xml
│ ├── ranger-hive-audit.xml
│ ├── ranger-hive-security.xml
│ ├── ranger-policymgr-ssl.xml
│ ├── tez-site.xml
│ └── yarn-site.xml
├── key
└── hive
├── common-setup.sh
├── Dockerfile
├── Dockerfile-worker
├── hadoop
│ └── hadoop-setup.sh
├── hive
│ ├── hiveserver2-service.sh
│ ├── hive-setup.sh
│ ├── master-control.sh
│ ├── metastore-service.sh
│ ├── run-beeline.sh
│ ├── run-hive-cli.sh
│ ├── run-hplsql.sh
│ ├── run-master.sh
│ └── run-worker.sh
├── mr3
│ └── mr3-setup.sh
└── tez
└── tez-setup.sh
# resources for running Timeline Server
└── kubernetes
├── ats-conf
│ ├── core-site.xml
│ ├── krb5.conf
│ ├── log4j.properties
│ ├── ssl-server.xml
│ └── yarn-site.xml
├── ats-key
└── ats
├── Dockerfile
└── timeline-service.sh
# resources for running Ranger
└── kubernetes
├── ranger-conf
│ ├── core-site.xml
│ ├── krb5.conf
│ ├── ranger-log4j.properties
│ ├── solr-core.properties
│ ├── solr-elevate.xml
│ ├── solr-log4j2.xml
│ ├── solr-managed-schema
│ ├── solr-security.json
│ ├── solr-solrconfig.xml
│ └── solr-solr.xml
├── ranger-key
│ ├── install.properties
│ └── solr.in.sh
└── ranger
├── Dockerfile
├── start-ranger.sh
└── start-solr.sh
# YAML files
└── kubernetes
└── yaml
├── ats-service.yaml
├── ats.yaml
├── cluster-role.yaml
├── hive-role.yaml
├── hiveserver2-service.yaml
├── hive-service-account.yaml
├── hive.yaml
├── master-role.yaml
├── master-service-account.yaml
├── metastore-role.yaml
├── metastore-service.yaml
├── metastore.yaml
├── namespace.yaml
├── prometheus-service.yaml
├── ranger-service.yaml
├── ranger.yaml
├── workdir-pvc-ats.yaml
├── workdir-pv-ats.yaml
├── workdir-pvc-ranger.yaml
├── workdir-pv-ranger.yaml
├── workdir-pvc.yaml
├── workdir-pv.yaml
├── worker-role.yaml
└── worker-service-account.yaml
2. Building a Docker Image
The user can build a Docker image for running Hive on MR3 on Kubernetes. (We assume that the user can execute the command docker so as to build a Docker image.)
The first step is to collect all necessary files in the directory kubernetes/hive by executing build-k8s.sh:
--hivesrc3 # Choose hive3-mr3 (based on Hive 3.1.3) (default).
Note that before executing build-k8s.sh, HADOOP_HOME_LOCAL in env.sh should point to the installation directory of Hadoop. build-k8s.sh copies some files from the Hadoop installation as well as jar files from MR3, Tez, and Hive.
Here is an example:
$ ./clean-k8s.sh
$ ./build-k8s.sh --hivesrc3
$ ls kubernetes/hive/hadoop/apache-hadoop/
bin etc lib libexec share
$ ls kubernetes/hive/hive/apache-hive/
bin conf hcatalog lib
Next the user should set two environment variables in kubernetes/env.sh (not env.sh in the installation directory):
$ vi kubernetes/env.sh
DOCKER_HIVE_IMG=10.1.91.17:5000/hive3:latest
DOCKER_USER=hive
DOCKER_HIVE_IMG is the full name of the Docker image including a tag. It specifies the name of the Docker image for running HiveServer2 and may include the address of a running Docker server. DOCKER_USER should match the user specified in kubernetes/hive/Dockerfile (which is hive by default).
By default, ContainerWorker Pods use the same Docker image specified by the environment variable DOCKER_HIVE_IMG.
Alternatively the user can choose to create a separate Docker image which uses kubernetes/hive/Dockerfile-worker and is smaller than the main Docker image.
In order to create a Docker image for ContainerWorker Pods, either set an environment variable DOCKER_HIVE_WORKER_IMG to an appropriate name (e.g., 10.1.91.17:5000/hive3worker:latest) or directly update kubernetes/env.sh.
$ vi kubernetes/env.sh
DOCKER_HIVE_WORKER_IMG=${DOCKER_HIVE_WORKER_IMG:-$DOCKER_HIVE_IMG}
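For the environment-variable route, a minimal sketch (the worker image name below is the hypothetical example from above, not a fixed value):

```shell
# Set the worker image name before building; kubernetes/env.sh falls back
# to DOCKER_HIVE_IMG when this variable is unset.
export DOCKER_HIVE_WORKER_IMG=10.1.91.17:5000/hive3worker:latest
echo "$DOCKER_HIVE_WORKER_IMG"
```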
The last step is to build a Docker image from Dockerfile in the directory kubernetes/hive by executing kubernetes/build-hive.sh.
The script builds a Docker image (which contains everything for running HiveServer2, DAGAppMaster, and ContainerWorker) and registers it to the Docker server specified in kubernetes/env.sh.
If successful, the user can pull the Docker image on another node:
$ kubernetes/build-hive.sh
$ docker pull 10.1.91.17:5000/hive3
Using default tag: latest
Trying to pull repository 10.1.91.17:5000/hive3 ...
latest: Pulling from 10.1.91.17:5000/hive3
581e78aaf612: Already exists
...
14c45e1e7f1d: Pull complete
Digest: sha256:ce1959eea28f53cb4d075466b5c5b613801b2b8d033f7469949505766f9af621
Status: Downloaded newer image for 10.1.91.17:5000/hive3:latest
Dockerfile included in an MR3 release is based on the Docker image zulu-openjdk-centos. The user may use their own Dockerfile as long as the final Docker image contains: 1) Java 17; 2) glibc.
- The popular alpine Linux does not include glibc.
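As an illustration only, a custom Dockerfile satisfying both requirements could start from any glibc-based distribution that ships Java 17. The base image and user name below are assumptions for the sketch, not what the MR3 release uses:

```dockerfile
# Illustrative sketch only: a glibc-based image with Java 17.
# (The MR3 release itself starts from zulu-openjdk-centos.)
FROM eclipse-temurin:17-jre
# The image user should match DOCKER_USER in kubernetes/env.sh (hive by default).
RUN useradd -m hive
USER hive
```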
After building a Docker image, proceed to Hive on Kubernetes or Basic Guide.