Skip to main content

On Kubernetes

With MR3 as the execution engine, the user can run Hive directly on Kubernetes without installing Hadoop. All the enterprise features of Apache Hive are equally available, such as high availability, Kerberos authentication, SSL data encryption, and authorization with Apache Ranger. On public clouds, Hive on MR3 can take advantage of autoscaling supported by MR3.

A Kubernetes cluster created with Hive on MR3 consists of the following components:

  • Key components of Hive on MR3 -- Metastore, HiveServer2, MR3 DAGAppMaster, and MR3 ContainerWorkers
  • Apache Ranger for managing authorization
  • MR3-UI and Grafana for visualization

We assume some familiarity with key concepts in Kubernetes such as Pods, Containers, Volumes, ConfigMaps, and Secrets.

Prerequisites

Running Hive on MR3 on Kubernetes has the following prerequisites:

  1. A running Kubernetes cluster is available.
  2. A database server for the Metastore database is running.
  3. Either HDFS or S3 (or S3-compatible storage) is available for storing the warehouse. For using S3, access credentials are required.
  4. The user can either create a PersistentVolume or store transient data on HDFS or S3. The PersistentVolume should be writable to 1) the user with UID 1000, and 2) user nobody (corresponding to root user) if Ranger is to be used for authorization.
  5. Every worker node has an identical set of local directories for storing intermediate data (to be mapped to hostPath volumes). These directories should be writable to the user with UID 1000 because all containers run as non-root user with UID 1000.
  6. The user can run Beeline (or any client program) to connect to HiveServer2 running at a given address.

In our example, we use a MySQL server for the Metastore database, but PostgreSQL and MS SQL are also okay to use.

Proceed to one of the following guides:

  • Shell Scripts shows how to run Hive on MR3 on Kubernetes using executable shell scripts.
  • Helm shows how to run Hive on MR3 on Kubernetes using Helm.
  • TypeScript shows how to use TypeScript code to run Hive on MR3 on Kubernetes, along with Ranger, MR3-UI, Grafana, Superset.
  • Common Procedures contains additional procedures related to compaction and security in Hive on MR3.
info

To try without installing additional dependencies, use executable shell scripts.

Helm reduces the complexity of setting configuration parameters to some extent.

TypeScript code reads a configuration file written in TypeScript to generate a single YAML file containing the entire specification for running Hive on MR3. It significantly reduces the complexity of setting configuration parameters because all configuration parameters in the output YAML file are set consistently across all the components. There is no need to be familiar with TypeScript.

tip

We recommend that the user try Hive on MR3 on Minikube before running it on Kubernetes.

tip

Before trying to run Hive on MR3 on Kubernetes, check if swap space is disabled in the Kubernetes cluster.