With MR3 as the execution engine, the user can run Hive directly on Kubernetes without installing Hadoop.
Hive on MR3 creates and destroys ContainerWorker Pods directly, and runs as fast as on Hadoop. All the enterprise features of Hive on Hadoop are equally available, such as high availability, Kerberos-based security, SSL data encryption, and authorization with Apache Ranger. On public clouds, Hive on MR3 can take advantage of the autoscaling supported by MR3.
- Quick Start Guide - On Kubernetes
- Installing on Kubernetes
- Basic Guide
- Running Queries
- Advanced Guide
- Performance Guide
- On Amazon EKS
To try Hive on MR3 on Kubernetes, we recommend the quick start guide On Kubernetes.
For the results of evaluating the performance of Hive on MR3 with the TPC-DS benchmark, see our blog article.
If you have any questions, please visit the MR3 Google Group or join MR3 Slack.
The following video demonstrates fault tolerance in Hive on MR3 on Kubernetes.
- We use Hive 3.1.2 running on MR3 1.1.
- We use the TPC-DS benchmark with a scale factor of 10TB on a cluster of 42 nodes.
- We kill ContainerWorker Pods while a query is running. The query completes successfully after Vertex reruns.
- We kill the DAGAppMaster Pod while a query is running. A new DAGAppMaster Pod is created and the query resumes quickly.