With MR3 as the execution engine, Hive can run on Kubernetes. All three versions of Hive supported by MR3 (Hive 2 through Hive 4) run on Kubernetes. Hive on MR3 creates and destroys ContainerWorker Pods directly, and runs as fast as on Hadoop. The enterprise features of Hive on Hadoop are equally available, including high availability, Kerberos-based security, SSL data encryption, and authorization with Apache Ranger. On public clouds, Hive on MR3 can take advantage of the autoscaling supported by MR3.
- Installing on Kubernetes
- Quick Start Guide - On Kubernetes
- Quick Start Guide - On Amazon EKS with Autoscaling
- Basic Guide
- Advanced Guide
- Running Queries
- Performance Guide
- Using Helm
- On Amazon EKS
- On AWS Fargate
The following video demonstrates fault tolerance in Hive on MR3 on Kubernetes.
- We use Hive 3.1.2 running on MR3 1.1.
- We use the TPC-DS benchmark with a scale factor of 10TB on a cluster of 42 nodes.
- We kill ContainerWorker Pods while a query is running. The query completes successfully after Vertex reruns.
- We kill the DAGAppMaster Pod while a query is running. A new DAGAppMaster Pod is created and the query resumes quickly.
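
A similar experiment can be scripted. The sketch below is a minimal illustration, not the procedure used in the video: it builds the standard `kubectl delete pod` command for a given Pod and only executes it when asked. The helper names `kubectl_delete_pod_cmd` and `kill_pod`, the Pod name `mr3worker-example`, and the namespace `hivemr3` are all hypothetical placeholders; substitute the values reported by `kubectl get pods` on your cluster.

```python
import subprocess
from typing import List


def kubectl_delete_pod_cmd(pod_name: str, namespace: str) -> List[str]:
    """Build the kubectl command that deletes a single Pod."""
    return ["kubectl", "delete", "pod", pod_name, "--namespace", namespace]


def kill_pod(pod_name: str, namespace: str, dry_run: bool = True) -> List[str]:
    """Delete a Pod to simulate a failure.

    With dry_run=True (the default) the command is only returned, not run,
    so the sketch can be inspected without access to a cluster.
    """
    cmd = kubectl_delete_pod_cmd(pod_name, namespace)
    if not dry_run:
        # Raises CalledProcessError if kubectl exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical Pod name and namespace, for illustration only.
    print(" ".join(kill_pod("mr3worker-example", "hivemr3")))
```

Running the command against a ContainerWorker Pod or the DAGAppMaster Pod while a query is in progress mirrors the two cases described above.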