Running Hive on MR3 on Amazon EKS is similar to running it on Kubernetes, with the following restrictions:

  • The user should create a PersistentVolumeClaim using Amazon EFS (Elastic File System), not EBS (Elastic Block Store), because containers need to share files across multiple Availability Zones. Alternatively, the user may use S3 without creating a PersistentVolumeClaim.
  • We assume that the data source is on S3 (Simple Storage Service).
  • For the local disks on which ContainerWorkers write intermediate data, the user should either create EC2 instances with instance stores (i.e., local disks physically attached to host machines) or mount EBS volumes on EC2 instances. The user may instead use the root partition as a local disk, but this risks running out of disk space in the middle of a query.
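To gauge the risk of using the root partition, one option is to check how much free space a node has before running large queries. The following is a minimal sketch, assuming a POSIX `df` and `awk` are available on the node; the function name `avail_kb` is hypothetical and not part of MR3.

```shell
#!/bin/sh
# Hypothetical helper (not part of MR3): print the free space, in kilobytes,
# of the filesystem that holds the given path (default: the root partition).
avail_kb() {
  # -P forces the one-line-per-filesystem POSIX output format,
  # -k reports sizes in 1024-byte blocks; column 4 is "Available".
  df -Pk "${1:-/}" | awk 'NR==2 {print $4}'
}

# Usage: avail_kb /            (root partition)
#        avail_kb /some/mount  (a mounted instance store or EBS volume)
```

Running the same check on the mount point of an instance store or an EBS volume shows how much more room a dedicated local disk provides.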

We assume that the administrator provides the following external components:

  • MySQL database for Metastore
  • MySQL database for Ranger (which can be the same MySQL database as the one for Metastore)
  • KDC (Key Distribution Center) for managing Kerberos tickets

As we do not use HDFS, we do not require KMS (Key Management Server) for managing impersonation and delegation tokens. The cluster on Amazon EKS is depicted in the following diagram:

[Diagram: hive.eks.overview]

This section gives details on how to create a cluster on Amazon EKS. We assume that the user has already created a Docker image by following the instructions in Installing on Kubernetes. Alternatively, the user can use a pre-built Docker image available on DockerHub, in which case it suffices to download an MR3 release containing the executable scripts.

$ git clone https://github.com/mr3project/mr3-run-k8s.git

Before proceeding, restore the configuration files prepared for Amazon EKS, which carry the .eks suffix:

$ cd mr3-run-k8s/kubernetes/
$ cp -f env.sh.eks env.sh
$ mv -f conf/core-site.xml.eks conf/core-site.xml
$ mv -f conf/mr3-site.xml.eks conf/mr3-site.xml
$ mv -f conf/hive-site.xml.eks conf/hive-site.xml
$ mv -f yaml/metastore.yaml.eks yaml/metastore.yaml
$ mv -f yaml/hive.yaml.eks yaml/hive.yaml
$ mv -f yaml/ranger.yaml.eks yaml/ranger.yaml
$ mv -f yaml/ats.yaml.eks yaml/ats.yaml
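
Because `mv -f` and `cp -f` overwrite their targets without asking, it can help to confirm beforehand that every .eks file is present. The following is a minimal sketch to run before the commands above; the function name `check_eks_templates` is hypothetical and not part of the MR3 scripts, and the file list is simply copied from those commands.

```shell
#!/bin/sh
# Hypothetical helper (not part of MR3): report any missing .eks file.
# Takes the directory to check (default: current directory, expected to be
# mr3-run-k8s/kubernetes/). Returns 0 if all files exist, 1 otherwise.
check_eks_templates() {
  dir="${1:-.}"
  missing=0
  for f in env.sh \
           conf/core-site.xml conf/mr3-site.xml conf/hive-site.xml \
           yaml/metastore.yaml yaml/hive.yaml yaml/ranger.yaml yaml/ats.yaml; do
    if [ ! -f "$dir/$f.eks" ]; then
      echo "missing: $f.eks"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from mr3-run-k8s/kubernetes/, before copying or moving anything):
#   check_eks_templates . && echo "all EKS files present"
```

Once the `mv` commands have run, the .eks files no longer exist under their original names, so the check is only meaningful on a fresh clone.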