Running Hive on MR3 on AWS Fargate
As a serverless compute engine for containers, AWS Fargate enables the user to create Kubernetes Pods without provisioning EC2 instances. In conjunction with Amazon EKS, it allows us to run Hive on MR3 just like on Kubernetes with practically no overhead in managing EC2 instances. The combination of Amazon EKS and AWS Fargate is particularly appealing when we wish to achieve both fast autoscaling and cost saving.
In this section, we describe how to exploit Fargate for running Hive on MR3. We consider two scenarios:
- Set up an EKS/Fargate cluster in which an on-demand EC2 instance hosts those Pods that should always be running, such as MetaStore, HiveServer2, and DAGAppMaster Pods, while all ContainerWorker Pods are delegated to Fargate.
- Set up a Fargate-only cluster in which all Pods run on Fargate.
In both scenarios, we obtain the best of both worlds: EKS enables us to continue to use Hive on MR3 on Kubernetes, whereas Fargate enables us to quickly create and destroy ContainerWorker Pods without having to manage EC2 instances.
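As a rough illustration of the first scenario, the following shell sketch writes an eksctl configuration file with one on-demand node group and one Fargate profile. The cluster name, region, instance type, namespace, and Pod label are all hypothetical placeholders rather than values taken from the MR3 release. In particular, the label in the selector must match whatever labels Hive on MR3 actually assigns to ContainerWorker Pods.

```shell
#!/bin/sh
# Sketch only: every name below is an assumed placeholder.
CLUSTER_NAME="hive-mr3-fargate"   # hypothetical cluster name
REGION="us-east-1"                # hypothetical; pick a region where Fargate on EKS is available
CONFIG_FILE="${TMPDIR:-/tmp}/eks-fargate-sketch.yaml"

cat > "$CONFIG_FILE" <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${REGION}
# On-demand EC2 node group for Pods that should always be running
# (MetaStore, HiveServer2, DAGAppMaster).
nodeGroups:
  - name: master-nodes          # hypothetical name
    instanceType: m5.xlarge     # hypothetical instance type
    desiredCapacity: 1
# Fargate profile: Pods matching the selector are scheduled on Fargate.
fargateProfiles:
  - name: worker-profile        # hypothetical name
    selectors:
      - namespace: hivemr3      # assumed namespace
        labels:
          example-role: containerworker   # hypothetical label
EOF

echo "Wrote $CONFIG_FILE"
# To create the cluster from this file (not executed here):
#   eksctl create cluster -f "$CONFIG_FILE"
```

For the second scenario, the nodeGroups section would presumably be dropped and the selectors widened so that every Pod matches some Fargate profile.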
Limitations when using Fargate
Before trying Hive on MR3, the user should be aware of several limitations of Fargate.
- AWS Fargate with Amazon EKS is not available in every AWS region. For the list of available regions, see https://docs.aws.amazon.com/eks/latest/userguide/fargate.html.
- FARGATE_SPOT for the Fargate Provider is not supported by eksctl (as of September 2020). Hence we cannot create ContainerWorker Pods in EC2 spot instances. If ContainerWorker Pods are not created and destroyed frequently, using Amazon EKS with spot instances for ContainerWorker Pods may be cheaper. For more information, see https://github.com/aws/containers-roadmap/issues/622.
- A single ContainerWorker Pod can be assigned at most 4 CPU cores and 30GB of memory.
- The user cannot attach instance storage to ContainerWorker Pods. As a consequence, a ContainerWorker Pod can use less than 20GB of disk space for writing intermediate data, and it is not recommended to use LLAP I/O with hive.llap.io.allocator.mmap set to true.
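The per-Pod resource limit above can be checked before a ContainerWorker Pod is sized. The following is a minimal sketch, assuming whole numbers of cores and gigabytes; the function name fits_fargate is hypothetical, and it tests only the two upper bounds quoted in this section (Fargate may impose further constraints on valid CPU and memory combinations).

```shell
#!/bin/sh
# Upper bounds quoted in the list of limitations.
MAX_CORES=4
MAX_MEMORY_GB=30

# fits_fargate CORES MEMORY_GB   (hypothetical helper)
# Prints "ok" and returns 0 if the request is within both limits;
# otherwise prints the reason and returns 1.
fits_fargate() {
  cores=$1
  memory_gb=$2
  if [ "$cores" -gt "$MAX_CORES" ]; then
    echo "too many cores: $cores > $MAX_CORES"
    return 1
  fi
  if [ "$memory_gb" -gt "$MAX_MEMORY_GB" ]; then
    echo "too much memory: ${memory_gb}GB > ${MAX_MEMORY_GB}GB"
    return 1
  fi
  echo "ok"
}

# Example: a request within the limits, then one that exceeds them.
fits_fargate 4 16
fits_fargate 8 32 || true
```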
Cloning the executable scripts
We assume that the user has already created a Docker image by following the instructions in Installing on Kubernetes. For setting up an EKS/Fargate cluster, the user can instead use a pre-built Docker image available on DockerHub, in which case it suffices to download an MR3 release containing the executable scripts.
$ git clone https://github.com/mr3project/mr3-run-k8s.git
Before proceeding, restore the files created for Amazon EKS, which carry the suffix .eks.
$ cd mr3-run-k8s/kubernetes/
$ cp -f env.sh.eks env.sh
$ mv -f conf/core-site.xml.eks conf/core-site.xml
$ mv -f conf/mr3-site.xml.eks conf/mr3-site.xml
$ mv -f conf/hive-site.xml.eks conf/hive-site.xml
$ mv -f yaml/metastore.yaml.eks yaml/metastore.yaml
$ mv -f yaml/hive.yaml.eks yaml/hive.yaml
$ mv -f yaml/ranger.yaml.eks yaml/ranger.yaml
$ mv -f yaml/ats.yaml.eks yaml/ats.yaml
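The commands above can also be wrapped in a small script that reports a missing .eks file instead of failing silently halfway. This is only a sketch: the function name restore_eks_files is hypothetical, and the file list simply mirrors the commands above.

```shell
#!/bin/sh
# restore_eks_files DIR   (hypothetical helper)
# Copies env.sh.eks to env.sh and renames the *.eks configuration files
# under DIR, stopping at the first missing file.
restore_eks_files() {
  dir=$1
  if [ ! -f "$dir/env.sh.eks" ]; then
    echo "missing: $dir/env.sh.eks" >&2
    return 1
  fi
  cp -f "$dir/env.sh.eks" "$dir/env.sh"
  for f in conf/core-site.xml conf/mr3-site.xml conf/hive-site.xml \
           yaml/metastore.yaml yaml/hive.yaml yaml/ranger.yaml yaml/ats.yaml; do
    if [ ! -f "$dir/$f.eks" ]; then
      echo "missing: $dir/$f.eks" >&2
      return 1
    fi
    mv -f "$dir/$f.eks" "$dir/$f"
  done
}

# Run only if the repository has been cloned into the current directory.
if [ -d mr3-run-k8s/kubernetes ]; then
  restore_eks_files mr3-run-k8s/kubernetes
fi
```

Because the renamed files no longer have their .eks counterparts, running the script a second time reports the first missing file and stops.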