Prerequisites
Using MR3 Cloud on Amazon EKS has the following prerequisites:
- The user can create IAM policies.
- The user has access to an S3 bucket storing the warehouse and all S3 buckets containing datasets.
- The user can create an EKS cluster with the command eksctl.
- The user can configure LoadBalancers.
- The user can create EFS.
- A database server for Metastore is ready and accessible from the EKS cluster.
- A database server for Ranger is ready and accessible from the EKS cluster. It is okay to use the same database server for Metastore.
- The user can run Beeline to connect to HiveServer2 running at a given address.
The user may create new resources (such as IAM policies) either on the AWS console or with the AWS CLI.
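Before starting, the user can check that the AWS CLI is configured with valid credentials, for example with the following command. If it prints the account ID and the caller's ARN, the AWS CLI is ready to use.
$ aws sts get-caller-identity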
Load
After loading a configuration, the menu bar at the top shows several menus colored either red or grey.
A red menu means that some parameters are wrong or missing, and a grey menu means that it has not been visited yet. After downloading a YAML file, the corresponding menu turns green.
The menu Connect turns green when all the other menus are green.
Download
When all the input fields are okay, press the Download button to download a YAML file.
Then the user can execute the command eksctl or kubectl.
The quick start guide Using MR3 Cloud on EKS contains more details.
EKS page
1. IAM policy for autoscaling
Create an IAM policy for autoscaling as shown below.
Get the ARN (Amazon Resource Name) of the IAM policy.
In our example, we create an IAM policy called EKSAutoScalingWorkerPolicy.
$ vi EKSAutoScalingWorkerPolicy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeInstanceTypes",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["*"]
    }
  ]
}
$ aws iam create-policy --policy-name EKSAutoScalingWorkerPolicy --policy-document file://EKSAutoScalingWorkerPolicy.json
{
"Policy": {
...
"Arn": "arn:aws:iam::111111111111:policy/EKSAutoScalingWorkerPolicy",
...
Use the ARN in the field Autoscaling Policy in the section IAM Policy.
2. IAM policy for accessing S3 buckets
Create an IAM policy that allows every Pod to access the S3 buckets storing the warehouse and containing datasets.
Adjust the Action field to restrict the set of operations permitted to Pods.
Get the ARN of the IAM policy.
In our example, we create an IAM policy called MR3AccessS3.
$ vi MR3AccessS3.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::hive-warehouse-dir",
        "arn:aws:s3:::hive-warehouse-dir/*",
        "arn:aws:s3:::hive-partitioned-1000-orc",
        "arn:aws:s3:::hive-partitioned-1000-orc/*"
      ]
    }
  ]
}
$ aws iam create-policy --policy-name MR3AccessS3 --policy-document file://MR3AccessS3.json
{
"Policy": {
...
"Arn": "arn:aws:iam::111111111111:policy/MR3AccessS3",
...
Use the ARN in the field S3 Access Policy in the section IAM Policy.
3. Creating an EKS cluster
Fill in all the input fields.
Choose a unique name for an EKS cluster and use it in the field Name.
Download a YAML file eks-cluster.yaml and execute the command eksctl.
$ eksctl create cluster -f eks-cluster.yaml
2022-05-24 17:23:31 [ℹ] eksctl version 0.86.0
2022-05-24 17:23:31 [ℹ] using region ap-northeast-2
2022-05-24 17:23:31 [ℹ] setting availability zones to [ap-northeast-2c ap-northeast-2a ap-northeast-2d]
..
2022-05-24 17:39:49 [✔] EKS cluster "hive-mr3" in "ap-northeast-2" region is ready
The user can verify that only a master node is available in the EKS cluster.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-55-210.ap-northeast-2.compute.internal Ready <none> 59s v1.21.5-eks-9017834
Get the public IP address of the master node, which we may need when checking access to the database servers for Metastore and Ranger on the Applications page.
$ kubectl describe node ip-192-168-55-210.ap-northeast-2.compute.internal | grep -e InternalIP -e ExternalIP
InternalIP: 192.168.55.210
ExternalIP: 3.34.187.73
Autoscaler page
Download a YAML file autoscaler.yaml and execute the command kubectl to start the Kubernetes Autoscaler.
$ kubectl apply -f autoscaler.yaml
serviceaccount/cluster-autoscaler created
deployment.apps/cluster-autoscaler created
clusterrole.rbac.authorization.k8s.io/cluster-autoscaler created
clusterrolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
role.rbac.authorization.k8s.io/cluster-autoscaler created
rolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
The user can check that the Kubernetes Autoscaler has started properly.
$ kubectl get pods -n kube-system | grep autoscaler
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-cbd5c6cf-msbpx 1/1 Running 0 18s
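Optionally, the user may inspect the log of the Kubernetes Autoscaler to see how it reacts when Pods become pending later on. The deployment name cluster-autoscaler comes from autoscaler.yaml above.
$ kubectl logs -n kube-system deployment/cluster-autoscaler | tail -n 20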
Services page
Choose a namespace and use it in the field Namespace. In order to use HTTPS when connecting to the Apache server, the user should provide an SSL certificate created with AWS Certificate Manager. Use its ARN in the field SSL Certificate ARN.
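If the SSL certificate has already been created, the user may look up its ARN with the AWS CLI instead of the AWS console, for example:
$ aws acm list-certificates --query "CertificateSummaryList[*].[DomainName,CertificateArn]"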
Download a YAML file service.yaml and execute the command kubectl to create two LoadBalancer services.
Later the LoadBalancer with LoadBalancerPort 8080 is connected to an Apache server while the LoadBalancer with LoadBalancerPort 10001 is connected to HiveServer2.
$ kubectl create -f service.yaml
namespace/hivemr3 created
service/apache created
service/hiveserver2 created
$ aws elb describe-load-balancers
...
"CanonicalHostedZoneName": "a75c6382cd52746b4bc0370f0495d3c8-1372639668.ap-northeast-2.elb.amazonaws.com",
"CanonicalHostedZoneNameID": "ZWKZPGTI48KDX",
"ListenerDescriptions": [
{
"Listener": {
"Protocol": "TCP",
"LoadBalancerPort": 8080,
...
"CanonicalHostedZoneName": "ac004dea1e4224b249b0ea88183d96c8-2015438666.ap-northeast-2.elb.amazonaws.com",
"CanonicalHostedZoneNameID": "ZWKZPGTI48KDX",
"ListenerDescriptions": [
{
"Listener": {
"Protocol": "TCP",
"LoadBalancerPort": 10001,
...
Get the address and hostname of each service (which we use on the Applications page).
$ nslookup a75c6382cd52746b4bc0370f0495d3c8-1372639668.ap-northeast-2.elb.amazonaws.com
...
Name: a75c6382cd52746b4bc0370f0495d3c8-1372639668.ap-northeast-2.elb.amazonaws.com
Address: 3.39.33.1
Name: a75c6382cd52746b4bc0370f0495d3c8-1372639668.ap-northeast-2.elb.amazonaws.com
Address: 13.209.124.90
$ nslookup ac004dea1e4224b249b0ea88183d96c8-2015438666.ap-northeast-2.elb.amazonaws.com
...
Name: ac004dea1e4224b249b0ea88183d96c8-2015438666.ap-northeast-2.elb.amazonaws.com
Address: 3.38.63.119
Name: ac004dea1e4224b249b0ea88183d96c8-2015438666.ap-northeast-2.elb.amazonaws.com
Address: 13.209.183.4
EFS page
1. Subnet ID and security group ID
Assuming that the name of the EKS cluster is hive-mr3 (specified in the field Name on the EKS page), get the VPC ID of the CloudFormation stack eksctl-hive-mr3-cluster.
$ aws ec2 describe-vpcs --filter Name=tag:aws:cloudformation:stack-name,Values=eksctl-hive-mr3-cluster --query "Vpcs[*].[VpcId]"
[
[
"vpc-06e54d3ea607cc43b"
]
]
$ VPCID=vpc-06e54d3ea607cc43b
Get the public subnet ID of the CloudFormation stack eksctl-hive-mr3-cluster.
$ aws ec2 describe-subnets --filter Name=vpc-id,Values=$VPCID Name=availability-zone,Values=ap-northeast-2a Name=tag:aws:cloudformation:stack-name,Values=eksctl-hive-mr3-cluster Name=tag:Name,Values="*Public*" --query "Subnets[*].[SubnetId]"
[
[
"subnet-07c676d17a301e4af"
]
]
$ SUBNETID=subnet-07c676d17a301e4af
Get the ID of the security group for the EKS cluster that matches the pattern eksctl-hive-mr3-cluster-ClusterSharedNodeSecurityGroup-*.
$ aws ec2 describe-security-groups --filters Name=vpc-id,Values=$VPCID Name=group-name,Values="eksctl-hive-mr3-cluster-ClusterSharedNodeSecurityGroup-*" --query "SecurityGroups[*].[GroupName,GroupId]"
[
[
"eksctl-hive-mr3-cluster-ClusterSharedNodeSecurityGroup-156X37EGV080",
"sg-0280692de7b048468"
]
]
$ SGROUPALL=sg-0280692de7b048468
2. Creating and mounting EFS
Create EFS in the Availability Zone specified in the section General on the EKS page. Get the file system ID of EFS.
$ aws efs create-file-system --performance-mode generalPurpose --throughput-mode bursting --availability-zone-name ap-northeast-2a
...
"FileSystemId": "fs-0226705cce380a0cd",
...
$ EFSID=fs-0226705cce380a0cd
Create a mount target using the subnet ID of the CloudFormation stack eksctl-hive-mr3-cluster and the security group ID for the EKS cluster.
Get the mount target ID, which is necessary when deleting the EKS cluster.
$ aws efs create-mount-target --file-system-id $EFSID --subnet-id $SUBNETID --security-groups $SGROUPALL
...
"MountTargetId": "fsmt-0cd125aee66e5d71a",
...
$ MOUNTID=fsmt-0cd125aee66e5d71a
3. Creating a StorageClass
Use the EFS ID (not the mount target ID) in the field EFS ID.
Download a YAML file efs.yaml and execute the command kubectl.
$ kubectl create -f efs.yaml
serviceaccount/efs-provisioner created
configmap/efs-provisioner created
deployment.apps/efs-provisioner created
storageclass.storage.k8s.io/aws-efs created
clusterrole.rbac.authorization.k8s.io/efs-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-efs-provisioner created
role.rbac.authorization.k8s.io/leader-locking-efs-provisioner created
rolebinding.rbac.authorization.k8s.io/leader-locking-efs-provisioner created
The user can find a new StorageClass aws-efs and a new Pod in the namespace specified on the Services page.
$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
aws-efs example.com/aws-efs Delete Immediate false 28s
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 22m
$ kubectl get pods -n hivemr3
NAME READY STATUS RESTARTS AGE
efs-provisioner-749fcdf477-v5jb5 1/1 Running 0 47s
Applications page
1. Access to the database servers for Metastore and Ranger
Check if the database servers for Metastore and Ranger are accessible from the master node. If the database server is running on Amazon AWS, the user may have to update its security group or VPC configuration.
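For example, if the database server runs on Amazon RDS, the user may add an inbound rule to its security group that allows the master node to connect. The security group ID and the port (3306 for MySQL) below are hypothetical; the IP address is the ExternalIP of the master node obtained on the EKS page.
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 3306 --cidr 3.34.187.73/32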
2. Start all the components
Fill in all the necessary input fields.
Download a YAML file apps.yaml and execute the command kubectl.
$ kubectl create -f apps.yaml
The user can find that a PersistentVolumeClaim workdir-pvc is in use.
$ kubectl get pvc -n hivemr3
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
workdir-pvc Bound pvc-668a010b-05e8-408a-b726-98c3d1dc0fc3 100Gi RWX aws-efs 95s
Even before executing any queries, a total of 9 Pods are running in the namespace specified on the Services page.
$ kubectl get pods -n hivemr3
NAME READY STATUS RESTARTS AGE
efs-provisioner-749fcdf477-v5jb5 1/1 Running 0 97m
hivemr3-apache-0 1/1 Running 0 2m3s
hivemr3-hiveserver2-789bb49978-tb497 1/1 Running 0 2m4s
hivemr3-hiveserver2-internal-548d4454c4-wtx57 1/1 Running 0 2m4s
hivemr3-metastore-0 1/1 Running 0 2m4s
hivemr3-ranger-0 2/2 Running 0 2m4s
hivemr3-superset-0 1/1 Running 0 2m4s
hivemr3-timeline-0 4/4 Running 0 2m3s
mr3master-5848-0-8669b5564d-jbnzm 1/1 Running 0 102s
Connect page
On the Connect page, the user can view the current configuration and save it in a JSON file with a given name.
In the left column, the user can find HiveServer2 connection URLs and links to Ranger, MR3-UI, Grafana, and Superset. Note that all the links point to the address of the Apache server or its subpaths.
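As an illustration, the user may connect to HiveServer2 with Beeline using the LoadBalancer hostname obtained on the Services page. The command below is only a sketch: it assumes port 10001, no SSL, and a placeholder user name, whereas the exact connection URL (including any SSL and authentication parameters) is the one shown on the Connect page.
$ beeline -u "jdbc:hive2://ac004dea1e4224b249b0ea88183d96c8-2015438666.ap-northeast-2.elb.amazonaws.com:10001/" -n hive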
Deleting the EKS cluster
Because of the additional components configured manually, it takes a few extra steps to delete the EKS cluster. In order to delete the EKS cluster, proceed in the following order.
- Delete all the components.
$ kubectl delete -f apps.yaml
- Delete resources created automatically by Hive on MR3.
$ kubectl -n hivemr3 delete configmap mr3conf-configmap-master mr3conf-configmap-worker
$ kubectl -n hivemr3 delete svc service-master-5848-0 service-worker
$ kubectl -n hivemr3 delete deployment --all
$ kubectl -n hivemr3 delete pods --all
- Delete the resources for EFS.
$ kubectl delete -f efs.yaml
- Delete the services.
$ kubectl delete -f service.yaml
- Remove the mount target for EFS.
$ aws efs delete-mount-target --mount-target-id $MOUNTID
- Delete EFS if necessary. Note that the same EFS can be reused for the next installation of Hive on MR3.
$ aws efs delete-file-system --file-system-id $EFSID
- Stop the Kubernetes Autoscaler.
$ kubectl delete -f autoscaler.yaml
- Delete the EKS cluster with eksctl.
$ eksctl delete cluster -f eks-cluster.yaml
If the last command fails, the user should delete the EKS cluster manually. Proceed in the following order on the AWS console.
- Delete security groups manually.
- Delete the NAT gateway created for the EKS cluster, delete the VPC, and then delete the Elastic IP address.
- Delete the LoadBalancers.
- Delete the CloudFormation stacks (the remaining stacks can be listed with the AWS CLI as shown below).
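To check which CloudFormation stacks remain before deleting them on the AWS console, the user may list them with the AWS CLI, for example:
$ aws cloudformation describe-stacks --query "Stacks[*].[StackName,StackStatus]"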