The user can start Metastore by executing the script
kubernetes/run-metastore.sh as explained in Running Metastore.
The user can also start HiveServer2 by executing the script
kubernetes/run-hive.sh as explained in Running HiveServer2.
After running several queries, the user can find DAGAppMaster and ContainerWorker Pods as well.
$ kubectl get pods -n hivemr3 NAME READY STATUS RESTARTS AGE efs-provisioner-6947dc8bcc-9f4hq 1/1 Running 0 3h59m hivemr3-hiveserver2-lmpv7 1/1 Running 0 159m hivemr3-metastore-0 1/1 Running 0 3h25m mr3master-9802-0 1/1 Running 0 158m mr3worker-392c-2 1/1 Running 0 152m
Below we review additional steps for running Metastore and HiveServer2.
Placing the Metastore Pod
The user should update the file
kubernetes/yaml/metastore.yaml so as to place the Metastore Pod in the
mr3-master node group.
In order to place the Metastore Pod in the
mr3-master node group,
nodeAffinity rule to
$ vi kubernetes/yaml/metastore.yaml spec: template: spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: roles operator: In values: - "masters"
Here we set the
key field to
roles and the
values field to
mr3-master node group is created with a label
For HiveServer2, we do not need such a
nodeAffinity rule because the existing
podAffinity rule places the HiveServer2 Pod on the same node where the Metastore Pod runs.
Configuring the LoadBalancer
Executing the script
kubernetes/run-hive.sh creates a new LoadBalancer for HiveServer2.
Get the security group associated with the LoadBalancer.
If necessary, edit the inbound rule in order to restrict the source IP addresses (by changing the source from
The LoadBalancer disconnects Beeline showing no activity for the idle timeout period, which is 60 seconds by default. The user may want to increase the idle timeout period, e.g., to 600 seconds. Otherwise Beeline loses the connection to HiveServer2 even after a brief period of inactivity.
The LoadBalancer sends ping signals to HiveServer2 to check its health.
Since it interprets a ping signal as an invalid request, HiveServer2 may print ERROR messages due to
2019-08-25T17:52:06,351 ERROR [HiveServer2-Handler-Pool: Thread-29] server.TThreadPoolServer: Error occurred during processing of message. java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream ...
This message can be ignored, but if the user wants to see fewer such messages, increase the ping interval of the LoadBalancer.
Finding the address of HiveServer2
To run Beeline, get the LoadBalancer Ingress of the Service
$ kubectl describe service -n hivemr3 hiveserver2 Name: hiveserver2 ... IP: 10.100.143.232 External IPs: 220.127.116.11 LoadBalancer Ingress: a4e5adbd2c7e011e9a34c0a411ce04be-1099963422.ap-northeast-2.elb.amazonaws.com ...
Get the IP address of the LoadBalancer Ingress.
$ nslookup a4e5adbd2c7e011e9a34c0a411ce04be-1099963422.ap-northeast-2.elb.amazonaws.com ... Non-authoritative answer: Name: a4e5adbd2c7e011e9a34c0a411ce04be-1099963422.ap-northeast-2.elb.amazonaws.com Address: 18.104.22.168 Name: a4e5adbd2c7e011e9a34c0a411ce04be-1099963422.ap-northeast-2.elb.amazonaws.com Address: 22.214.171.124
In this example, the user can use
126.96.36.199 as the IP address of HiveServer2 when running Beeline.
This is because Beeline connects first to the LoadBalancer, not directly to HiveServer2.