For running HiveServer2, Hive on MR3 can create three kinds of Pods: HiveServer2 Pod, DAGAppMaster Pod, and ContainerWorker Pod.
- A HiveServer2 Pod runs a HiveServer2 container and optionally another container for DAGAppMaster in LocalProcess mode. The user creates a HiveServer2 Pod by executing the script kubernetes/run-hive.sh.
- A DAGAppMaster Pod is created by HiveServer2 when DAGAppMaster is configured to run in Kubernetes mode (i.e., mr3.master.mode is set to kubernetes in mr3-site.xml).
- A ContainerWorker Pod runs a ContainerWorker container and is created by DAGAppMaster at runtime.
Configuring the HiveServer2 Pod
The file kubernetes/yaml/hive.yaml creates a Pod for running HiveServer2 (by creating a Deployment).
The user should update several sections in this file according to Kubernetes cluster settings.
In the spec/template/spec/containers section:
- The image field should match the Docker image specified by DOCKER_HIVE_IMG in kubernetes/env.sh.
- The args field specifies the DAGAppMaster mode: --localthread for LocalThread mode, --localprocess for LocalProcess mode, and --kubernetes for Kubernetes mode.
- The resources/requests and resources/limits fields specify the resources to be allocated to a HiveServer2 Pod.
- The three fields ports/containerPort, readinessProbe/tcpSocket/port, and livenessProbe/tcpSocket/port should match the port number specified in hiveserver2-service.yaml.
$ vi kubernetes/yaml/hive.yaml
spec:
  template:
    spec:
      containers:
      - image: 10.1.91.17:5000/hive3
        args: ["start", "--kubernetes"]
        resources:
          requests:
            cpu: 4
            memory: 32Gi
          limits:
            cpu: 4
            memory: 32Gi
        ports:
        - containerPort: 9852
        readinessProbe:
          tcpSocket:
            port: 9852
        livenessProbe:
          tcpSocket:
            port: 9852
In the spec/template/spec/volumes section:
- The configMap/name field under conf-k8s-volume should match the name specified by CONF_DIR_CONFIGMAP in kubernetes/env.sh.
- The secret/secretName field under key-k8s-volume should match the name specified by KEYTAB_SECRET in kubernetes/env.sh.
$ vi kubernetes/yaml/hive.yaml
spec:
  template:
    spec:
      volumes:
      - name: conf-k8s-volume
        configMap:
          name: hivemr3-conf-configmap
      - name: key-k8s-volume
        secret:
          secretName: hivemr3-keytab-secret
The spec/template/spec/hostAliases field can list aliases for hosts that may not be found in the default DNS. For example, the host running Metastore may be unknown to the default DNS, in which case the user can add an alias for it.
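Such an entry could look as follows (a minimal sketch; indigo20 is the Metastore node used elsewhere in this document, and the IP address 192.168.10.1 is a placeholder, not a value from this setup):
$ vi kubernetes/yaml/hive.yaml
spec:
  template:
    spec:
      hostAliases:
      - ip: "192.168.10.1"
        hostnames:
        - "indigo20"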
Using Kerberos-based authentication
In order to use Kerberos-based authentication, the configuration key hadoop.security.authentication should be set to kerberos in kubernetes/conf/core-site.xml.
$ vi kubernetes/conf/core-site.xml
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
The use of Kerberos-based authentication implies that, in kubernetes/env.sh, the service principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL should match the user in DOCKER_USER. For example, root/mr3@PL is a valid Kerberos principal for HIVE_SERVER2_KERBEROS_PRINCIPAL because DOCKER_USER is set to root by default.
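For instance, a consistent pair of settings in kubernetes/env.sh might look as follows (a sketch; root/mr3@PL is the example principal above, where PL is a sample realm):
$ vi kubernetes/env.sh
DOCKER_USER=root
HIVE_SERVER2_KERBEROS_PRINCIPAL=root/mr3@PL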
The two variables must match for two reasons.
- DAGAppMaster checks whether or not HiveServer2 has the right permission by comparing 1) the user of DAGAppMaster, which is specified in DOCKER_USER, and 2) the user of HiveServer2, which is the principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL. DAGAppMaster assumes the user in DOCKER_USER because kubernetes/hive/mr3/mr3-setup.sh sets the configuration key mr3.k8s.pod.master.user to the user in DOCKER_USER:
  -Dmr3.k8s.pod.master.user=$DOCKER_USER -Dmr3.k8s.master.working.dir=$REMOTE_WORK_DIR \
  The user can disable permission checking in DAGAppMaster by setting mr3.am.acls.enabled to false in kubernetes/conf/mr3-site.xml (see the sketch after this list). Since DAGAppMaster does not expose its address to the outside, the security of HiveServer2 itself is not compromised.
- Shuffle handlers in ContainerWorkers compare the service principal name against the owner of intermediate files, which is the user specified in kubernetes/hive/Dockerfile, which, in turn, should match DOCKER_USER in kubernetes/env.sh.
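A minimal sketch of disabling permission checking in kubernetes/conf/mr3-site.xml:
$ vi kubernetes/conf/mr3-site.xml
<property>
  <name>mr3.am.acls.enabled</name>
  <value>false</value>
</property>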
A mismatch between DOCKER_USER and HIVE_SERVER2_KERBEROS_PRINCIPAL makes HiveServer2 unable to establish a connection to DAGAppMaster. In such a case, DAGAppMaster keeps printing error messages like:
2019-07-04T09:42:17,074 WARN [IPC Server handler 0 on 8080] ipc.Server: IPC Server handler 0 on 8080, call Call#32 Retry#0 com.datamonad.mr3.master.DAGClientHandlerProtocolBlocking.getSessionStatus from 10.43.0.0:37962
java.security.AccessControlException: User gitlab-runner/indigo20@RED (auth:TOKEN) cannot perform AM view operations
at com.datamonad.mr3.master.DAGClientHandlerProtocolServer.checkAccess(DAGClientHandlerProtocolServer.scala:239) ~[mr3-tez-0.1-assembly.jar:0.1]
at com.datamonad.mr3.master.DAGClientHandlerProtocolServer.checkViewAccess(DAGClientHandlerProtocolServer.scala:233) ~[mr3-tez-0.1-assembly.jar:0.1]
...
If permission checking is disabled in DAGAppMaster, ContainerWorkers print error messages like:
2020-08-16T16:34:01,019 ERROR [Tez Shuffle Handler Worker #1] shufflehandler.ShuffleHandler: Shuffle error :
java.io.IOException: Owner 'root' for path /data1/k8s/dag_1/container_K@1/vertex_3/attempt_70888998_0000_1_03_000000_0_10003/file.out did not match expected owner 'hive'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:281) ~[hadoop-common-3.1.2.jar:?]
at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenForRandomRead(SecureIOUtils.java:128) ~[hadoop-common-3.1.2.jar:?]
at org.apache.hadoop.io.SecureIOUtils.openForRandomRead(SecureIOUtils.java:113) ~[hadoop-common-3.1.2.jar:?]
at com.datamonad.mr3.tez.shufflehandler.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:1129) ~[mr3-tez-1.0-assembly.jar:1.0]
Running HiveServer2
In order to run HiveServer2, the user can execute the script kubernetes/run-hive.sh:
$ kubernetes/run-hive.sh
...
CLIENT_TO_AM_TOKEN_KEY=668123ae-de9d-4ca3-95a6-a5848e123e6e
MR3_APPLICATION_ID_TIMESTAMP=10403
MR3_SHARED_SESSION_ID=f214e200-f38e-4b94-89d5-e0245de3dea5
ATS_SECRET_KEY=0d2fdeec-c564-4d40-891a-5ca5f736294c
configmap/client-am-config created
deployment/hivemr3-hiveserver2 created
service/hiveserver2 created
The script mounts the following files inside the HiveServer2 Pod:
- kubernetes/env.sh
- kubernetes/conf/*
- kubernetes/key/*
In this way, the user can completely specify the behavior of HiveServer2 as well as DAGAppMaster and ContainerWorkers.
For logging configuration, HiveServer2 reads kubernetes/conf/hive-log4j2.properties, while DAGAppMaster and ContainerWorkers read k8s-mr3-container-log4j2.properties, which is included in the MR3 release. By default, logging messages are redirected to the console.
Executing the script kubernetes/run-hive.sh starts a HiveServer2 Pod and a DAGAppMaster Pod.
It may take a while for the two Pods to become ready because both Pods run readiness and liveness probes.
The HiveServer2 Pod becomes ready when it opens a Thrift port and starts accepting connection requests from Beeline.
The DAGAppMaster Pod becomes ready when it opens an RPC port and starts accepting connection requests from MR3Client.
The HiveServer2 Pod becomes ready only after the DAGAppMaster Pod becomes ready.
$ kubectl get pods -n hivemr3
NAME READY STATUS RESTARTS AGE
hivemr3-hiveserver2-lmngh 1/1 Running 0 41s
mr3master-6196-0-dwnck 1/1 Running 0 30s
The user can verify that all files are successfully mounted inside the HiveServer2 Pod:
$ kubectl exec -n hivemr3 -it hivemr3-hiveserver2-lmngh -- /bin/bash
bash-4.2$ pwd
/opt/mr3-run/hive
bash-4.2$ cd /opt/mr3-run/
bash-4.2$ ls env.sh
env.sh
bash-4.2$ ls conf/
core-site.xml hive-log4j2.properties.console jgss.conf mr3-site.xml ranger-policymgr-ssl.xml
hive-log4j.properties hive-log4j2.properties.file krb5.conf ranger-hive-audit.xml tez-site.xml
hive-log4j2.properties hive-site.xml mapred-site.xml ranger-hive-security.xml yarn-site.xml
bash-4.2$ ls key/
hive.service.keytab
The user can start a new Beeline connection using the address and service principal name of HiveServer2 (e.g., beeline -u "jdbc:hive2://10.1.91.41:9852/;principal=hive/indigo20@RED;").
After accepting queries from Beeline connections, DAGAppMaster creates many ContainerWorker Pods, each of which runs a ContainerWorker container.
$ kubectl get pods -n hivemr3
NAME READY STATUS RESTARTS AGE
hivemr3-hiveserver2-lmngh 1/1 Running 0 4m2s
mr3master-6196-0-dwnck 1/1 Running 0 3m51s
mr3worker-14e3-1 1/1 Running 0 17s
mr3worker-14e3-2 1/1 Running 0 11s
mr3worker-14e3-3 0/1 Init:0/1 0 5s
mr3worker-14e3-4 0/1 Init:0/1 0 5s
Suppressing TSaslTransportException
While HiveServer2 is running, its log may repeatedly print ERROR messages due to org.apache.thrift.transport.TSaslTransportException.
2020-07-07T18:24:14,516 ERROR [HiveServer2-Handler-Pool: Thread-39] server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
...
Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
...
This message is printed when the liveness probe checks the Thrift port, so it is not an error.
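If these messages clutter the log, one possible workaround (an assumption on our part, not part of the original configuration) is to raise the log level of the Thrift server class that emits them in kubernetes/conf/hive-log4j2.properties:
$ vi kubernetes/conf/hive-log4j2.properties
# assumed log4j2 properties syntax; the logger id thriftserver is arbitrary and
# may also need to be appended to an existing 'loggers' list if the file defines one
logger.thriftserver.name = org.apache.thrift.server.TThreadPoolServer
logger.thriftserver.level = fatal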
Stopping HiveServer2
In order to stop HiveServer2, the user can delete the Deployment for HiveServer2.
$ kubectl get -n hivemr3 deployments
NAME DESIRED CURRENT READY AGE
hivemr3-hiveserver2 1 1 1 10m
mr3master-6196-0 1 1 1 10m
$ kubectl -n hivemr3 delete deployment hivemr3-hiveserver2
deployment "hivemr3-hiveserver2" deleted
Deleting the Deployment for HiveServer2 does not automatically terminate the DAGAppMaster Pod. This is a feature, not a bug: it is due to the support for high availability in Hive on MR3. (After setting the environment variable MR3_APPLICATION_ID_TIMESTAMP properly, running run-hive.sh attaches the existing DAGAppMaster Pod to the new HiveServer2 Pod.)
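For example, assuming run-hive.sh picks up the variable from the environment, a new HiveServer2 Pod can reuse the existing DAGAppMaster roughly as follows (a sketch; 10403 is the MR3_APPLICATION_ID_TIMESTAMP value printed by the previous run shown above):
$ export MR3_APPLICATION_ID_TIMESTAMP=10403
$ kubernetes/run-hive.sh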
Stopping DAGAppMaster
Deleting the DAGAppMaster Pod automatically deletes all ContainerWorker Pods as well, but another DAGAppMaster Pod is created shortly afterwards because we use a Deployment for DAGAppMaster. In the following example, mr3master-6196-0-6qd4m is the second DAGAppMaster Pod, which is created after deleting the initial DAGAppMaster Pod mr3master-6196-0-dwnck.
$ kubectl delete pod -n hivemr3 mr3master-6196-0-dwnck
pod "mr3master-6196-0-dwnck" deleted
$ kubectl get pods -n hivemr3
NAME READY STATUS RESTARTS AGE
hivemr3-hiveserver2-lmngh 1/1 Running 0 9m10s
mr3master-6196-0-6qd4m 1/1 Running 0 47s
In order to stop DAGAppMaster, the user can delete the Deployment for DAGAppMaster.
$ kubectl -n hivemr3 delete deployment mr3master-6196-0
deployment "mr3master-6196-0" deleted
After a while, no Pods should be running in the namespace hivemr3.
To delete all remaining resources, execute the following commands:
$ kubectl -n hivemr3 delete configmap --all
$ kubectl -n hivemr3 delete svc --all
$ kubectl -n hivemr3 delete secret --all
$ kubectl -n hivemr3 delete serviceaccount hive-service-account
$ kubectl -n hivemr3 delete role --all
$ kubectl -n hivemr3 delete rolebinding --all
$ kubectl delete clusterrole node-reader
$ kubectl delete clusterrolebinding hive-clusterrole-binding
$ kubectl -n hivemr3 delete persistentvolumeclaims workdir-pvc
$ kubectl delete persistentvolumes workdir-pv
Setting hive.server2.enable.doAs to true
With hive.server2.enable.doAs set to true in hive-site.xml, the user should allow user root to impersonate potential clients by extending core-site.xml on the node where the Yarn ResourceManager is running (not kubernetes/conf/core-site.xml). Here we assume that the user in the service principal name for HiveServer2 is root.
For example, in order to accept queries from user foo, we could extend core-site.xml as follows:
hadoop.proxyuser.root.groups = foo
hadoop.proxyuser.root.hosts = indigo20
Here Metastore is running on node indigo20 (where impersonating user foo actually takes place).
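In XML form, the two settings above correspond to the following additions to core-site.xml (a sketch using the example user foo and node indigo20):
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>foo</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>indigo20</value>
</property>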
Setting the time for waiting when recovering from a DAGAppMaster failure
If a DAGAppMaster Pod fails and the user submits a new query, HiveServer2 tries to connect to the non-existent DAGAppMaster at least twice and up to three times:
- to acknowledge the completion of previous queries, if any;
- to get an estimate of the number of Tasks for the new query;
- to get the current status of DAGAppMaster.
For each case, HiveServer2 makes as many attempts as specified by the configuration key ipc.client.connect.max.retries.on.timeouts in kubernetes/conf/core-site.xml, and each attempt takes 20 seconds. By default, ipc.client.connect.max.retries.on.timeouts is set to 45, so HiveServer2 may spend a long time recovering from a DAGAppMaster failure (e.g., 45 retries * 20 seconds * 3 times). Hence, the user may want to set ipc.client.connect.max.retries.on.timeouts to a small number (e.g., 3) so that HiveServer2 can quickly recover from a DAGAppMaster failure.
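A sketch of this change in kubernetes/conf/core-site.xml:
$ vi kubernetes/conf/core-site.xml
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>3</value>
</property>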