For running HiveServer2, Hive on MR3 can create three kinds of Pods: HiveServer2 Pod, DAGAppMaster Pod, and ContainerWorker Pod.
- A HiveServer2 Pod runs a HiveServer2 container and optionally another container for DAGAppMaster in LocalProcess mode. The user creates a HiveServer2 Pod by executing the script kubernetes/run-hive.sh.
- A DAGAppMaster Pod is created by HiveServer2 when DAGAppMaster is configured to run in Kubernetes mode (i.e., mr3.master.mode is set to kubernetes in mr3-site.xml).
- A ContainerWorker Pod runs a ContainerWorker container and is created by DAGAppMaster at runtime.
Configuring the HiveServer2 Pod
The file kubernetes/yaml/hive.yaml creates a Pod for running HiveServer2 (by creating a Deployment).
The user should update several sections in this file according to Kubernetes cluster settings.
In the spec/template/spec/containers section:
- The image field should match the Docker image specified by DOCKER_HIVE_IMG in kubernetes/env.sh.
- The args field specifies the DAGAppMaster mode: --localthread for LocalThread mode, --localprocess for LocalProcess mode, and --kubernetes for Kubernetes mode.
- The resources/requests and resources/limits fields specify the resources to be allocated to a HiveServer2 Pod.
- The three fields ports/containerPort, readinessProbe/tcpSocket/port, and livenessProbe/tcpSocket/port should match the port number specified in hiveserver2-service.yaml.
$ vi kubernetes/yaml/hive.yaml
spec:
  template:
    spec:
      containers:
      - image: 10.1.91.17:5000/hive3
        args: ["start", "--kubernetes"]
        resources:
          requests:
            cpu: 4
            memory: 32Gi
          limits:
            cpu: 4
            memory: 32Gi
        ports:
        - containerPort: 9852
        readinessProbe:
          tcpSocket:
            port: 9852
        livenessProbe:
          tcpSocket:
            port: 9852
In the spec/template/spec/volumes section:
- The configMap/name field under conf-k8s-volume should match the name specified by CONF_DIR_CONFIGMAP in kubernetes/env.sh.
- The secret/secretName field under key-k8s-volume should match the name specified by KEYTAB_SECRET in kubernetes/env.sh.
$ vi kubernetes/yaml/hive.yaml
spec:
  template:
    spec:
      volumes:
      - name: conf-k8s-volume
        configMap:
          name: hivemr3-conf-configmap
      - name: key-k8s-volume
        secret:
          secretName: hivemr3-keytab-secret
The spec/template/spec/hostAliases field can list aliases for hosts that may not be found in the default DNS. For example, the host running Metastore may be unknown to the default DNS, in which case the user can add an alias for it.
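Such an entry could look as follows (a minimal sketch; indigo20 is the Metastore node used elsewhere in this document, and the IP address 192.168.10.1 is a placeholder, not a value from this setup):
$ vi kubernetes/yaml/hive.yaml
spec:
  template:
    spec:
      hostAliases:
      - ip: "192.168.10.1"
        hostnames:
        - "indigo20"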
Using Kerberos-based authentication
In order to use Kerberos-based authentication, the configuration key hadoop.security.authentication should be set to kerberos in kubernetes/conf/core-site.xml.
$ vi kubernetes/conf/core-site.xml
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
The use of Kerberos-based authentication implies that, in kubernetes/env.sh, the service principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL should match the user in DOCKER_USER. For example, root/mr3@PL is a valid Kerberos principal for HIVE_SERVER2_KERBEROS_PRINCIPAL because DOCKER_USER is set to root by default.
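For instance, a consistent pair of settings in kubernetes/env.sh might look as follows (a sketch; root/mr3@PL is the example principal above, where PL is a sample realm):
$ vi kubernetes/env.sh
DOCKER_USER=root
HIVE_SERVER2_KERBEROS_PRINCIPAL=root/mr3@PL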
The two variables must match for two reasons.
- DAGAppMaster checks whether or not HiveServer2 has the right permission by comparing 1) the user of DAGAppMaster, which is specified in DOCKER_USER, and 2) the user of HiveServer2, which is the principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL. DAGAppMaster assumes the user in DOCKER_USER because kubernetes/hive/mr3/mr3-setup.sh sets the configuration key mr3.k8s.pod.master.user to the user in DOCKER_USER:
  -Dmr3.k8s.pod.master.user=$DOCKER_USER -Dmr3.k8s.master.working.dir=$REMOTE_WORK_DIR \
  The user can disable permission checking in DAGAppMaster by setting mr3.am.acls.enabled to false in kubernetes/conf/mr3-site.xml (see the sketch after this list). Since DAGAppMaster does not expose its address to the outside, the security of HiveServer2 itself is not compromised.
- Shuffle handlers in ContainerWorkers compare the service principal name against the owner of intermediate files, which is the user specified in kubernetes/hive/Dockerfile, which, in turn, should match DOCKER_USER in kubernetes/env.sh.
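A minimal sketch of disabling permission checking in kubernetes/conf/mr3-site.xml:
$ vi kubernetes/conf/mr3-site.xml
<property>
  <name>mr3.am.acls.enabled</name>
  <value>false</value>
</property>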
A mismatch between DOCKER_USER and HIVE_SERVER2_KERBEROS_PRINCIPAL makes HiveServer2 unable to establish a connection to DAGAppMaster. In such a case, DAGAppMaster keeps printing error messages like:
2019-07-04T09:42:17,074 WARN [IPC Server handler 0 on 8080] ipc.Server: IPC Server handler 0 on 8080, call Call#32 Retry#0 com.datamonad.mr3.master.DAGClientHandlerProtocolBlocking.getSessionStatus from 10.43.0.0:37962
java.security.AccessControlException: User gitlab-runner/indigo20@RED (auth:TOKEN) cannot perform AM view operations
at com.datamonad.mr3.master.DAGClientHandlerProtocolServer.checkAccess(DAGClientHandlerProtocolServer.scala:239) ~[mr3-tez-0.1-assembly.jar:0.1]
at com.datamonad.mr3.master.DAGClientHandlerProtocolServer.checkViewAccess(DAGClientHandlerProtocolServer.scala:233) ~[mr3-tez-0.1-assembly.jar:0.1]
...
If permission checking is disabled in DAGAppMaster, ContainerWorkers print error messages like:
2020-08-16T16:34:01,019 ERROR [Tez Shuffle Handler Worker #1] shufflehandler.ShuffleHandler: Shuffle error :
java.io.IOException: Owner 'root' for path /data1/k8s/dag_1/container_K@1/vertex_3/attempt_70888998_0000_1_03_000000_0_10003/file.out did not match expected owner 'hive'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:281) ~[hadoop-common-3.1.2.jar:?]
at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenForRandomRead(SecureIOUtils.java:128) ~[hadoop-common-3.1.2.jar:?]
at org.apache.hadoop.io.SecureIOUtils.openForRandomRead(SecureIOUtils.java:113) ~[hadoop-common-3.1.2.jar:?]
at com.datamonad.mr3.tez.shufflehandler.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:1129) ~[mr3-tez-1.0-assembly.jar:1.0]
Running HiveServer2
In order to run HiveServer2, the user can execute the script kubernetes/run-hive.sh:
$ kubernetes/run-hive.sh
...
CLIENT_TO_AM_TOKEN_KEY=668123ae-de9d-4ca3-95a6-a5848e123e6e
MR3_APPLICATION_ID_TIMESTAMP=10403
MR3_SHARED_SESSION_ID=f214e200-f38e-4b94-89d5-e0245de3dea5
ATS_SECRET_KEY=0d2fdeec-c564-4d40-891a-5ca5f736294c
configmap/client-am-config created
deployment/hivemr3-hiveserver2 created
service/hiveserver2 created
The script mounts the following files inside the HiveServer2 Pod:
- kubernetes/env.sh
- kubernetes/conf/*
- kubernetes/key/*
In this way, the user can completely specify the behavior of HiveServer2 as well as DAGAppMaster and ContainerWorkers.
For logging configuration, HiveServer2 reads kubernetes/conf/hive-log4j2.properties, while DAGAppMaster and ContainerWorkers read k8s-mr3-container-log4j2.properties, which is included in the MR3 release. By default, logging messages are redirected to the console.
Executing the script kubernetes/run-hive.sh starts a HiveServer2 Pod and a DAGAppMaster Pod.
It may take a while for the two Pods to become ready because both Pods run readiness and liveness probes.
The HiveServer2 Pod becomes ready when it opens a Thrift port and starts accepting connection requests from Beeline.
The DAGAppMaster Pod becomes ready when it opens an RPC port and starts accepting connection requests from MR3Client.
The HiveServer2 Pod becomes ready only after the DAGAppMaster Pod becomes ready.
$ kubectl get pods -n hivemr3
NAME READY STATUS RESTARTS AGE
hivemr3-hiveserver2-lmngh 1/1 Running 0 41s
mr3master-6196-0-dwnck 1/1 Running 0 30s
The user can verify that all files are successfully mounted inside the HiveServer2 Pod:
$ kubectl exec -n hivemr3 -it hivemr3-hiveserver2-lmngh -- /bin/bash
bash-4.2$ pwd
/opt/mr3-run/hive
bash-4.2$ cd /opt/mr3-run/
bash-4.2$ ls env.sh
env.sh
bash-4.2$ ls conf/
core-site.xml hive-log4j2.properties.console jgss.conf mr3-site.xml ranger-policymgr-ssl.xml
hive-log4j.properties hive-log4j2.properties.file krb5.conf ranger-hive-audit.xml tez-site.xml
hive-log4j2.properties hive-site.xml mapred-site.xml ranger-hive-security.xml yarn-site.xml
bash-4.2$ ls key/
hive.service.keytab
The user can start a new Beeline connection using the address and service principal name of HiveServer2 (e.g., beeline -u "jdbc:hive2://10.1.91.41:9852/;principal=hive/indigo20@RED;").
After accepting queries from Beeline connections, DAGAppMaster creates many ContainerWorker Pods, each of which runs a ContainerWorker container.
$ kubectl get pods -n hivemr3
NAME READY STATUS RESTARTS AGE
hivemr3-hiveserver2-lmngh 1/1 Running 0 4m2s
mr3master-6196-0-dwnck 1/1 Running 0 3m51s
mr3worker-14e3-1 1/1 Running 0 17s
mr3worker-14e3-2 1/1 Running 0 11s
mr3worker-14e3-3 0/1 Init:0/1 0 5s
mr3worker-14e3-4 0/1 Init:0/1 0 5s
Suppressing TSaslTransportException
While HiveServer2 is running, its log may repeatedly print ERROR messages due to org.apache.thrift.transport.TSaslTransportException.
2020-07-07T18:24:14,516 ERROR [HiveServer2-Handler-Pool: Thread-39] server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
...
Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
...
This message is printed when the liveness probe checks the Thrift port, so it is not an error.
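If these messages clutter the log, one possible workaround (an assumption on our part, not part of the original configuration) is to raise the log level of the Thrift server class that emits them in kubernetes/conf/hive-log4j2.properties:
$ vi kubernetes/conf/hive-log4j2.properties
# assumed log4j2 properties syntax; the logger id thriftserver is arbitrary and
# may also need to be appended to an existing 'loggers' list if the file defines one
logger.thriftserver.name = org.apache.thrift.server.TThreadPoolServer
logger.thriftserver.level = fatal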
Stopping HiveServer2
In order to stop HiveServer2, the user can delete the Deployment for HiveServer2.
$ kubectl get -n hivemr3 deployments
NAME DESIRED CURRENT READY AGE
hivemr3-hiveserver2 1 1 1 10m
mr3master-6196-0 1 1 1 10m
$ kubectl -n hivemr3 delete deployment hivemr3-hiveserver2
deployment "hivemr3-hiveserver2" deleted
Deleting the Deployment for HiveServer2 does not automatically terminate the DAGAppMaster Pod. This is a feature, not a bug: it is due to the support for high availability in Hive on MR3. (After setting the environment variable MR3_APPLICATION_ID_TIMESTAMP properly, running run-hive.sh attaches the existing DAGAppMaster Pod to the new HiveServer2 Pod.)
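For example, assuming run-hive.sh picks up the variable from the environment, a new HiveServer2 Pod can reuse the existing DAGAppMaster roughly as follows (a sketch; 10403 is the MR3_APPLICATION_ID_TIMESTAMP value printed by the previous run shown above):
$ export MR3_APPLICATION_ID_TIMESTAMP=10403
$ kubernetes/run-hive.sh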
Stopping DAGAppMaster
Deleting the DAGAppMaster Pod automatically deletes all ContainerWorker Pods as well, but another DAGAppMaster Pod is created shortly afterwards because we use a Deployment for DAGAppMaster. In the following example, mr3master-6196-0-6qd4m is the second DAGAppMaster Pod, which is created after deleting the initial DAGAppMaster Pod mr3master-6196-0-dwnck.
$ kubectl delete pod -n hivemr3 mr3master-6196-0-dwnck
pod "mr3master-6196-0-dwnck" deleted
$ kubectl get pods -n hivemr3
NAME READY STATUS RESTARTS AGE
hivemr3-hiveserver2-lmngh 1/1 Running 0 9m10s
mr3master-6196-0-6qd4m 1/1 Running 0 47s
In order to stop DAGAppMaster, the user can delete the Deployment for DAGAppMaster.
$ kubectl -n hivemr3 delete deployment mr3master-6196-0
deployment "mr3master-6196-0" deleted
After a while, no Pods should be running in the namespace hivemr3.
To delete all remaining resources, execute the following commands:
$ kubectl -n hivemr3 delete configmap --all
$ kubectl -n hivemr3 delete svc --all
$ kubectl -n hivemr3 delete secret --all
$ kubectl -n hivemr3 delete serviceaccount hive-service-account
$ kubectl -n hivemr3 delete role --all
$ kubectl -n hivemr3 delete rolebinding --all
$ kubectl delete clusterrole node-reader
$ kubectl delete clusterrolebinding hive-clusterrole-binding
$ kubectl -n hivemr3 delete persistentvolumeclaims workdir-pvc
$ kubectl delete persistentvolumes workdir-pv
Setting hive.server2.enable.doAs to true
With hive.server2.enable.doAs set to true in hive-site.xml, the user should allow user root to impersonate potential clients by extending core-site.xml on the node where the Yarn ResourceManager is running (not kubernetes/conf/core-site.xml). Here we assume that the user in the service principal name for HiveServer2 is root.
For example, in order to accept queries from user foo, we could extend core-site.xml as follows:
hadoop.proxyuser.root.groups = foo
hadoop.proxyuser.root.hosts = indigo20
Here Metastore is running on node indigo20 (where impersonating user foo actually takes place).
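In XML form, the two settings above correspond to the following additions to core-site.xml (a sketch using the example user foo and node indigo20):
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>foo</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>indigo20</value>
</property>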
Setting the time for waiting when recovering from a DAGAppMaster failure
If a DAGAppMaster Pod fails and the user submits a new query, HiveServer2 tries to connect to the non-existent DAGAppMaster at least twice and up to three times:
- to acknowledge the completion of previous queries, if any;
- to get an estimate of the number of Tasks for the new query;
- to get the current status of DAGAppMaster.
For each case, HiveServer2 makes as many attempts as specified by the configuration key ipc.client.connect.max.retries.on.timeouts in kubernetes/conf/core-site.xml, and each attempt takes 20 seconds. By default, ipc.client.connect.max.retries.on.timeouts is set to 45, so HiveServer2 may spend a long time recovering from a DAGAppMaster failure (e.g., 45 retries * 20 seconds * 3 times). Hence, the user may want to set ipc.client.connect.max.retries.on.timeouts to a small number (e.g., 3) so that HiveServer2 can quickly recover from a DAGAppMaster failure.
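A sketch of this change in kubernetes/conf/core-site.xml:
$ vi kubernetes/conf/core-site.xml
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>3</value>
</property>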