Configuring HiveServer2 involves two steps: 1) specifying parameters for connecting from HiveServer2 to Metastore; 2) specifying parameters for connecting from clients to HiveServer2 itself.

The following code shows part of kubernetes/env.sh for specifying parameters for connecting to Metastore:

HIVE_DATABASE_HOST=red0
HIVE_METASTORE_HOST=red0
HIVE_METASTORE_PORT=9850
HIVE_DATABASE_NAME=hive5mr3

HIVE_WAREHOUSE_DIR=/opt/mr3-run/work-dir/warehouse/

METASTORE_SECURE_MODE=true
HIVE_METASTORE_KERBEROS_PRINCIPAL=hive/red0@RED
HIVE_METASTORE_KERBEROS_KEYTAB=$KEYTAB_MOUNT_DIR/hive.service.keytab
  • HIVE_DATABASE_HOST specifies the host where the database for Metastore is running.
  • HIVE_METASTORE_HOST and HIVE_METASTORE_PORT specify the address of Metastore itself.
  • HIVE_WAREHOUSE_DIR specifies the path to the Hive warehouse. Since MR3 is agnostic to the type of data source, it is important to specify the full path to the warehouse, including the file system. If no file system is given, MR3 assumes the local file system because the configuration key fs.defaultFS is set to file:/// in kubernetes/conf/core-site.xml (see the snippet after this list). Below are a few examples of the path. For running Hive on MR3 in a Kubernetes cluster, the user should use either hdfs or s3a for the file system.
    • /opt/mr3-run/work-dir/warehouse/: a local directory inside the HiveServer2 Pod is used for the Hive warehouse. Since the local directory is not visible to the outside, this works only if all the components (HiveServer2, DAGAppMaster, and ContainerWorkers) run in the same Pod.
    • hdfs://red0:8020/user/hive/warehouse: an HDFS directory with NameNode on red0 is used for the Hive warehouse.
    • s3a://mr3-bucket/warehouse: an S3 bucket is used for the Hive warehouse.
  • If Metastore runs in a secure mode, METASTORE_SECURE_MODE should be set to true. HIVE_METASTORE_KERBEROS_PRINCIPAL specifies the service principal name, and HIVE_METASTORE_KERBEROS_KEYTAB specifies the name of the service keytab file which should be copied to the directory kubernetes/key by the user. Note that if HiveServer2 uses Kerberos-based authentication, METASTORE_SECURE_MODE should also be set to true.
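
For reference, kubernetes/conf/core-site.xml sets the default file system as follows, which is why a warehouse path without an explicit file system resolves to the local file system:

<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>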

If HIVE_DATABASE_HOST and HIVE_METASTORE_HOST use hosts unknown to the default DNS, the user should add their aliases under spec/template/spec/hostAliases in kubernetes/yaml/hive.yaml. The following example adds an alias for the host red0, which is unknown to the default DNS.

spec:
  template:
    spec:
      hostAliases:
      - ip: "10.1.91.4"
        hostnames:
        - "red0"

The following code shows part of kubernetes/env.sh for specifying parameters for connecting to HiveServer2:

HIVE_SERVER2_HOST=$HOSTNAME
HIVE_SERVER2_PORT=9852
HIVE_SERVER2_HTTP_PORT=10001

HIVE_SERVER2_HEAPSIZE=32768

HIVE_SERVER2_AUTHENTICATION=KERBEROS
HIVE_SERVER2_KERBEROS_PRINCIPAL=hive/red0@RED
HIVE_SERVER2_KERBEROS_KEYTAB=$KEYTAB_MOUNT_DIR/hive.service.keytab

TOKEN_RENEWAL_HIVE_ENABLED=false
  • HIVE_SERVER2_PORT and HIVE_SERVER2_HTTP_PORT should match the port numbers specified in hiveserver2-service.yaml (see the sketch after this list).

  • HIVE_SERVER2_HEAPSIZE specifies the heap size (in MB) for HiveServer2. If DAGAppMaster runs in LocalThread mode, the heap size should be no larger than the memory allocated to the Pod running HiveServer2 (specified in hive.yaml). If DAGAppMaster runs in LocalProcess mode, the sum of the heap sizes of HiveServer2 and DAGAppMaster (the latter specified by mr3.am.resource.memory.mb in conf/mr3-site.xml) should be no larger than the memory allocated to the Pod (see the sizing example after this list).

  • If HiveServer2 uses Kerberos-based authentication, HIVE_SERVER2_KERBEROS_PRINCIPAL and HIVE_SERVER2_KERBEROS_KEYTAB should specify the service principal name and the service keytab file (for hive.server2.authentication.kerberos.principal and hive.server2.authentication.kerberos.keytab in hive-site.xml), respectively. Note that this service principal name may be different from the name in HIVE_METASTORE_KERBEROS_PRINCIPAL, and the service keytab file may be different from the file in HIVE_METASTORE_KERBEROS_KEYTAB.

  • TOKEN_RENEWAL_HIVE_ENABLED should be set to true in order to automatically renew Hive tokens.
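
The following is a minimal sketch of hiveserver2-service.yaml showing only the two ports. The actual manifest in the distribution declares more fields (e.g., the Service type and selector), so treat the names here as illustrative:

apiVersion: v1
kind: Service
metadata:
  name: hiveserver2        # illustrative name
spec:
  ports:
  - name: thrift
    port: 9852             # must match HIVE_SERVER2_PORT in env.sh
  - name: http
    port: 10001            # must match HIVE_SERVER2_HTTP_PORT in env.sh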
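
In LocalProcess mode, the DAGAppMaster heap size is taken from mr3.am.resource.memory.mb in kubernetes/conf/mr3-site.xml. The value below is illustrative only:

<property>
  <name>mr3.am.resource.memory.mb</name>
  <value>16384</value>  <!-- illustrative value -->
</property>

With HIVE_SERVER2_HEAPSIZE=32768 above, the Pod running HiveServer2 would then need at least 32768 + 16384 = 49152 MB of memory.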

Setting the service principal name

By default, the configuration key hadoop.security.authentication is set to kerberos in kubernetes/conf/core-site.xml, and both HiveServer2 and DAGAppMaster use Kerberos-based authentication:

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>

The use of Kerberos-based authentication implies that, in kubernetes/env.sh, the user in the service principal name HIVE_SERVER2_KERBEROS_PRINCIPAL should match the user in DOCKER_USER. For example, root/mr3@PL is a valid Kerberos principal for HIVE_SERVER2_KERBEROS_PRINCIPAL because DOCKER_USER is set to root by default. The two variables must match for two reasons.

  • DAGAppMaster checks whether or not HiveServer2 has the right permission by comparing 1) the user of DAGAppMaster, which is specified in DOCKER_USER, and 2) the user of HiveServer2, which is the user in the principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL. DAGAppMaster assumes the user in DOCKER_USER because kubernetes/hive/mr3/mr3-setup.sh sets the configuration key mr3.k8s.pod.master.user to the user in DOCKER_USER:
    -Dmr3.k8s.pod.master.user=$DOCKER_USER -Dmr3.k8s.master.working.dir=$REMOTE_WORK_DIR \

    The user can disable permission checking in DAGAppMaster by setting mr3.am.acls.enabled to false in kubernetes/conf/mr3-site.xml (see the snippet after this list). Since DAGAppMaster does not expose its address to the outside, disabling permission checking does not compromise the security of HiveServer2 itself.

  • Shuffle handlers in ContainerWorkers compare the service principal name against the owner of intermediate files, namely the user specified in kubernetes/hive/Dockerfile, which in turn should match DOCKER_USER in kubernetes/env.sh.
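
To disable permission checking as described above, add or update the following property in kubernetes/conf/mr3-site.xml:

<property>
  <name>mr3.am.acls.enabled</name>
  <value>false</value>
</property>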

A mismatch between DOCKER_USER and HIVE_SERVER2_KERBEROS_PRINCIPAL prevents HiveServer2 from establishing a connection to DAGAppMaster. In such a case, DAGAppMaster keeps printing error messages like the following:

2019-07-04T09:42:17,074  WARN [IPC Server handler 0 on 8080] ipc.Server: IPC Server handler 0 on 8080, call Call#32 Retry#0 com.datamonad.mr3.master.DAGClientHandlerProtocolBlocking.getSessionStatus from 10.43.0.0:37962
java.security.AccessControlException: User gitlab-runner/red0@RED (auth:TOKEN) cannot perform AM view operations
  at com.datamonad.mr3.master.DAGClientHandlerProtocolServer.checkAccess(DAGClientHandlerProtocolServer.scala:239) ~[mr3-tez-0.1-assembly.jar:0.1]
  at com.datamonad.mr3.master.DAGClientHandlerProtocolServer.checkViewAccess(DAGClientHandlerProtocolServer.scala:233) ~[mr3-tez-0.1-assembly.jar:0.1]
  ...

If permission checking is disabled in DAGAppMaster, ContainerWorkers print error messages like:

2020-08-16T16:34:01,019 ERROR [Tez Shuffle Handler Worker #1] shufflehandler.ShuffleHandler: Shuffle error :
java.io.IOException: Owner 'root' for path /data1/k8s/dag_1/container_K@1/vertex_3/attempt_70888998_0000_1_03_000000_0_10003/file.out did not match expected owner 'hive'
  at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:281) ~[hadoop-common-3.1.2.jar:?]
  at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenForRandomRead(SecureIOUtils.java:128) ~[hadoop-common-3.1.2.jar:?]
  at org.apache.hadoop.io.SecureIOUtils.openForRandomRead(SecureIOUtils.java:113) ~[hadoop-common-3.1.2.jar:?]
  at com.datamonad.mr3.tez.shufflehandler.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:1129) ~[mr3-tez-1.0-assembly.jar:1.0]

Setting the wait time for recovering from a DAGAppMaster failure

If a DAGAppMaster Pod fails and the user submits a new query, HiveServer2 tries to connect to the non-existent DAGAppMaster at least twice and up to three times:

  1. to acknowledge the completion of previous queries, if any;
  2. to get an estimate of the number of Tasks for the new query;
  3. to get the current status of DAGAppMaster.

For each connection, HiveServer2 makes as many attempts as specified by the configuration key ipc.client.connect.max.retries.on.timeouts in kubernetes/conf/core-site.xml, with each attempt taking 20 seconds. By default, ipc.client.connect.max.retries.on.timeouts is set to 45, so HiveServer2 may spend a long time recovering from a DAGAppMaster failure (up to 45 attempts * 20 seconds * 3 connections = 45 minutes). Hence, the user may want to set ipc.client.connect.max.retries.on.timeouts to a small number (e.g., 3) so that HiveServer2 can quickly recover from a DAGAppMaster failure.
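
For example, the following setting in kubernetes/conf/core-site.xml caps each connection at 3 attempts:

<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>3</value>
</property>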