For running HiveServer2, Hive on MR3 can create three kinds of Pods: HiveServer2 Pod, DAGAppMaster Pod, and ContainerWorker Pod. A HiveServer2 Pod runs a HiveServer2 container and optionally another container for DAGAppMaster in LocalProcess mode. The user creates a HiveServer2 Pod by executing the script kubernetes/run-hive.sh. A DAGAppMaster Pod is created by HiveServer2 when DAGAppMaster is configured to run in Kubernetes mode (i.e., mr3.master.mode is set to kubernetes in mr3-site.xml). A ContainerWorker Pod runs a ContainerWorker container and is created by DAGAppMaster at runtime.

Before creating Pods, the user should configure Kubernetes objects such as namespace, Roles, RoleBindings, ServiceAccount, PersistentVolume, PersistentVolumeClaim, and so on. The following files specify how to configure Kubernetes objects for HiveServer2:

└── kubernetes
    ├── env.sh
    └── yaml
        ├── cluster-role.yaml
        ├── hive-role.yaml
        ├── hiveserver2-service.yaml
        ├── hive-service-account.yaml
        ├── hive.yaml
        ├── namespace.yaml
        ├── workdir-pvc.yaml
        └── workdir-pv.yaml

The user should also update the configuration files inherited from Hive on MR3 on Hadoop. For HiveServer2, Metastore, DAGAppMaster, and ContainerWorker, these configuration files are found in the directory kubernetes/conf:

└── kubernetes
    └── conf
        ├── core-site.xml
        ├── hive-log4j2.properties
        ├── hive-log4j.properties
        ├── hive-site.xml
        ├── jgss.conf
        ├── krb5.conf
        ├── mapred-site.xml
        ├── mr3-site.xml
        ├── ranger-hive-audit.xml
        ├── ranger-hive-security.xml
        ├── ranger-policymgr-ssl.xml
        ├── tez-site.xml
        └── yarn-site.xml

For most of the configuration keys, the user can set their values in the same way as in Hive on MR3 on Hadoop. Those configuration keys specific to Hive on MR3 on Kubernetes are explained later.

Updating kubernetes/env.sh

The user should set the following environment variables in kubernetes/env.sh.

MR3_NAMESPACE=hivemr3
MR3_SERVICE_ACCOUNT=hive-service-account
MASTER_SERVICE_ACCOUNT=master-service-account
WORKER_SERVICE_ACCOUNT=worker-service-account
CONF_DIR_CONFIGMAP=hivemr3-conf-configmap

CREATE_KEYTAB_SECRET=true
KEYTAB_SECRET=hivemr3-keytab-secret

  • MR3_NAMESPACE specifies the namespace for all Kubernetes objects.
  • MR3_SERVICE_ACCOUNT specifies the ServiceAccount for Hive on MR3.
  • MASTER_SERVICE_ACCOUNT specifies the ServiceAccount for MR3 DAGAppMaster.
  • WORKER_SERVICE_ACCOUNT specifies the ServiceAccount for MR3 ContainerWorkers.
  • CONF_DIR_CONFIGMAP specifies the name of the ConfigMap to be built from files in the directory kubernetes/conf.
  • CREATE_KEYTAB_SECRET specifies whether or not to create a Secret from files in the directory kubernetes/key. It should be set to true if Kerberos is used for authentication.
  • KEYTAB_SECRET specifies the name of the Secret to be built when CREATE_KEYTAB_SECRET is set to true.

Then the user should manually update the YAML files to match kubernetes/env.sh and the settings of the Kubernetes cluster, because YAML files do not read environment variables. Below we show how to update each YAML file. In most cases, the default values can be used as they are, except for settings specific to each Kubernetes cluster. We use boldface where it is mandatory to specify new values to override default values.

Updating YAML files

namespace.yaml

This file creates a namespace for all Kubernetes objects. The name field should match the namespace specified in MR3_NAMESPACE in kubernetes/env.sh.

  name: hivemr3

Similarly, the namespace field in the following YAML files should match the same namespace: hive-service-account.yaml, hive-role.yaml, hiveserver2-service.yaml, hive.yaml, workdir-pvc.yaml.
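As a quick sanity check for these namespace fields, the following shell sketch scans YAML files and reports every namespace value that differs from the expected one. The function name check_namespace is hypothetical and not part of the MR3 release, and the sketch assumes unquoted values written on the same line as the key.

```shell
# check_namespace EXPECTED FILE...
# Hypothetical helper (not part of the MR3 release): prints every "namespace:"
# value that differs from EXPECTED and returns non-zero if any is found.
# Assumes unquoted values on the same line as the key.
check_namespace() {
  expected="$1"
  shift
  rc=0
  for file in "$@"; do
    # Extract the value of each "namespace:" field in the file.
    for value in $(sed -n 's/^[[:space:]]*namespace:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$file"); do
      if [ "$value" != "$expected" ]; then
        echo "MISMATCH $file: namespace is '$value', expected '$expected'"
        rc=1
      fi
    done
  done
  return "$rc"
}
```

For example, with MR3_NAMESPACE set as in kubernetes/env.sh, one could run check_namespace "$MR3_NAMESPACE" kubernetes/yaml/hive.yaml kubernetes/yaml/workdir-pvc.yaml. Note that namespace.yaml itself uses the name field and is not covered by this check.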

hive-service-account.yaml

This file creates a ServiceAccount. The name of the ServiceAccount object (hive-service-account) is read in run-hive.sh, so there is no need to update this file.

cluster-role.yaml

This file creates a ClusterRole. The name of the ClusterRole resource (node-reader) is read in run-hive.sh, so there is no need to update this file.

hive-role.yaml

This file creates a Role for HiveServer2 Pods. The name of the Role resource (hive-role) is read in run-hive.sh, so there is no need to update this file.

workdir-pv.yaml

This file creates a PersistentVolume for copying the result of running a query from ContainerWorkers to HiveServer2. The sample file in the MR3 release uses an NFS volume, and the user should update it in order to use different types of PersistentVolumes.

workdir-pvc.yaml

This file creates a PersistentVolumeClaim which references the PersistentVolume created by workdir-pv.yaml. The user should specify the size of the storage:

      storage: 10Gi

hiveserver2-service.yaml

This file creates a Service for exposing HiveServer2 to the outside of the Kubernetes cluster. The user should specify a public IP address with a valid host name and a port number (with name thrift) for HiveServer2 so that clients can connect to it from the outside of the Kubernetes cluster. Another port number with name http should be specified if HTTP transport is enabled (by setting the configuration key hive.server2.transport.mode to all or http in kubernetes/conf/hive-site.xml). The host name is necessary in order for Ranger to securely communicate with HiveServer2.

  ports:
  - protocol: TCP
    port: 9852
    targetPort: 9852
    name: thrift
  - protocol: TCP
    port: 10001
    targetPort: 10001
    name: http
  externalIPs:
  - 10.1.91.41

The sample file in the MR3 release uses 10.1.91.41:9852 as the full address of HiveServer2. The user should make sure that the IP address exists with a valid host name and is not already taken.

hive.yaml

This file creates a Pod for running HiveServer2 (by creating a ReplicationController). The user should update several sections in this file according to Kubernetes cluster settings.

In the spec/template/spec/containers section:

  • The image field should match the Docker image specified by DOCKER_HIVE_IMG in kubernetes/env.sh.
  • The args field specifies the DAGAppMaster mode: --localthread for LocalThread mode, --localprocess for LocalProcess mode, and --kubernetes for Kubernetes mode.
  • The resources/requests and resources/limits fields specify the resources to be allocated to a HiveServer2 Pod.
  • The three fields ports/containerPort, readinessProbe/tcpSocket/port, and livenessProbe/tcpSocket/port should match the port number specified in hiveserver2-service.yaml.

spec:
  template:
    spec:
      containers:
      - image: 10.1.91.17:5000/hive5
        args: ["start", "--kubernetes"]
        resources:
          requests:
            cpu: 4
            memory: 32Gi
          limits:
            cpu: 4
            memory: 32Gi
        ports:
        - containerPort: 9852
        readinessProbe:
          tcpSocket:
            port: 9852
        livenessProbe:
          tcpSocket:
            port: 9852

In the spec/template/spec/volumes section:

  • The configMap/name field under conf-k8s-volume should match the name specified by CONF_DIR_CONFIGMAP in kubernetes/env.sh.
  • The secret/secretName field under key-k8s-volume should match the name specified by KEYTAB_SECRET in kubernetes/env.sh.

spec:
  template:
    spec:
      volumes:
      - name: conf-k8s-volume
        configMap:
          name: hivemr3-conf-configmap
      - name: key-k8s-volume
        secret:
          secretName: hivemr3-keytab-secret
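
Because hive.yaml does not read kubernetes/env.sh, a small check can catch a mismatch between the two before the Pod is created. The following shell sketch is one way to do it; check_volume_names is a hypothetical helper, not part of the MR3 release. It only looks for whole lines of the form "name: ..." and "secretName: ..." (ignoring indentation), so it assumes a single space after the colon and unquoted values, and it does not verify that the name line actually sits under configMap.

```shell
# check_volume_names FILE CONFIGMAP SECRET
# Hypothetical helper: verifies that FILE contains "name: CONFIGMAP" and
# "secretName: SECRET" as whole lines, ignoring indentation.
check_volume_names() {
  file="$1"
  rc=0
  # Strip leading and trailing whitespace so that lines can be compared exactly.
  stripped=$(sed 's/^[[:space:]]*//; s/[[:space:]]*$//' "$file")
  if ! printf '%s\n' "$stripped" | grep -Fxq -e "name: $2"; then
    echo "MISSING in $file: configMap name '$2'"
    rc=1
  fi
  if ! printf '%s\n' "$stripped" | grep -Fxq -e "secretName: $3"; then
    echo "MISSING in $file: secretName '$3'"
    rc=1
  fi
  return "$rc"
}
```

With the variables from kubernetes/env.sh set in the current shell, a call could look like check_volume_names kubernetes/yaml/hive.yaml "$CONF_DIR_CONFIGMAP" "$KEYTAB_SECRET".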

The spec/template/spec/hostAliases field can list aliases for hosts that may not be found in the default DNS. For example, the host running Metastore may be unknown to the default DNS, in which case the user can add an alias for it.

Updating kubernetes/conf/mr3-site.xml

Those configuration keys relevant to Hive on MR3 on Kubernetes are explained in the Kubernetes section of Configuring MR3. Hive on MR3 automatically sets the following configuration keys, so their values in kubernetes/conf/mr3-site.xml are ignored:

  • mr3.k8s.namespace
  • mr3.k8s.pod.master.serviceaccount
  • mr3.k8s.pod.worker.serviceaccount
  • mr3.k8s.pod.master.image
  • mr3.k8s.pod.master.user
  • mr3.k8s.master.working.dir
  • mr3.k8s.master.persistentvolumeclaim.mounts
  • mr3.k8s.pod.worker.image
  • mr3.k8s.pod.worker.user
  • mr3.k8s.worker.working.dir
  • mr3.k8s.worker.persistentvolumeclaim.mounts
  • mr3.k8s.conf.dir.configmap
  • mr3.k8s.conf.dir.mount.dir
  • mr3.k8s.keytab.secret
  • mr3.k8s.keytab.mount.dir
  • mr3.k8s.keytab.mount.file

For the following configuration keys, their default values in kubernetes/conf/mr3-site.xml must be used:

  • mr3.k8s.master.command (with a default value of /opt/mr3-run/hive/run-master.sh)
  • mr3.k8s.worker.command (with a default value of /opt/mr3-run/hive/run-worker.sh)

For the following configuration keys, their default values in kubernetes/conf/mr3-site.xml usually suffice, but the user may need to update their values according to the setting of the Kubernetes cluster:

  • mr3.k8s.api.server.url
  • mr3.k8s.service.account.token.path
  • mr3.k8s.service.account.token.ca.cert.path
  • mr3.k8s.nodes.polling.interval.ms
  • mr3.k8s.pods.polling.interval.ms
  • mr3.k8s.pod.creation.timeout.ms
  • mr3.k8s.pod.master.node.selector
  • mr3.k8s.master.pod.affinity.match.label. A value hivemr3_app=hiveserver2 means that the DAGAppMaster Pod is likely to be placed on the same node that hosts a Pod with label hivemr3_app=hiveserver2, namely the HiveServer2 Pod.
  • mr3.k8s.pod.worker.node.selector

For the following configuration keys, the user should check their values before starting Hive on MR3:

  • mr3.k8s.pod.image.pull.policy
  • mr3.k8s.pod.image.pull.secrets
  • mr3.k8s.host.aliases

The user can use the following configuration keys to specify tolerations for DAGAppMaster and ContainerWorker Pods:

  • mr3.k8s.pod.master.toleration.specs
  • mr3.k8s.pod.worker.toleration.specs

Each value is a comma-separated list of toleration specifications. The format of a toleration specification is [key]:[operator]:[value]:[effect]:[toleration in seconds] where [value] and :[toleration in seconds] are optional. Here are a few valid examples: hello:Equal:world:NoSchedule, hello:Exists::NoSchedule, hello:Equal:world:NoExecute:300, hello:Exists::NoExecute:300. Note that an invalid specification causes the creation of the DAGAppMaster Pod or ContainerWorker Pods to fail, so the user should check the validity of every toleration specification before running HiveServer2. For example, foo:Exists:bar:NoSchedule is an invalid specification because [value] must be empty when [operator] is Exists. (In contrast, foo:Equal::NoSchedule is okay.)
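
Since an invalid toleration specification is only detected when a Pod fails to be created, it can help to check a value before putting it in mr3-site.xml. The following shell sketch approximates the format described above; check_toleration_specs is a hypothetical helper, not part of the MR3 release, and it does not reproduce the parser used by MR3. It assumes that the operators are Equal and Exists, that the effects are those defined by Kubernetes (NoSchedule, PreferNoSchedule, NoExecute), and that whitespace around each specification can be ignored.

```shell
# check_toleration_specs LIST
# Hypothetical helper: checks a comma-separated list of toleration specs of the form
#   [key]:[operator]:[value]:[effect]:[toleration in seconds]
# Prints each invalid spec and returns non-zero if any is found.
check_toleration_specs() {
  # An empty list means that no tolerations are specified.
  [ -z "$1" ] && return 0
  printf '%s\n' "$1" | tr ',' '\n' | awk -F: '
    {
      gsub(/^[ \t]+|[ \t]+$/, "")   # trim whitespace; awk then re-splits the fields
      ok = (NF == 4 || NF == 5)
      ok = ok && ($2 == "Equal" || $2 == "Exists")
      ok = ok && !($2 == "Exists" && $3 != "")   # [value] must be empty for Exists
      # Effects defined by Kubernetes; assumed here to be accepted by MR3 as well.
      ok = ok && ($4 == "NoSchedule" || $4 == "PreferNoSchedule" || $4 == "NoExecute")
      ok = ok && (NF == 4 || $5 ~ /^[0-9]+$/)    # optional seconds must be an integer
      if (!ok) { print "invalid toleration spec: " $0; failed = 1 }
    }
    END { exit (failed ? 1 : 0) }
  '
}
```

For instance, check_toleration_specs 'hello:Equal:world:NoSchedule,hello:Exists::NoExecute:300' succeeds silently, whereas a specification that combines Exists with a non-empty value is printed and makes the function return a non-zero status.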

The following configuration keys determine emptyDir and hostPath volumes to be mounted inside DAGAppMaster and ContainerWorker Pods, and thus should be updated for each installation of Hive on MR3:

  • mr3.k8s.pod.master.emptydirs
  • mr3.k8s.pod.master.hostpaths
  • mr3.k8s.pod.worker.emptydirs
  • mr3.k8s.pod.worker.hostpaths

For both DAGAppMaster and ContainerWorker Pods, emptyDir and hostPath volumes become local directories where intermediate data is written, so at least one such volume is necessary. For the DAGAppMaster Pod, the following setting is usually okay because DAGAppMaster needs just a single local directory (unless it uses Local mode for ContainerWorkers):

  • mr3.k8s.pod.master.emptydirs = /opt/mr3-run/work-local-dir
  • mr3.k8s.pod.master.hostpaths is set to empty.

For ContainerWorker Pods, the set of available local directories matters for performance. If the same set of local disks is mounted on every node in the Kubernetes cluster, the user can set mr3.k8s.pod.worker.hostpaths to the list of directories on local disks while leaving mr3.k8s.pod.worker.emptydirs empty. For example, the following setting is appropriate for a homogeneous Kubernetes cluster in which three local disks are mounted on every node:

  • mr3.k8s.pod.worker.emptydirs is set to empty
  • mr3.k8s.pod.worker.hostpaths = /data1/k8s,/data2/k8s,/data3/k8s

Note that the user should never use more than one directory from each local disk because it only degrades the performance of writing to local disks. If no such local disks are attached, mr3.k8s.pod.worker.hostpaths should be set to empty and the user should use an emptyDir volume for writing intermediate data:

  • mr3.k8s.pod.worker.emptydirs = /opt/mr3-run/work-local-dir
  • mr3.k8s.pod.worker.hostpaths is set to empty.
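
The rule that each local disk should contribute at most one directory to mr3.k8s.pod.worker.hostpaths can be checked on a node with a sketch like the following. check_hostpaths is a hypothetical helper, not part of the MR3 release. It uses the mount point reported by df -P as a rough proxy for a local disk, which is an assumption that does not hold for every storage layout, and it has to be run on each node whose directories are to be checked. Paths containing spaces are not supported.

```shell
# check_hostpaths LIST
# Hypothetical helper: LIST is a comma-separated list of directories, as in
# mr3.k8s.pod.worker.hostpaths. Reports directories that cannot be found and
# directories that share a mount point with an earlier entry.
check_hostpaths() {
  seen=""
  rc=0
  for dir in $(printf '%s' "$1" | tr ',' ' '); do
    # Column 6 of the second line of "df -P" output is the mount point.
    mnt=$(df -P "$dir" 2>/dev/null | awk 'NR == 2 { print $6 }')
    if [ -z "$mnt" ]; then
      echo "MISSING $dir"
      rc=1
      continue
    fi
    case " $seen " in
      *" $mnt "*)
        echo "SHARED $dir: mount point $mnt already holds another directory"
        rc=1
        ;;
      *)
        seen="$seen $mnt"
        ;;
    esac
  done
  return "$rc"
}
```

On a node of the homogeneous cluster in the example above, check_hostpaths /data1/k8s,/data2/k8s,/data3/k8s should succeed if the three directories reside on three separately mounted disks.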

Specifying ImagePullSecrets

By default, Hive on MR3 does not use ImagePullSecrets when downloading Docker images. To use an existing ImagePullSecret, the user should take the following two steps.

First add a new field spec/template/spec/imagePullSecrets/name in hive.yaml:

spec:
  template:
    spec:
      imagePullSecrets:
      - name: myregistrykey

Alternatively the user may add a new field imagePullSecrets in hive-service-account.yaml.

Then specify the same secret in the configuration key mr3.k8s.pod.image.pull.secrets in kubernetes/conf/mr3-site.xml:

<property>
  <name>mr3.k8s.pod.image.pull.secrets</name>
  <value>myregistrykey</value>
</property>

(Alternatively the user may add a new field imagePullSecrets in master-service-account.yaml and worker-service-account.yaml.)

Similarly the user should update ats.yaml, ranger.yaml, and metastore.yaml in order to use an existing ImagePullSecret.