Hive on MR3 allows the user to perform rolling updates of HiveServer2, DAGAppMaster, and ContainerWorker Pods. By performing rolling updates, the user does not have to terminate an active instance of Hive on MR3 after making updates to Hive for MR3 (e.g., applying patches from Apache Hive), or when a new version of MR3 is released.

  • After making changes to HiveServer2 components (e.g., parser and optimizer), the user may want to update HiveServer2 Pod.
  • After making changes to the execution component of Hive, the user may want to update ContainerWorker Pods.
  • When a new version of MR3 is released, the user may want to update DAGAppMaster and ContainerWorker Pods.
  • The user with access to the source code of MR3 may want to update DAGAppMaster and/or ContainerWorker Pods.

In order to perform rolling updates of DAGAppMaster and ContainerWorker Pods, the user should make sure that the configuration key mr3.k8s.pod.image.pull.policy in kubernetes/conf/mr3-site.xml is set to Always so that new Pods created after an update use the most recent Docker image.

<property>
  <name>mr3.k8s.pod.image.pull.policy</name>
  <value>Always</value>
</property>
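
As a quick sanity check before starting an update, the following sketch tests whether a given mr3-site.xml sets this key to Always. The function name pull_policy_is_always is hypothetical, and the check assumes that the <name> line is followed by its <value> line as in the snippet above; it is not a general XML parser.

```shell
# Succeed (exit 0) only if the file given as $1 sets
# mr3.k8s.pod.image.pull.policy to Always.
# Assumption: <name> and <value> are on separate lines, in that order.
pull_policy_is_always() {
  awk '
    /<name>mr3\.k8s\.pod\.image\.pull\.policy<\/name>/ { found = 1; next }
    found && /<value>/ { ok = ($0 ~ /<value>Always<\/value>/); exit }
    END { exit (ok ? 0 : 1) }
  ' "$1"
}

# Example usage:
#   pull_policy_is_always kubernetes/conf/mr3-site.xml || echo "pull policy is not Always"
```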

For HiveServer2 Pod, the user should set the field spec/template/spec/containers/imagePullPolicy to Always in kubernetes/yaml/hive.yaml.

spec:
  template:
    spec:
      containers:
      - imagePullPolicy: Always

A naive approach is simply to delete running Pods. This works well in most situations thanks to the fault-tolerance and recovery mechanism implemented in MR3. The downside is that queries in the middle of execution may experience a brief delay, or even fail if DAGAppMaster Pod is deleted. In production environments where no queries should be interrupted, we instead recommend that the user use MasterControl to stop DAGAppMaster and ContainerWorker Pods gracefully.

Below we demonstrate how to perform rolling updates with MasterControl. See Using MasterControl for the usage of MasterControl.

Updating ContainerWorker Pods

First obtain the ApplicationID from the HiveServer2 log by running kubectl outside Kubernetes.

$ kubectl logs -n hivemr3 hivemr3-hiveserver2-s56zk | grep ApplicationID
2020-01-14T07:01:06,552  INFO [main] client.MR3Client$: Starting DAGAppMaster with ApplicationID application_30211_0000 in session mode
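
To reuse the ApplicationID in later commands, it can be captured in a shell variable. The sketch below assumes that the ID always has the form application_<digits>_<digits>, as in the log line above; extract_application_id is a hypothetical helper name, not part of MR3.

```shell
# Print the first ApplicationID found in log text read from stdin.
# Assumption: IDs look like application_<digits>_<digits>.
extract_application_id() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Example usage (namespace and Pod name as in the command above):
#   APP_ID=$(kubectl logs -n hivemr3 hivemr3-hiveserver2-s56zk | extract_application_id)
#   echo "$APP_ID"
```

If the log contains no matching line, the function prints nothing, so the caller should check that the variable is non-empty before passing it to MasterControl.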

Execute MasterControl with command stopContainerWorkers inside the HiveServer2 Pod to gracefully stop all ContainerWorkers. The user may execute MasterControl even in the presence of running queries (which are not affected).

$ kubectl exec -it -n hivemr3 hivemr3-hiveserver2-s56zk -- /bin/bash

bash-4.2$ ./master-control.sh stopContainerWorkers application_30211_0000
Sent a request to stop all ContainerWorkers for application_30211_0000

After all queries complete and ContainerWorker Pods are deleted, the user can update the Docker image for ContainerWorker Pods. All ContainerWorker Pods created afterwards use the updated Docker image. For the user with access to the source code of MR3, the version reported by ContainerWorker Pods should match the value of the variable version in core/build.sbt.
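
One way to tell when all ContainerWorker Pods are gone is to poll kubectl get pods and count the remaining ones. The sketch below is a minimal helper, assuming that ContainerWorker Pod names share a common prefix. The prefix mr3worker- in the usage comment is an assumption and should be replaced with the prefix that actually appears in the cluster; count_pods_with_prefix is a hypothetical helper name.

```shell
# Count Pods whose name (first column) starts with the prefix given as $1.
# Input on stdin: output of `kubectl get pods --no-headers`.
count_pods_with_prefix() {
  awk -v p="$1" 'index($1, p) == 1 { n++ } END { print n + 0 }'
}

# Example usage (the prefix mr3worker- is an assumption):
#   while [ "$(kubectl get pods -n hivemr3 --no-headers | count_pods_with_prefix mr3worker-)" -gt 0 ]; do
#     sleep 10
#   done
#   echo "All ContainerWorker Pods are deleted"
```

Pods in the Terminating state are still listed by kubectl get pods, so the loop ends only after they are fully removed.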

Updating DAGAppMaster Pod

As with ContainerWorker Pods, the user can update DAGAppMaster Pod by executing MasterControl with command closeDagAppMaster. Unlike with ContainerWorker Pods, however, the user should update the Docker image for DAGAppMaster Pod before executing MasterControl because a new DAGAppMaster Pod starts automatically. Since no running queries are affected by MasterControl, the user may execute MasterControl at any time. After a while, a new DAGAppMaster Pod starts and uses the new Docker image.

bash-4.2$ ./master-control.sh closeDagAppMaster application_30211_0000
Sent a request to close DAGAppMaster for application_30211_0000

For the user with access to the source code of MR3, the version reported by DAGAppMaster Pod should match the value of the variable version in core/build.sbt.

Updating HiveServer2 Pod

Currently MR3 uses a ReplicationController resource for HiveServer2, not a Deployment resource, for backward compatibility. Thus the user can follow the standard procedure for ReplicationController resources in order to perform rolling updates of HiveServer2 Pod.