Enabling high availability

To enable high availability, the user should take the following two steps:

  • set environment variable MR3_SHARED_SESSION_ID
  • configure HiveServer2 by updating hive-site.xml

First, the user should set the environment variable MR3_SHARED_SESSION_ID to a unique string before starting HiveServer2 instances. For example, a random UUID such as 2fc302f0-a29e-46e4-a30e-17947421690f (obtained by executing cat /proc/sys/kernel/random/uuid) is suitable. The value, which internally serves as the MR3 session ID in HiveServer2, can be any string, but it must be the same across all HiveServer2 instances so that they share the same staging path on HDFS.
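As a minimal sketch of this step, the variable can be exported in the shell that launches HiveServer2. The UUID below is the example value from the text, not a required one. How the variable reaches each HiveServer2 process (login shell, startup script, and so on) depends on the deployment and is an assumption here.

```shell
# Use the same value on every node that runs a HiveServer2 instance.
# The UUID below is the example from the text; a fresh one can be generated with
#   cat /proc/sys/kernel/random/uuid
export MR3_SHARED_SESSION_ID=2fc302f0-a29e-46e4-a30e-17947421690f

# Sanity check before starting HiveServer2.
echo "MR3_SHARED_SESSION_ID=${MR3_SHARED_SESSION_ID}"
```

Generating a new UUID separately on each node would give each instance a different session ID, so the value should be generated once and then copied to all nodes.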

Next, the user should configure HiveServer2 by updating hive-site.xml. The following table shows the configuration keys relevant to high availability of Hive on MR3. In particular, setting hive.server2.active.passive.ha.enable to true enables high availability and allows all HiveServer2 instances to share a common MR3 DAGAppMaster. Note that high availability also requires service discovery by ZooKeeper (enabled by hive.server2.support.dynamic.service.discovery) so that all HiveServer2 instances are registered with ZooKeeper.

Name                                               Default value       Example to enable high availability
-------------------------------------------------  ------------------  -----------------------------------
hive.server2.active.passive.ha.enable              false               true
hive.server2.support.dynamic.service.discovery     false               true
hive.mr3.zookeeper.appid.namespace                 mr3AppId            (default value)
hive.server2.active.passive.ha.registry.namespace  hs2ActivePassiveHA  (default value)
hive.server2.zookeeper.namespace                   hiveserver2         hiveserver2-mr3
hive.zookeeper.quorum                                                  gold0:2181
hive.zookeeper.client.port                         2181                2181
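
The example column of the table might translate into a hive-site.xml fragment such as the following. This is a sketch that assumes the usual Hadoop-style property layout. Only keys whose example value differs from the default are listed, and gold0:2181 is the example ZooKeeper address used in this document.

```xml
<!-- Fragment of hive-site.xml; place inside the existing <configuration> element. -->
<property>
  <name>hive.server2.active.passive.ha.enable</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2-mr3</value>
</property>
<property>
  <!-- Example ZooKeeper server; replace with the actual quorum. -->
  <name>hive.zookeeper.quorum</name>
  <value>gold0:2181</value>
</property>
```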

Running multiple HiveServer2 instances on the same node

If multiple HiveServer2 instances are to be run on the same node, the user should assign different ports by setting HIVE3_SERVER2_PORT in env.sh to a unique value for each HiveServer2 instance. In addition, the following two configuration keys should be set in hive-site.xml.

  • set hive.server2.webui.port to 0 so that no conflict arises.
  • set hive.server2.logging.operation.log.location appropriately, e.g., /tmp/hive/operation_logs/${hive.server2.port}.
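
The two settings above could be written in hive-site.xml as sketched below, again assuming the usual Hadoop-style property layout. The log location is the example path from the text; any directory that differs per HiveServer2 instance would serve the same purpose.

```xml
<!-- Fragment of hive-site.xml; place inside the existing <configuration> element. -->
<property>
  <name>hive.server2.webui.port</name>
  <value>0</value>
</property>
<property>
  <!-- ${hive.server2.port} makes the path differ for each instance on the node. -->
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/hive/operation_logs/${hive.server2.port}</value>
</property>
```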

Because high availability of Hive on MR3 relies on service discovery, a Beeline connection can be directed to a randomly chosen HiveServer2 instance. To take advantage of service discovery, the user should specify the ZooKeeper server and the namespace when running Beeline. For example, with a ZooKeeper server running at gold0:2181 and the namespace hiveserver2-mr3, the user can execute the following command:

$ beeline -u "jdbc:hive2://gold0:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-mr3"