Enabling high availability

To enable high availability, the user should take the following two steps:

Set the environment variable MR3_SHARED_SESSION_ID.
Configure HiveServer2 by updating hive-site.xml.

First, the user should set the environment variable MR3_SHARED_SESSION_ID to a unique string before starting HiveServer2 instances. For example, a random UUID such as 2fc302f0-a29e-46e4-a30e-17947421690f (obtained by executing cat /proc/sys/kernel/random/uuid) works. The value internally serves as the MR3 session ID in HiveServer2. It can be any string, but it must be the same across all HiveServer2 instances so that they share the same staging path on HDFS.
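The following is a minimal sketch of one way to keep the value identical across instances: generate the UUID once, store it in a file, and export it from that file before each HiveServer2 instance starts. The file path and the SESSION_ID_FILE variable are assumptions made for illustration and are not part of Hive on MR3. For instances on different nodes, the file would have to live on storage that every node can read.

```shell
# Hypothetical location for the shared ID (an assumption, not an MR3 setting).
SESSION_ID_FILE="${SESSION_ID_FILE:-/tmp/mr3-shared-session-id}"

# Generate the ID only once, so that later instances reuse the same value.
if [ ! -s "$SESSION_ID_FILE" ]; then
  cat /proc/sys/kernel/random/uuid > "$SESSION_ID_FILE"
fi

# Export the stored value before starting each HiveServer2 instance.
MR3_SHARED_SESSION_ID="$(cat "$SESSION_ID_FILE")"
export MR3_SHARED_SESSION_ID
echo "MR3_SHARED_SESSION_ID=$MR3_SHARED_SESSION_ID"
```

Running the snippet a second time prints the same ID, which is the property that matters here.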

Next, the user should configure HiveServer2 by updating hive-site.xml. The following table lists the configuration keys relevant to high availability of Hive on MR3. In particular, setting hive.server2.active.passive.ha.enable to true enables high availability and allows all HiveServer2 instances to share a common MR3 DAGAppMaster. Note that high availability also requires service discovery by ZooKeeper (enabled by hive.server2.support.dynamic.service.discovery) so that all HiveServer2 instances are registered with ZooKeeper.

Name	Default value	Example value for high availability
hive.server2.active.passive.ha.enable	false	true
hive.server2.support.dynamic.service.discovery	false	true
hive.mr3.zookeeper.appid.namespace	mr3AppId	(default value)
hive.server2.active.passive.ha.registry.namespace	hs2ActivePassiveHA	(default value)
hive.server2.zookeeper.namespace	hiveserver2	hiveserver2-mr3
hive.zookeeper.quorum		gold0:2181
hive.zookeeper.client.port	2181	2181
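
As an illustration, the example values from the table can be written out as hive-site.xml property elements. The sketch below writes them to a scratch file whose name (hive-site-ha-fragment.xml) is hypothetical. The elements are meant to be merged by hand into the configuration element of the actual hive-site.xml. The two keys left at their default values are omitted.

```shell
# Hypothetical scratch file; merge its <property> elements into hive-site.xml.
FRAGMENT="${FRAGMENT:-${TMPDIR:-/tmp}/hive-site-ha-fragment.xml}"

# The quoted 'EOF' keeps the shell from expanding anything inside the fragment.
cat > "$FRAGMENT" <<'EOF'
<property>
  <name>hive.server2.active.passive.ha.enable</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2-mr3</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>gold0:2181</value>
</property>
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>
EOF

echo "Wrote $(grep -c '<property>' "$FRAGMENT") properties to $FRAGMENT"
```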

Running multiple HiveServer2 instances on the same node

If multiple HiveServer2 instances are to be run on the same node, the user should assign different ports by setting HIVE3_SERVER2_PORT in env.sh to a unique value for each HiveServer2 instance. In addition, the following two configuration keys should be set in hive-site.xml.

Set hive.server2.webui.port to 0 so that no port conflict arises.
Set hive.server2.logging.operation.log.location to a location that differs for each instance, e.g., /tmp/hive/operation_logs/${hive.server2.port}.
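
The two settings above can be sketched as hive-site.xml property elements. As before, the scratch file name is hypothetical and the elements are meant to be merged into hive-site.xml by hand. Note that ${hive.server2.port} is a Hive configuration reference, not a shell variable, so the here-document delimiter is quoted to stop the shell from trying to expand it.

```shell
# Hypothetical scratch file; merge its <property> elements into hive-site.xml.
SAME_NODE_FRAGMENT="${SAME_NODE_FRAGMENT:-${TMPDIR:-/tmp}/hive-site-same-node-fragment.xml}"

# Quoted 'EOF': ${hive.server2.port} must reach hive-site.xml literally.
cat > "$SAME_NODE_FRAGMENT" <<'EOF'
<property>
  <name>hive.server2.webui.port</name>
  <value>0</value>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/hive/operation_logs/${hive.server2.port}</value>
</property>
EOF

cat "$SAME_NODE_FRAGMENT"
```

Because each instance gets its own port through HIVE3_SERVER2_PORT, the log location in this example ends up different for each instance.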

Because high availability of Hive on MR3 requires service discovery, a Beeline connection can choose a HiveServer2 instance randomly. To take advantage of service discovery, the user should specify the ZooKeeper server and a namespace when running Beeline. For example, with a ZooKeeper server running at gold0:2181 and the namespace hiveserver2-mr3, the user can execute the following command:

$ beeline -u "jdbc:hive2://gold0:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-mr3"
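
The same connection string can be assembled from its two parts, which makes it easier to keep the namespace in step with the value of hive.server2.zookeeper.namespace in hive-site.xml. The variable names ZK_SERVER, ZK_NAMESPACE, and JDBC_URL below are chosen for illustration only.

```shell
# Illustrative variable names; the values are the ones used in the example above.
ZK_SERVER="gold0:2181"
ZK_NAMESPACE="hiveserver2-mr3"   # should match hive.server2.zookeeper.namespace

JDBC_URL="jdbc:hive2://${ZK_SERVER}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=${ZK_NAMESPACE}"
echo "$JDBC_URL"

# To connect, pass the URL to Beeline:
# beeline -u "$JDBC_URL"
```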