Enabling high availability
To enable high availability, the user should take the following two steps:
- set the environment variable MR3_SHARED_SESSION_ID
- configure HiveServer2 by updating hive-site.xml
First, the user should set the environment variable MR3_SHARED_SESSION_ID to a unique string before starting HiveServer2 instances.
For example, a random UUID such as 2fc302f0-a29e-46e4-a30e-17947421690f (obtained by executing cat /proc/sys/kernel/random/uuid) is okay to use.
The environment variable (whose value internally serves as the MR3 session ID in HiveServer2) can be set to any string,
but should remain the same across all HiveServer2 instances (so that they share the same staging path on HDFS).
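As a minimal sketch of this step, the commands below generate a UUID once and export it as MR3_SHARED_SESSION_ID. The helper variable SESSION_ID is hypothetical and used only for illustration; how the value is distributed to the other HiveServer2 hosts is left to the user.

```shell
# Run once on any Linux machine to obtain a unique value.
SESSION_ID="$(cat /proc/sys/kernel/random/uuid)"

# Export the same value on every HiveServer2 host before starting HiveServer2.
# Do not regenerate it per host: all instances must see an identical string.
export MR3_SHARED_SESSION_ID="$SESSION_ID"
echo "MR3_SHARED_SESSION_ID=$MR3_SHARED_SESSION_ID"
```

On the remaining hosts, the printed value would be exported literally instead of being generated again.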
Next, the user should configure HiveServer2 by updating hive-site.xml.
The following table shows the configuration keys relevant to high availability of Hive on MR3.
In particular, setting hive.server2.active.passive.ha.enable to true enables high availability and allows all HiveServer2 instances to share a common MR3 DAGAppMaster.
Note that high availability also requires service discovery by ZooKeeper (specified by hive.server2.support.dynamic.service.discovery) so that all HiveServer2 instances are registered with ZooKeeper.
Name | Default value | Example value to enable high availability |
---|---|---|
hive.server2.active.passive.ha.enable | false | true |
hive.server2.support.dynamic.service.discovery | false | true |
hive.mr3.zookeeper.appid.namespace | mr3AppId | (default value) |
hive.server2.active.passive.ha.registry.namespace | hs2ActivePassiveHA | (default value) |
hive.server2.zookeeper.namespace | hiveserver2 | hiveserver2-mr3 |
hive.zookeeper.quorum | gold0:2181 | |
hive.zookeeper.client.port | 2181 | 2181 |
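For illustration, the non-default example values from the table might appear in hive-site.xml as the fragment below. This is a sketch rather than a complete configuration: it uses the standard Hadoop-style property layout, leaves the two namespace keys at their defaults, and takes the ZooKeeper address gold0:2181 from the example in this section, so it should be replaced with the actual quorum.

```xml
<!-- Place inside the existing <configuration> element of hive-site.xml. -->
<property>
  <name>hive.server2.active.passive.ha.enable</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2-mr3</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <!-- Assumption: example ZooKeeper server from this section; use your own quorum. -->
  <value>gold0:2181</value>
</property>
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>
```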
Running multiple HiveServer2 instances on the same node
If multiple HiveServer2 instances are to run on the same node, the user should assign different ports by setting HIVE3_SERVER2_PORT in env.sh to a unique value for each HiveServer2 instance.
In addition, the following two configuration keys should be set in hive-site.xml.
- set hive.server2.webui.port to 0 so that no conflict arises.
- set hive.server2.logging.operation.log.location appropriately, e.g., /tmp/hive/operation_logs/${hive.server2.port}.
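The two settings above could be written in hive-site.xml roughly as follows. The fragment is a sketch that reuses the example path from the text; it assumes that ${hive.server2.port} resolves to the port of each HiveServer2 instance in the user's setup, so that every instance writes its operation logs to a separate directory.

```xml
<!-- Place inside the existing <configuration> element of hive-site.xml. -->
<property>
  <name>hive.server2.webui.port</name>
  <!-- 0 as recommended above, so that instances on the same node do not conflict. -->
  <value>0</value>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <!-- Assumption: ${hive.server2.port} differs for each instance. -->
  <value>/tmp/hive/operation_logs/${hive.server2.port}</value>
</property>
```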
Since high availability of Hive on MR3 requires service discovery,
a Beeline connection can choose a HiveServer2 instance randomly.
In order to take advantage of service discovery,
the user should specify the ZooKeeper server and a namespace when running Beeline.
For example, with a ZooKeeper server running at gold0:2181 and a namespace hiveserver2-mr3, the user can execute the following command:
$ beeline -u "jdbc:hive2://gold0:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-mr3"
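The same connection string can be assembled from its two parts, which makes it easier to see that the namespace should match the value of hive.server2.zookeeper.namespace in hive-site.xml. The variable names ZK_QUORUM, ZK_NAMESPACE, and JDBC_URL below are hypothetical and chosen only for this sketch.

```shell
# ZooKeeper server and namespace from the example above.
ZK_QUORUM="gold0:2181"
ZK_NAMESPACE="hiveserver2-mr3"

# Build the JDBC URL that asks Beeline to locate HiveServer2 through ZooKeeper.
JDBC_URL="jdbc:hive2://${ZK_QUORUM}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=${ZK_NAMESPACE}"
echo "$JDBC_URL"

# To connect, pass the URL to Beeline:
#   beeline -u "$JDBC_URL"
```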