To run HiveServer2, the user should use the HiveServer2 included in the MR3 release. Any client program, however, can be used to connect to HiveServer2, not only those included in the MR3 release. In a multi-user environment, the administrator user (e.g., hive) typically starts HiveServer2.

Running HiveServer2

In order to run HiveServer2, set the following environment variables in env.sh as necessary:

HIVE_SERVER2_HEAPSIZE=16384

HIVE3_SERVER2_HOST=$HOSTNAME
HIVE3_SERVER2_PORT=9832

HIVE4_SERVER2_HOST=$HOSTNAME
HIVE4_SERVER2_PORT=9842

HIVE_SERVER2_AUTHENTICATION=NONE
HIVE_SERVER2_KERBEROS_PRINCIPAL=hive/_HOST@HADOOP
HIVE_SERVER2_KERBEROS_KEYTAB=/home/hive/hive.keytab

Note that env.sh specifies a HiveServer2 address (host and port) for each version of Hive because of the incompatibility between different versions of HiveServer2.

  • HIVE_SERVER2_HEAPSIZE specifies the heap size (in MB) for HiveServer2.
  • HIVE_SERVER2_AUTHENTICATION specifies the authentication option for HiveServer2, which is one of NONE, NOSASL, KERBEROS, LDAP, PAM, or CUSTOM. It corresponds to configuration key hive.server2.authentication in hive-site.xml.
  • HIVE_SERVER2_KERBEROS_PRINCIPAL and HIVE_SERVER2_KERBEROS_KEYTAB specify the principal and keytab file for HiveServer2, and correspond to configuration keys hive.server2.authentication.kerberos.principal and hive.server2.authentication.kerberos.keytab in hive-site.xml.
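As an illustration of how these variables fit together, the following is a minimal sketch of the env.sh settings for Kerberos authentication. The keytab path is a placeholder, and the readability check at the end is an optional addition for illustration, not part of the MR3 release.

```shell
# Sketch of env.sh settings for Kerberos authentication.
# The principal is the example value from above; the keytab path is a placeholder.
HIVE_SERVER2_AUTHENTICATION=KERBEROS
HIVE_SERVER2_KERBEROS_PRINCIPAL=hive/_HOST@HADOOP
HIVE_SERVER2_KERBEROS_KEYTAB=/path/to/hive.keytab

# Optional sanity check (an assumption, not something env.sh does by itself):
# warn early if the keytab file cannot be read by the current user.
if [ ! -r "$HIVE_SERVER2_KERBEROS_KEYTAB" ]; then
  echo "warning: keytab $HIVE_SERVER2_KERBEROS_KEYTAB is not readable" >&2
fi
```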

In order to start HiveServer2, execute hive/hiveserver2-service.sh with the following options:

start                     # Start HiveServer2 on port defined in HIVE?_SERVER2_PORT.
stop                      # Stop HiveServer2 on port defined in HIVE?_SERVER2_PORT.
restart                   # Restart HiveServer2 on port defined in HIVE?_SERVER2_PORT.
--local                   # Run jobs with configurations in conf/local/ (default).
--cluster                 # Run jobs with configurations in conf/cluster/.
--tpcds                   # Run jobs with configurations in conf/tpcds/.
--hivesrc3                # Choose hive3-mr3 (based on Hive 3.1.3) (default).
--hivesrc4                # Choose hive4-mr3 (based on Hive 4.0.0-SNAPSHOT).
--amprocess               # Run the MR3 DAGAppMaster in LocalProcess mode.
--hiveconf <key>=<value>  # Add a configuration key/value.
<HiveServer2 option>      # Add a HiveServer2 option.

  • With --amprocess, HiveServer2 runs every MR3 DAGAppMaster in LocalProcess mode. Hence, each time HiveServer2 starts a DAGAppMaster, it creates a new process on the same machine. In the case of HiveServer2 running in shared session mode, it creates such a new process immediately. (Currently --amprocess cannot be used in a secure cluster with Kerberos; see the documentation on LocalProcess mode.)
  • The user can append as many HiveServer2 options (for the command hive --service hiveserver2 from Hive) as necessary to the command.

When executing hive/hiveserver2-service.sh, it is best to reuse the same options used for hive/metastore-service.sh. For example, in order to connect to Metastore started with --tpcds --hivesrc3, it is best to execute hive/hiveserver2-service.sh with the same options. Otherwise, mismatches in the Hive version and configuration values may lead to errors that are hard to diagnose.
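One way to keep the options in sync is to define them once and pass them to both scripts. The following is a minimal sketch: the variable MR3_COMMON_OPTS and the two wrapper functions are hypothetical names, and it assumes that hive/metastore-service.sh accepts start in the same way as hive/hiveserver2-service.sh.

```shell
# Keep the configuration and version options in one place so that Metastore
# and HiveServer2 are always started with the same values.
MR3_COMMON_OPTS="--tpcds --hivesrc3"   # example values from the text above

# Hypothetical wrappers; run them from the MR3 release directory.
start_metastore() {
  # Word splitting of $MR3_COMMON_OPTS is intentional.
  # Assumption: metastore-service.sh accepts "start" like hiveserver2-service.sh.
  hive/metastore-service.sh start $MR3_COMMON_OPTS
}

start_hiveserver2() {
  # Extra arguments (e.g., --hiveconf <key>=<value>) are passed through.
  hive/hiveserver2-service.sh start $MR3_COMMON_OPTS "$@"
}
```

With these definitions, start_metastore followed by start_hiveserver2 --hiveconf hive.server2.mr3.share.session=true starts both services with identical --tpcds --hivesrc3 options.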

Here are a few examples of running the script:

# start HiveServer2 that starts a new DAGAppMaster for each Beeline connection
$ hive/hiveserver2-service.sh start --local --hivesrc3 --hiveconf hive.server2.mr3.share.session=false

# start HiveServer2 that starts a common DAGAppMaster for all Beeline connections
$ hive/hiveserver2-service.sh start --local --hivesrc3 --hiveconf hive.server2.mr3.share.session=true

# start HiveServer2 that starts a common DAGAppMaster in LocalProcess mode 
$ hive/hiveserver2-service.sh start --tpcds --hivesrc3 --amprocess --hiveconf hive.server2.mr3.share.session=true
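Once HiveServer2 is running, any client can connect to the address given by HIVE?_SERVER2_HOST and HIVE?_SERVER2_PORT. The following sketch builds a JDBC URL for the Hive 3 address; the function hs2_jdbc_url is a hypothetical helper, the default values mirror the env.sh example above, and the commented Beeline command assumes a stock Beeline client with HIVE_SERVER2_AUTHENTICATION=NONE.

```shell
# Fall back to the example values from env.sh if the variables are not set.
HIVE3_SERVER2_HOST=${HIVE3_SERVER2_HOST:-localhost}
HIVE3_SERVER2_PORT=${HIVE3_SERVER2_PORT:-9832}

# Hypothetical helper: print the JDBC URL for a HiveServer2 host and port.
hs2_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/\n' "$1" "$2"
}

# Usage sketch (not executed here; requires Beeline on the PATH):
#   beeline -u "$(hs2_jdbc_url "$HIVE3_SERVER2_HOST" "$HIVE3_SERVER2_PORT")" -n "$USER"
```

For HiveServer2 started with --hivesrc4, the same helper can be called with HIVE4_SERVER2_HOST and HIVE4_SERVER2_PORT instead.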

Output directory of HiveServer2

Executing hive/hiveserver2-service.sh creates a new directory under hive/hiveserver2-service-result:

hive-mr3--2018-03-12--17-07-13-babdc6b3/
├── command
├── conf
│   ├── beeline-log4j2.properties
│   ...
│   └── yarn-site.xml
├── env
└── hive-logs
    ├── hive.log
    └── out-hiveserver2.txt

The name of the HiveServer2 output directory ends with a random sequence such as babdc6b3.

  • command contains the command executed to start HiveServer2.
  • conf is a directory containing all configuration files that are effective at the time of starting HiveServer2.
  • env lists all environment variables that are effective at the time of starting HiveServer2.
  • hive-logs/hive.log is the log file for HiveServer2.
  • hive-logs/out-hiveserver2.txt is the output of hive/hiveserver2-service.sh.
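Because every run creates a new directory with a random suffix, it can be convenient to locate the most recent one automatically. The following is a minimal sketch; latest_hs2_dir is a hypothetical helper that relies only on the directory layout described above and on directory modification times.

```shell
# Hypothetical helper: print the most recently modified HiveServer2 output
# directory (with a trailing slash), or nothing if none exists.
# $1 is the base directory (default: hive/hiveserver2-service-result).
latest_hs2_dir() {
  base=${1:-hive/hiveserver2-service-result}
  # -t sorts by modification time (newest first), -d lists directories themselves.
  ls -1td "$base"/*/ 2>/dev/null | head -n 1
}

# Usage sketch (run from the MR3 release directory):
#   tail -f "$(latest_hs2_dir)hive-logs/hive.log"
#   cat "$(latest_hs2_dir)command"
```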

For HiveServer2 started with --amprocess, every MR3 DAGAppMaster (which runs in a process on the same machine) creates a new directory with the same name as the application ID under the HiveServer2 output directory. Typically the DAGAppMaster output directory contains the log file for the DAGAppMaster, stderr output, and stdout output, as shown in the following example:

application_1516622736564_1439/
├── run.log
├── stderr
└── stdout

If HiveServer2 establishes connections to ZooKeeper at launch (e.g., if hive.security.metastore.authorization.manager is set to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider in hive-site.xml), the user should make sure that the Curator module (org.apache.curator) used in HiveServer2 is compatible with the ZooKeeper service. For example, Hive 4 on MR3 uses Curator 4.2.0 (or higher), so ZooKeeper 3.5 (or higher) should be running. If the existing ZooKeeper service is not compatible, the user should manually run ZooKeeper of an appropriate version and set the configuration key hive.zookeeper.quorum in hive-site.xml accordingly.
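The version requirement can be checked before starting HiveServer2. The following is a minimal sketch: zk_version_ok is a hypothetical helper that only compares version numbers against 3.5, and the commented usage assumes that nc is installed and that the ZooKeeper srvr four-letter command is enabled on the server (zk1.example.com:2181 is a placeholder address).

```shell
# Hypothetical helper: succeed if a ZooKeeper version string such as "3.5.9"
# is 3.5 or higher, and fail otherwise.
zk_version_ok() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # second version component
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 5 ]; }
}

# Usage sketch (not executed here; needs network access to ZooKeeper):
#   v=$(echo srvr | nc zk1.example.com 2181 |
#         sed -n 's/^Zookeeper version: \([0-9.]*\).*/\1/p')
#   zk_version_ok "$v" || echo "ZooKeeper $v is older than 3.5" >&2
```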