Hive can run only if Metastore is running. Hive on MR3 can run with any instance of Metastore of the same version, not necessarily the one included in the MR3 release. For example, if Metastore is already running in a Hadoop cluster, the user may reuse it without starting another instance of Metastore. We recommend, however, the Metastore included in the MR3 release because it includes a few improvements (e.g., https://github.com/apache/hive/pull/454). In a multi-user environment, the administrator user (e.g., hive) typically starts Metastore.

Running Metastore

In order to run Metastore included in the MR3 release, set the following environment variables in env.sh as necessary:

HIVE_METASTORE_HEAPSIZE=12288

HIVE3_DATABASE_HOST=$HOSTNAME
HIVE3_METASTORE_HOST=$HOSTNAME
HIVE3_METASTORE_PORT=9830
HIVE3_METASTORE_LOCAL_PORT=9831
HIVE3_DATABASE_NAME=hive3mr3
HIVE3_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse

HIVE4_DATABASE_HOST=$HOSTNAME
HIVE4_METASTORE_HOST=$HOSTNAME
HIVE4_METASTORE_PORT=9840
HIVE4_METASTORE_LOCAL_PORT=9841
HIVE4_DATABASE_NAME=hive4mr3
HIVE4_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse

HIVE_METASTORE_KERBEROS_PRINCIPAL=hive/_HOST@HADOOP
HIVE_METASTORE_KERBEROS_KEYTAB=/etc/security/keytabs/hive.service.keytab

HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar

Note that env.sh specifies a Metastore address (host and port) for each version of Hive because of the incompatibility between different versions of Metastore.

  • HIVE_METASTORE_HEAPSIZE specifies the heap size (in MB) for Metastore.
  • HIVE3_DATABASE_HOST specifies the host where the database for Metastore is running, whereas HIVE3_METASTORE_HOST specifies the host where Metastore itself is running.
  • HIVE3_METASTORE_LOCAL_PORT specifies the port for Metastore running in local mode (in which everything runs on a single machine) with --hivesrc3. If the user does not need Hive on MR3 in local mode, this environment variable may be ignored.
  • HIVE3_DATABASE_NAME specifies the database name for Metastore running with --hivesrc3.
  • HIVE3_HDFS_WAREHOUSE specifies the directory for the Hive warehouse on HDFS for Metastore running in non-local mode with --hivesrc3. For local mode, Hive on MR3 creates a Hive warehouse under the installation directory. Note that different versions of Metastore can share the same Hive warehouse, while their databases cannot be shared.
  • The HIVE4_* environment variables play the same roles for Metastore running with --hivesrc4.
  • HIVE_METASTORE_KERBEROS_PRINCIPAL and HIVE_METASTORE_KERBEROS_KEYTAB specify the principal and keytab file for Metastore, and correspond to configuration keys hive.metastore.kerberos.principal and hive.metastore.kerberos.keytab.file in hive-site.xml.

  • HIVE_MYSQL_DRIVER specifies the path to a MySQL connector jar file which is necessary when using a MySQL database. One can download the official JDBC driver for MySQL at https://dev.mysql.com/downloads/connector/j/.
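Because env.sh keeps a separate host and port for each Hive version, it can help to print the address that a given version resolves to before starting Metastore. The following is only a minimal sketch: metastore_uri is a hypothetical helper that is not part of the MR3 release, and the thrift://host:port form is the conventional format of hive.metastore.uris in Hive, assumed here rather than taken from the MR3 scripts.

```shell
# Hypothetical helper (not part of the MR3 release): print the Metastore
# address for Hive version 3 or 4 from the variables set in env.sh.
# Assumption: clients reach Metastore at thrift://<host>:<port>.
metastore_uri() {
  case "$1" in
    3) echo "thrift://${HIVE3_METASTORE_HOST}:${HIVE3_METASTORE_PORT}" ;;
    4) echo "thrift://${HIVE4_METASTORE_HOST}:${HIVE4_METASTORE_PORT}" ;;
    *) echo "unknown Hive version: $1" >&2; return 1 ;;
  esac
}
```

After the variables from env.sh are loaded into the current shell, metastore_uri 3 prints the address used with --hivesrc3 and metastore_uri 4 the one used with --hivesrc4.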

In order to start Metastore, execute hive/metastore-service.sh with the following options:

start                     # Start Metastore on port defined in HIVE?_METASTORE_PORT.
stop                      # Stop Metastore on port defined in HIVE?_METASTORE_PORT.
restart                   # Restart Metastore on port defined in HIVE?_METASTORE_PORT.
--local                   # Run jobs with configurations in conf/local/ (default).
--cluster                 # Run jobs with configurations in conf/cluster/.
--tpcds                   # Run jobs with configurations in conf/tpcds/.
--hivesrc3                # Choose hive3-mr3 (based on Hive 3.1.3) (default).
--hivesrc4                # Choose hive4-mr3 (based on Hive 4.0.0-SNAPSHOT).
--init-schema             # Initialize the database schema. 
--hiveconf <key>=<value>  # Add a configuration key/value (ignored by Metastore).
<Metastore option>        # Add a Metastore option.

  • The user should use --init-schema to initialize the database schema when running Metastore for the first time. Otherwise the script may fail with the following error message in the log:
    MetaException(message:Version information not found in metastore. )

    To delete a Metastore database stored in MySQL (or in any other database server supported by Metastore), the user should connect to the database server and execute a command that deletes it.

  • The user can append to the command as many Metastore options as necessary. These are the options accepted by the command hive --service metastore from Hive. In the current implementation, --hiveconf <key>=<value> is ignored.

To see the type of the database used by Metastore, find the configuration key javax.jdo.option.ConnectionDriverName in hive-site.xml. For example, with --tpcds, Metastore uses a MySQL database:

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

If the configuration key javax.jdo.option.ConnectionDriverName is missing in hive-site.xml, Metastore uses a Derby database by default. This is the case when starting Metastore with either --local or --cluster.
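The check described above can be sketched as a small shell function. This is an illustration under stated assumptions, not a tool shipped with MR3: metastore_db_driver is a hypothetical name, and the function assumes that the <value> element sits on the line immediately after the <name> element, as in the snippet shown earlier.

```shell
# Hypothetical helper: print the JDBC driver class configured in a given
# hive-site.xml, or "derby (default)" when the key is absent.
# Assumption: <value> is on the line right after the matching <name>.
metastore_db_driver() {
  site_xml="$1"
  if [ ! -r "$site_xml" ]; then
    echo "cannot read $site_xml" >&2
    return 1
  fi
  # Find the <name> line, move to the next line, and extract its <value>.
  driver=$(sed -n '/<name>javax\.jdo\.option\.ConnectionDriverName<\/name>/{
    n
    s:.*<value>\(.*\)</value>.*:\1:p
  }' "$site_xml")
  if [ -n "$driver" ]; then
    echo "$driver"
  else
    echo "derby (default)"
  fi
}
```

The function takes the path to a hive-site.xml as its only argument, for example the file under the configuration directory selected by --local, --cluster, or --tpcds (the exact file location inside those directories is assumed here).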

In order to use a MySQL database, the user (who starts Metastore) should have access to the database with a user name and a password, which should be explicitly set in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hivemr3</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>

Here are examples of starting Metastore for the first time:

$ hive/metastore-service.sh start --local --hivesrc3
$ hive/metastore-service.sh start --tpcds --hivesrc3 --init-schema

Log file of Metastore

By default, the log file of Metastore is written to /tmp/<user name>/hive.log. Below is an example of messages printed to the log file when Metastore starts successfully:

2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Started the new metaserver on port [9830]...
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Options.minWorkerThreads = 200
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: Options.maxWorkerThreads = 1000
2018-03-12T14:52:24,611  INFO [main] metastore.HiveMetaStore: TCP keepalive = true

Note that all instances of Metastore started by the same user share the same log file.
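To check the outcome of a start attempt without reading the log by hand, one can search the log file for the two messages quoted in this section. The function below is a minimal sketch: metastore_log_status is a hypothetical name, and the default path assumes that <user name> is the name returned by id -un.

```shell
# Hypothetical helper: classify a Metastore log file by the messages
# quoted in this section. Prints one of:
#   no-log, schema-missing, started, unknown
metastore_log_status() {
  # Assumption: <user name> in the default path matches `id -un`.
  log_file="${1:-/tmp/$(id -un)/hive.log}"
  if [ ! -r "$log_file" ]; then
    echo "no-log"
  elif grep -q 'Version information not found in metastore' "$log_file"; then
    # The schema was never initialized; see --init-schema.
    echo "schema-missing"
  elif grep -q 'Started the new metaserver on port' "$log_file"; then
    echo "started"
  else
    echo "unknown"
  fi
}
```

Because all instances started by the same user share one log file, a match may come from an earlier run. Comparing the timestamp of the matching line with the time of the latest start attempt avoids misreading stale entries.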