Hive can run only if Metastore is running.
Hive on MR3 can run with any instance of Metastore of the same version, not necessarily one included in the MR3 release.
For example, if Metastore is already running in a Hadoop cluster, the user may reuse it without starting another instance of Metastore.
However, we recommend using Metastore included in the MR3 release because it includes a few improvements (e.g., https://github.com/apache/hive/pull/454).
In a multi-user environment, the administrator user (e.g., hive) typically starts Metastore.
Running Metastore
In order to run Metastore included in the MR3 release, set the following environment variables in env.sh as necessary:
HIVE_METASTORE_HEAPSIZE=12288
HIVE3_DATABASE_HOST=$HOSTNAME
HIVE3_METASTORE_HOST=$HOSTNAME
HIVE3_METASTORE_PORT=9830
HIVE3_METASTORE_LOCAL_PORT=9831
HIVE3_DATABASE_NAME=hive3mr3
HIVE3_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse
HIVE4_DATABASE_HOST=$HOSTNAME
HIVE4_METASTORE_HOST=$HOSTNAME
HIVE4_METASTORE_PORT=9840
HIVE4_METASTORE_LOCAL_PORT=9841
HIVE4_DATABASE_NAME=hive4mr3
HIVE4_HDFS_WAREHOUSE=/tmp/hivemr3/warehouse
HIVE_METASTORE_KERBEROS_PRINCIPAL=hive/_HOST@HADOOP
HIVE_METASTORE_KERBEROS_KEYTAB=/etc/security/keytabs/hive.service.keytab
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar
Note that env.sh specifies a Metastore address (host and port) for each version of Hive because of the incompatibility between different versions of Metastore.
- HIVE_METASTORE_HEAPSIZE specifies the heap size (in MB) for Metastore.
- HIVE3_DATABASE_HOST specifies the host where the database for Metastore is running, whereas HIVE3_METASTORE_HOST specifies the host where Metastore itself is running.
- HIVE3_METASTORE_LOCAL_PORT specifies the port for Metastore running in local mode (in which everything runs on a single machine) with --hivesrc3. If the user does not need Hive on MR3 in local mode, this environment variable may be ignored.
- HIVE3_DATABASE_NAME specifies the database name for Metastore running with --hivesrc3.
- HIVE3_HDFS_WAREHOUSE specifies the directory for the Hive warehouse on HDFS for Metastore running in non-local mode with --hivesrc3. For local mode, Hive on MR3 creates a Hive warehouse under the installation directory. Note that different versions of Metastore can share the same Hive warehouse, while their databases cannot be shared.
- Similarly for --hivesrc4.
- HIVE_METASTORE_KERBEROS_PRINCIPAL and HIVE_METASTORE_KERBEROS_KEYTAB specify the principal and keytab file for Metastore, and correspond to configuration keys hive.metastore.kerberos.principal and hive.metastore.kerberos.keytab.file in hive-site.xml.
- HIVE_MYSQL_DRIVER specifies the path to a MySQL connector jar file which is necessary when using a MySQL database. One can download the official JDBC driver for MySQL at https://dev.mysql.com/downloads/connector/j/.
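As a rough illustration of how the per-version variables relate to --hivesrc3 and --hivesrc4, the sketch below picks the Metastore address for a chosen Hive version. The function metastore_address is hypothetical and not part of the MR3 scripts, which may resolve these variables differently; the port values are the sample values from env.sh above, and localhost is an assumed fallback when HOSTNAME is unset.

```shell
# Sample values copied from env.sh above (localhost is an assumed fallback).
HIVE3_METASTORE_HOST=${HOSTNAME:-localhost}
HIVE3_METASTORE_PORT=9830
HIVE3_METASTORE_LOCAL_PORT=9831
HIVE4_METASTORE_HOST=${HOSTNAME:-localhost}
HIVE4_METASTORE_PORT=9840
HIVE4_METASTORE_LOCAL_PORT=9841

# Hypothetical helper (not part of MR3): print host:port for a Hive version.
# $1 = 3 or 4 (as in --hivesrc3 / --hivesrc4); $2 = "local" for local mode.
metastore_address() {
  local version="$1"
  local mode="${2:-cluster}"
  local host port local_port
  case "$version" in
    3) host=$HIVE3_METASTORE_HOST
       port=$HIVE3_METASTORE_PORT
       local_port=$HIVE3_METASTORE_LOCAL_PORT ;;
    4) host=$HIVE4_METASTORE_HOST
       port=$HIVE4_METASTORE_PORT
       local_port=$HIVE4_METASTORE_LOCAL_PORT ;;
    *) echo "unknown Hive version: $version" >&2
       return 1 ;;
  esac
  # Local mode uses its own port so that it does not clash with non-local mode.
  if [ "$mode" = "local" ]; then
    port=$local_port
  fi
  echo "${host}:${port}"
}

metastore_address 3          # address for --hivesrc3 in non-local mode
metastore_address 4 local    # address for --hivesrc4 in local mode
```

Keeping a separate address per version reflects the point above: Metastore instances of different versions are incompatible, so each version needs its own host and port pair.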
In order to start Metastore, execute hive/metastore-service.sh with the following options:
start # Start Metastore on port defined in HIVE?_METASTORE_PORT.
stop # Stop Metastore on port defined in HIVE?_METASTORE_PORT.
restart # Restart Metastore on port defined in HIVE?_METASTORE_PORT.
--local # Run jobs with configurations in conf/local/ (default).
--cluster # Run jobs with configurations in conf/cluster/.
--tpcds # Run jobs with configurations in conf/tpcds/.
--hivesrc3 # Choose hive3-mr3 (based on Hive 3.1.3)
--hivesrc4 # Choose hive4-mr3 (based on Hive 4.0.0, default)
--init-schema # Initialize the database schema.
--hiveconf <key>=<value> # Add a configuration key/value (ignored by Metastore).
<Metastore option> # Add a Metastore option.
- The user should use --init-schema to initialize the database schema when running Metastore for the first time. Otherwise the script may fail with the following error message in the log: MetaException(message:Version information not found in metastore. )
If a database for Metastore already exists in a MySQL server (or any database server supported by Metastore), the user should connect to the server and execute a command to delete the database before initializing the schema.
- The user can append as many Metastore options (for the command hive --service metastore from Hive) as necessary to the command. In the current implementation, --hiveconf <key>=<value> is ignored.
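The schema error mentioned above can be spotted by searching the Metastore log. The following is a minimal sketch: the function needs_init_schema is hypothetical, and the log path assumes the default location /tmp/<user name>/hive.log with <user name> taken to be the login name reported by id.

```shell
# Hypothetical helper: succeed if the given log contains the
# schema-version error quoted above.
needs_init_schema() {
  grep -q "Version information not found in metastore" "$1"
}

# Assumed default log location for the current user.
log_file="/tmp/$(id -un)/hive.log"

if [ -f "$log_file" ] && needs_init_schema "$log_file"; then
  echo "Schema not initialized; start Metastore again with --init-schema"
fi
```

Because the log file may contain messages from earlier runs, a match only indicates that the error occurred at some point, not necessarily in the latest run.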
To see the type of the database used by Metastore, find the configuration key javax.jdo.option.ConnectionDriverName in hive-site.xml.
For example, with --tpcds, Metastore uses a MySQL database:
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
If the configuration key javax.jdo.option.ConnectionDriverName is missing in hive-site.xml, Metastore uses a Derby database by default, as is the case when starting Metastore with either --local or --cluster.
With --tpcds, it uses a MySQL database.
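The check described above can be scripted. The sketch below reads javax.jdo.option.ConnectionDriverName from a hive-site.xml file and reports Derby when the key is absent. Both function names are hypothetical, the parsing assumes that each <name> and <value> element sits on a single line as in the snippet above, and the path conf/tpcds/hive-site.xml is an assumption about where the file lives.

```shell
# Hypothetical helper: print the JDBC driver class configured in a
# hive-site.xml file, or nothing if the key is absent.
connection_driver() {
  awk '/<name>javax.jdo.option.ConnectionDriverName<\/name>/ { found = 1 }
       found && /<value>/ {
         sub(/.*<value>/, "")      # drop everything up to <value>
         sub(/<\/value>.*/, "")    # drop </value> and what follows
         print
         exit
       }' "$1"
}

# Hypothetical helper: map the driver class to a database type.
database_type() {
  local driver
  driver=$(connection_driver "$1")
  case "$driver" in
    "")      echo "derby (default)" ;;
    *mysql*) echo "mysql" ;;
    *)       echo "other: $driver" ;;
  esac
}

site="conf/tpcds/hive-site.xml"   # assumed location
if [ -f "$site" ]; then
  database_type "$site"
fi
```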
In order to use a MySQL database, the user (who starts Metastore) should have access to the database with a user name and a password, which should be explicitly set in hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hivemr3</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
Here are examples of starting Metastore for the first time:
$ hive/metastore-service.sh start --local --hivesrc3
$ hive/metastore-service.sh start --tpcds --hivesrc3 --init-schema
Log file of Metastore
By default, the log file for starting Metastore is written to /tmp/<user name>/hive.log.
Below is an example of messages printed to the log file when Metastore starts successfully:
2018-03-12T14:52:24,611 INFO [main] metastore.HiveMetaStore: Started the new metaserver on port [9830]...
2018-03-12T14:52:24,611 INFO [main] metastore.HiveMetaStore: Options.minWorkerThreads = 200
2018-03-12T14:52:24,611 INFO [main] metastore.HiveMetaStore: Options.maxWorkerThreads = 1000
2018-03-12T14:52:24,611 INFO [main] metastore.HiveMetaStore: TCP keepalive = true
Note that all instances of Metastore started by the same user share the same log file.
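A quick way to confirm a successful start is to look for the startup message in the log. The following is a minimal sketch: the function started_port is hypothetical, and the log path assumes that <user name> is the login name reported by id. Because the log file is shared, the sketch reports only the most recent startup message.

```shell
# Hypothetical helper: print the port from the last
# "Started the new metaserver" message in a log file.
# Fails (non-zero status) if no such message is found.
started_port() {
  sed -n 's/.*Started the new metaserver on port \[\([0-9][0-9]*\)].*/\1/p' "$1" \
    | tail -n 1 \
    | grep .    # grep fails on empty input, so the function fails too
}

# Assumed default log location for the current user.
log_file="/tmp/$(id -un)/hive.log"

if [ -f "$log_file" ] && port=$(started_port "$log_file"); then
  echo "Metastore reported startup on port $port"
else
  echo "No startup message found in $log_file"
fi
```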