On Non-secure Hadoop
This page shows how to operate Hive on MR3 in a non-secure Hadoop cluster without Kerberos (after completing the preliminary steps). By following the instructions, the user will learn:
- how to configure Hive on MR3 in a non-secure Hadoop cluster
- how to start and stop Metastore
- how to start and stop HiveServer2
- how to create Beeline connections and send queries to HiveServer2
Configuring Metastore
Hive on MR3 can run with any compatible version of Metastore, not necessarily the one included in the MR3 release. For example, if a Metastore instance is already running in a Hadoop cluster, the user may reuse it without starting a new instance.
In our example, we start a new instance of Metastore. We use either Derby or MySQL for the Metastore database.
Using PostgreSQL or MS SQL for the Metastore database is analogous to using MySQL.
Option 1 - Using Derby
To use Derby for the Metastore database,
open env.sh
and set HIVE_METASTORE_DB_TYPE
to derby
.
vi env.sh
HIVE_METASTORE_DB_TYPE=derby
As we use --tpcds
option,
open conf/tpcds/hive-site.xml
and update the following configuration keys.
Remove the configuration keys javax.jdo.option.ConnectionDriverName
and javax.jdo.option.ConnectionUserName
.
vi conf/tpcds/hive-site.xml
<property>
<name>hive.metastore.db.type</name>
<value>DERBY</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=${hive.local.data}/metastore_cluster/${hive.database.name};create=true</value>
</property>
<!--
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hivemr3</value>
</property>
-->
Option 2 - Using MySQL
To use MySQL for the Metastore database,
we assume that a MySQL user root
has access to a MySQL server with all privileges.
The MySQL server may run on any node, not necessarily on the node where Metastore or HiveServer2 will be running.
mysql -u root -p
...
MariaDB [(none)]> SHOW GRANTS FOR 'root'@'%';
+--------------------------------------------------------------------------------------------------------------------------------+
| Grants for root@% |
+--------------------------------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY PASSWORD '*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19' WITH GRANT OPTION |
+--------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Open env.sh
and set the following environment variables.
vi env.sh
HIVE_METASTORE_DB_TYPE=mysql
HIVE_DATABASE_HOST=$HOSTNAME
HIVE_DATABASE_NAME=hivemr3
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar
HIVE_METASTORE_DB_TYPE
should be set tomysql
.HIVE_DATABASE_HOST
specifies the node where the MySQL server runs.HIVE_DATABASE_NAME
specifies the name of a new database to be created for Metastore inside MySQL. In our example, we usehivemr3
.HIVE_MYSQL_DRIVER
should point to the MySQL connector jar file.
As we use --tpcds
option,
open conf/tpcds/hive-site.xml
and update the following configuration keys.
Set javax.jdo.option.ConnectionUserName
to the MySQL user name
and javax.jdo.option.ConnectionPassword
to the password.
vi conf/tpcds/hive-site.xml
<property>
<name>hive.metastore.db.type</name>
<value>MYSQL</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://${hive.database.host}/${hive.database.name}?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
For other types of Metastore databases,
HIVE_METASTORE_DB_TYPE
in env.sh
should be set to postgresql
or mssql
.
The configuration keys javax.jdo.option.ConnectionURL
and javax.jdo.option.ConnectionDriverName
in conf/tpcds/hive-site.xml
should be updated as well.
Running Metastore
Run Metastore using --tpcds
option and initialize the database schema using --init-schema
option.
hive/metastore-service.sh start --tpcds --init-schema
# Running Metastore using Hive-MR3 #
Output Directory:
/home/hive/mr3/hadoop/hive/metastore-service-result/hive-2025-03-14-00-20-47-89f10360
Starting Metastore...
After a while, check if Metastore has successfully started.
tail -f /home/hive/mr3/hadoop/hive/metastore-service-result/hive-2025-03-14-00-20-47-89f10360/out-metastore.txt
...
Starting metastore schema initialization to 4.0.0
Initialization script hive-schema-4.0.0.derby.sql
...
Initialization script completed
...
2025-03-14 00:20:58: Starting Hive Metastore Server
The user should use --init-schema
to initialize the database schema
when running Metastore for the first time.
Without --init-schema
, the script fails with the following error message.
MetaException(message:Version information not found in metastore.)
When restarting Metastore,
do not use --init-schema
option in order to reuse existing Hive databases.
For example, the user can kill Metastore and restart it as follows.
hive/metastore-service.sh stop --tpcds
hive/metastore-service.sh start --tpcds
The log file of Metastore is found under the output directory of Metastore.
ls /home/hive/mr3/hadoop/hive/metastore-service-result/hive-2025-03-14-00-20-47-89f10360/hive-logs/
hive.log
Running HiveServer2
Run HiveServer2 using --tpcds
option.
In order to use LocalProcess mode for MR3 DAGAppMaster, use --amprocess
option.
hive/hiveserver2-service.sh start --tpcds --amprocess
# Running HiveServer2 using Hive-MR3 #
Output Directory:
/home/hive/mr3/hadoop/hive/hiveserver2-service-result/hive-2025-03-14-00-35-08-b34a5b8f
Starting HiveServer2...
In LocalProcess mode, MR3 DAGAppMaster runs as a regular process on the same machine, rather than a thread inside HiveServer2 or a Yarn container.
After a while, check if HiveServer2 has successfully started by inspecting its log file.
grep -e "New MR3Session created" /home/hive/mr3/hadoop/hive/hiveserver2-service-result/hive-2025-03-14-00-35-08-b34a5b8f/hive-logs/hive.log
2025-03-13T15:35:34,451 INFO [main] session.MR3SessionManagerImpl: New MR3Session created: cca348a5-1f4e-4f59-a9f2-c7defbe0630f, hive
The user can find a new Yarn application of type mr3
submitted by the user.
yarn application -list
...
application_1739072579773_0335 hive-mr3-cca348a5-1f4e-4f59-a9f2-c7defbe0630f mr3 hive default RUNNING UNDEFINED 0% N/A
As we use LocalProcess mode for MR3 DAGAppMaster, its log file is found under the output directory of HiveServer2.
ls /home/hive/mr3/hadoop/hive/hiveserver2-service-result/hive-2025-03-14-00-35-08-b34a5b8f/application_1739072579773_0335/run.log
/home/hive/mr3/hadoop/hive/hiveserver2-service-result/hive-2025-03-14-00-35-08-b34a5b8f/application_1739072579773_0335/run.log
Running Beeline
The user may use any client program (not necessarily Beeline included in the MR3 release) to connect to HiveServer2. In our example, we run Beeline included in the MR3 release.
Download a sample dataset.
wget https://github.com/mr3project/mr3/releases/download/v2.0/pokemon.csv
Run Beeline.
hive/run-beeline.sh
Use the default database.
0: jdbc:hive2://blue0:9842/> use default;
Create a table called pokemon
.
0: jdbc:hive2://blue0:9842/> CREATE TABLE pokemon (Number Int,Name String,Type1 String,Type2 String,Total Int,HP Int,Attack Int,Defense Int,Sp_Atk Int,Sp_Def Int,Speed Int) row format delimited fields terminated BY ',' lines terminated BY '\n' tblproperties("skip.header.line.count"="1");
Import the sample dataset.
0: jdbc:hive2://blue0:9842/> load data local inpath './pokemon.csv' INTO table pokemon;
Execute queries.
0: jdbc:hive2://blue0:9842/> select avg(HP) from pokemon;
0: jdbc:hive2://blue0:9842/> create table pokemon1 as select *, IF(HP>160.0,'strong',IF(HP>140.0,'moderate','weak')) AS power_rate from pokemon;
0: jdbc:hive2://blue0:9842/> select COUNT(name), power_rate from pokemon1 group by power_rate;
Exit Beeline. The warehouse directory on HDFS has now two sub-directories corresponding to the two Hive tables created above.
hdfs dfs -ls /user/hive/warehouse
Found 2 items
drwxr-xr-x - hive hdfs 0 2025-03-13 15:42 /user/hive/warehouse/pokemon
drwxr-xr-x - hive hdfs 0 2025-03-13 15:43 /user/hive/warehouse/pokemon1
Stopping HiveServer2 and Metastore
Stop HiveServer2. MR3 DAGAppMaster also stops.
hive/hiveserver2-service.sh stop --tpcds
Stop Metastore.
hive/metastore-service.sh stop --tpcds