This page shows how to operate Hive on MR3 in a non-secure Hortonworks HDP cluster without Kerberos. An ordinary user (not an administrator of the Hadoop cluster) will run HiveServer2. For Metastore, we will use the instance already running on HDP. By following these instructions, the user will learn:
- how to install and configure Hive on MR3 in a non-secure HDP cluster without Kerberos
- how to start and stop HiveServer2
- how to create Beeline connections
This scenario has the following prerequisites:
- A non-secure HDP cluster is available.
- The user has administrator access to Ambari of HDP.
- The user has administrator access to HDFS.
- The user has access to the home directory and the /tmp directory on HDFS.
This scenario should take less than 30 minutes to complete, not including the time for downloading an MR3 release. The user can apply these instructions to operate Hive on MR3 in any Hadoop cluster where Metastore is already running.
This page has been tested with MR3 release 1.7 on HDP 3.1.4.
If you have any questions, please visit the MR3 Google Group or join MR3 Slack.
Installation
In our example, we use an ordinary user gla for installing and running HiveServer2.
Download a pre-built MR3 release compatible with Metastore of HDP on a node where the command yarn is available. In the case of HDP 3.1.4, the included Hive version is 3.1, so we should download hivemr3-1.7-hive3.1.3-k8s.tar.gz (based on Hive 3.1.3 and corresponding to the --hivesrc3 option to be used later).
Uncompress the pre-built MR3 release, rename the new directory to mr3-run, and change the working directory. Renaming the directory is not strictly necessary, but it is recommended because the sample configuration file hive-site.xml included in the MR3 release uses the same directory name.
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.7/hivemr3-1.7-hive3.1.3-k8s.tar.gz
$ gunzip -c hivemr3-1.7-hive3.1.3-k8s.tar.gz | tar xvf -
$ mv hivemr3-1.7-hive3.1.3-k8s/ mr3-run
$ cd mr3-run/
Configuring env.sh
Open env.sh and set JAVA_HOME. Set HADOOP_HOME to the Hadoop installation directory.
$ vi env.sh
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export HADOOP_HOME=/usr/hdp/3.1.4.0-315/hadoop
Set the following environment variables to adjust the memory size (in MB) allocated to each component:
- HIVE_SERVER2_HEAPSIZE specifies the memory size for HiveServer2.
- HIVE_CLIENT_HEAPSIZE specifies the memory size for Beeline (beeline command).
- MR3_AM_HEAPSIZE specifies the memory size for MR3 DAGAppMaster.
$ vi env.sh
HIVE_SERVER2_HEAPSIZE=16384
HIVE_CLIENT_HEAPSIZE=1024
MR3_AM_HEAPSIZE=10240
Open the Ambari webpage as the administrator user and find the information on Metastore: Database Name, Database Username, Database URL, Hive Database Type, JDBC Driver Class, Database Password, hive.metastore.uris, and Hive Metastore Warehouse directory.
Set the following environment variables according to the information on Metastore. Note the prefix HIVE3_, which corresponds to the --hivesrc3 option.
$ vi env.sh
HIVE3_DATABASE_HOST=blue0 # from Database URL
HIVE3_METASTORE_HOST=blue0 # from hive.metastore.uris
HIVE3_METASTORE_PORT=9083 # from hive.metastore.uris
HIVE3_METASTORE_LOCAL_PORT=9083
HIVE3_DATABASE_NAME=hivellap # from Database Name
HIVE3_HDFS_WAREHOUSE=/warehouse/tablespace/managed/hive # from Hive Metastore Warehouse directory
Set HIVE_MYSQL_DRIVER to specify the path to a MySQL connector jar file.
$ vi env.sh
HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar
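If the location of the connector jar is unknown, one can check for it before editing env.sh. The path below is a common location on HDP nodes, but it is only an assumption; adjust it for your environment.

```shell
# Verify that a MySQL connector jar is present at the assumed path.
ls -l /usr/share/java/mysql-connector-java.jar
```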
Set the following environment variables to indicate a non-secure HDP cluster.
$ vi env.sh
SECURE_MODE=false
TOKEN_RENEWAL_HDFS_ENABLED=false
TOKEN_RENEWAL_HIVE_ENABLED=false
HIVE_SERVER2_AUTHENTICATION=NONE
Configuring hive-site.xml
Open conf/tpcds/hive3/hive-site.xml and set the following configuration keys according to the information on Metastore.
$ vi conf/tpcds/hive3/hive-site.xml
<property>
  <name>hive.metastore.db.type</name>
  <value>MYSQL</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
Set the following configuration keys according to the current user name gla (instead of the default user hive) and the working directory (instead of the default directory /home/hive).
$ vi conf/tpcds/hive3/hive-site.xml
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/gla</value>
</property>

<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/gla/operation_logs</value>
</property>

<property>
  <name>hive.aux.jars.path</name>
  <value>/home/gla/mr3-run/hive/hivejar/apache-hive-3.1.3-bin/lib/hive-llap-common-3.1.3.jar,/home/gla/mr3-run/hive/hivejar/apache-hive-3.1.3-bin/lib/hive-llap-server-3.1.3.jar,/home/gla/mr3-run/hive/hivejar/apache-hive-3.1.3-bin/lib/hive-llap-tez-3.1.3.jar</value>
</property>
The following configuration keys specify the resources to be allocated to a Map Task, a Reduce Task, and a ContainerWorker. By default, we allocate 4GB and a single core to each Map Task and Reduce Task. Since a ContainerWorker is allocated 40960MB of memory and 10 cores, it can accommodate 10 concurrent Tasks (40960MB / 4096MB = 10 Tasks; 10 cores / 1 core = 10 Tasks).
$ vi conf/tpcds/hive3/hive-site.xml
<property>
  <name>hive.mr3.map.task.memory.mb</name>
  <value>4096</value>
</property>

<property>
  <name>hive.mr3.map.task.vcores</name>
  <value>1</value>
</property>

<property>
  <name>hive.mr3.reduce.task.memory.mb</name>
  <value>4096</value>
</property>

<property>
  <name>hive.mr3.reduce.task.vcores</name>
  <value>1</value>
</property>

<property>
  <name>hive.mr3.all-in-one.containergroup.memory.mb</name>
  <value>40960</value>
</property>

<property>
  <name>hive.mr3.all-in-one.containergroup.vcores</name>
  <value>10</value>
</property>
By default, we enable impersonation by setting hive.server2.enable.doAs to true.
$ vi conf/tpcds/hive3/hive-site.xml
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
Creating directories on HDFS
The administrator user (e.g., hdfs) should create a directory for user gla.
# execute as the administrator user
$ hdfs dfs -mkdir /user/gla/
$ hdfs dfs -chown gla /user/gla/
The administrator user should also check that /tmp/gla (corresponding to the configuration key hive.exec.scratchdir in hive-site.xml) does not exist on HDFS. If the directory already exists (e.g., when running Hive on MR3 for the second time), make sure that its permission is set to 733. HiveServer2 automatically creates a new directory with permission 733 if it does not exist.
# execute as the administrator user
$ hdfs dfs -ls /tmp/gla
ls: `/tmp/gla': No such file or directory
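If /tmp/gla is left over from a previous run, the administrator user can set its permission to 733 manually before starting HiveServer2. This is a sketch; the path follows the value of hive.exec.scratchdir above.

```shell
# execute as the administrator user
# set the permission of the existing scratch directory to 733
hdfs dfs -chmod 733 /tmp/gla
# confirm the new permission
hdfs dfs -ls -d /tmp/gla
```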
Then the user gla should create a directory for storing MR3 and Tez jar files.
$ printenv | grep USER
USER=gla
$ hdfs dfs -mkdir -p /user/gla/lib/
Upload the MR3 jar files to HDFS.
$ mr3/upload-hdfslib-mr3.sh
Upload the Tez jar files to HDFS.
$ tez/upload-hdfslib-tez.sh
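To confirm that both upload scripts have succeeded, the user can list the directory on HDFS. The exact layout under /user/gla/lib/ depends on the MR3 release, so this only sketches the check.

```shell
# list everything uploaded under the jar directory
hdfs dfs -ls -R /user/gla/lib/
```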
Creating temporary directories
Optionally the user can create a new directory specified by hive.server2.logging.operation.log.location before starting HiveServer2.
$ ls -alt /tmp/gla/operation_logs
ls: cannot access /tmp/gla/operation_logs: No such file or directory
$ mkdir -p /tmp/gla/operation_logs
Updating core-site.xml of HDP
Open the Ambari webpage as the administrator user and set two new configuration keys hadoop.proxyuser.gla.groups and hadoop.proxyuser.gla.hosts in core-site.xml. Then restart HDFS, Yarn, and Metastore. Without this step, HiveServer2 may fail to start with the following error (which is related to HIVE-19740).
2023-06-18T15:21:00,154 WARN [main] metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect (1 of 24) after 5s. getCurrentNotificationEventId
org.apache.thrift.TApplicationException: Internal error processing get_current_notificationEventId
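As a sketch, the most permissive setting allows user gla to impersonate any user from any host; restrict the values as appropriate for your cluster.

```xml
<property>
  <name>hadoop.proxyuser.gla.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.gla.hosts</name>
  <value>*</value>
</property>
```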
Running HiveServer2
Run HiveServer2 using the --tpcds and --hivesrc3 options. In order to use LocalProcess mode for MR3 DAGAppMaster, add the --amprocess option.
$ hive/hiveserver2-service.sh start --tpcds --hivesrc3
After a while, check if HiveServer2 has successfully started by inspecting its log file.
$ tail -f /data2/gla/mr3-run/hive/hiveserver2-service-result/hive-mr3--2023-06-19-00-28-16-8db63259/hive-logs/hive.log
...
2023-06-18T15:28:39,389 INFO [main] server.Server: Started @13972ms
2023-06-18T15:28:39,390 INFO [main] server.HiveServer2: Web UI has started on port 10502
2023-06-18T15:28:39,390 INFO [main] server.HiveServer2: HS2 interactive HA not enabled. Starting sessions..
2023-06-18T15:28:39,390 INFO [main] http.HttpServer: Started HttpServer[hiveserver2] on port 10502
The user can find a new Yarn application of type mr3 submitted by the user gla.
$ yarn application -list
...
application_1660836356025_0002 a91f2aae-6efe-43dc-87d2-1e4ac1b50b78 mr3 gla default RUNNING UNDEFINED 0% N/A
Running Beeline
Run Beeline using the --tpcds and --hivesrc3 options.
$ hive/run-beeline.sh --tpcds --hivesrc3
# Running Beeline using Hive-MR3 (3.1.3) #
...
Connecting to jdbc:hive2://blue0:9832/;;
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://blue0:9832/>
The user can find all databases registered to Metastore of HDP.
0: jdbc:hive2://blue0:9832/> show databases;
...
+----------------------------------+
| database_name |
+----------------------------------+
| default |
| information_schema |
| sys |
| tpcds_bin_partitioned_orc_10000 |
| tpcds_text_1000 |
| tpcds_text_10000 |
+----------------------------------+
6 rows selected (1.652 seconds)
Another user can also run Beeline to connect to the same HiveServer2. In the following example, another ordinary user gitlab-runner runs Beeline (where we assume that the directory hive/run-beeline-result/ is writable by user gitlab-runner).
$ printenv | grep USER
USER=gitlab-runner
$ pwd
/home/gla/mr3-run
$ hive/run-beeline.sh --tpcds --hivesrc3
# Running Beeline using Hive-MR3 (3.1.3) #
...
Connecting to jdbc:hive2://blue0:9832/;;
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://blue0:9832/>
Stopping HiveServer2
Stop HiveServer2 as user gla.
$ hive/hiveserver2-service.sh stop --tpcds --hivesrc3