In a Docker Container

This page shows how to run Hive 3 on MR3 in a single Docker container using the pre-built all-in-one Docker image mr3project/hivemr3-all:1.8, available on Docker Hub.

By following the instructions, the user will learn:

how to run Hive on MR3 in a single Docker container
how to create Beeline connections and send queries to HiveServer2
how to change configurations of HiveServer2

For running Beeline outside the container, we assume that a client program to connect to HiveServer2 is already installed. This scenario should take less than 30 minutes to complete.

If you have any questions, please visit the MR3 Google Group or join MR3 Slack.

Starting the all-in-one container

To start an all-in-one container, execute the following command, which specifies mr3project/hivemr3-all:1.8 as the Docker image.

$ sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 9852:9852 mr3project/hivemr3-all:1.8
fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b
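
The command prints the full container ID. Docker uses its first 12 characters as the default container hostname, which is why the shell prompts in later steps show fdf73c08660a. As a minimal sketch, assuming a POSIX shell on the host, the short form can be derived as follows; the function name short_id is hypothetical and not part of Docker or MR3.

```shell
#!/bin/sh
# Hypothetical helper: print the first 12 characters of a full container ID.
short_id() {
  # %.12s truncates the argument to at most 12 characters.
  printf '%.12s\n' "$1"
}

# Example with the ID printed by the docker run command above.
short_id fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b
```

Either the full ID or the short form can be passed to docker exec.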

The initialization script installs MySQL 5.6 and downloads mysql-connector-java-8.0.17.tar.gz, which provides the MySQL connector.

The container uses the following port:

9852 for HiveServer2

The user can inspect the log file run.log inside the container to check whether Metastore and HiveServer2 have started properly.

$ docker exec -it fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b /bin/bash
[root@fdf73c08660a mr3-run]# tail -f run.log 
...
starting Metastore and HiveServer2
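
Instead of watching the log by hand, one could poll it until the expected line appears. The following is a sketch only, assuming a POSIX shell with grep inside the container; the function name wait_for_log and the timeout value are hypothetical, and the message text is copied from the log excerpt above. Whether that line means both services are fully ready is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper: poll a log file until a fixed string appears
# or the timeout (in seconds) runs out.
# Usage: wait_for_log FILE PATTERN TIMEOUT_SECONDS
wait_for_log() {
  file=$1
  pattern=$2
  remaining=$3
  while [ "$remaining" -gt 0 ]; do
    # -q: no output, -F: treat the pattern as a fixed string
    if grep -qF -- "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  # Final check, so that a timeout of 0 still inspects the file once.
  grep -qF -- "$pattern" "$file" 2>/dev/null
}
```

For example, running wait_for_log run.log 'starting Metastore and HiveServer2' 300 from the mr3-run directory returns 0 once the line is present, and a non-zero status if it has not appeared after roughly 300 seconds.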

Running Beeline inside the container as user hive

Inside the container, download a sample dataset and execute /opt/mr3-run/hive/run-beeline.sh as user hive.

$ docker exec -it fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b /bin/bash
[root@fdf73c08660a mr3-run]# su hive
bash-4.2$ pwd
/opt/mr3-run
bash-4.2$ wget https://github.com/mr3project/mr3-release/releases/download/v1.0/pokemon.csv

bash-4.2$ hive/run-beeline.sh
...
Connecting to jdbc:hive2://fdf73c08660a:9852/;;;
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive

Create a table called pokemon and import the sample dataset.

0: jdbc:hive2://fdf73c08660a:9852/> CREATE TABLE pokemon (Number Int,Name String,Type1 String,Type2 String,Total Int,HP Int,Attack Int,Defense Int,Sp_Atk Int,Sp_Def Int,Speed Int) row format delimited fields terminated BY ',' lines terminated BY '\n' tblproperties("skip.header.line.count"="1");

0: jdbc:hive2://fdf73c08660a:9852/> load data local inpath '/opt/mr3-run/pokemon.csv' INTO table pokemon;
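
The table definition splits each line on commas and skips one header line, so a row with a different number of fields would not line up with the 11 columns. As a hedged sanity check, run from the container shell rather than from Beeline, the following sketch counts such rows; it assumes awk is available, and count_bad_rows is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count data rows whose number of comma-separated
# fields differs from the expected column count. The first line is
# skipped, mirroring skip.header.line.count=1 in the table definition.
# Usage: count_bad_rows FILE EXPECTED_FIELDS
count_bad_rows() {
  awk -F',' -v n="$2" 'NR > 1 && NF != n { bad++ } END { print bad + 0 }' "$1"
}
```

Calling count_bad_rows /opt/mr3-run/pokemon.csv 11 prints 0 when every data row has exactly 11 fields. The check splits on every comma and ignores quoting, which matches how a plain delimited row format treats the file.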

Execute queries.

0: jdbc:hive2://fdf73c08660a:9852/> select avg(HP) from pokemon;

0: jdbc:hive2://fdf73c08660a:9852/> create table pokemon1 as select *, IF(HP>160.0,'strong',IF(HP>140.0,'moderate','weak')) AS power_rate from pokemon;

0: jdbc:hive2://fdf73c08660a:9852/> select COUNT(name), power_rate from pokemon1 group by power_rate;
...
+------+-------------+
| _c0  | power_rate  |
+------+-------------+
| 363  | strong      |
| 336  | weak        |
| 108  | moderate    |
+------+-------------+
3 rows selected (1.265 seconds)

The warehouse directory is /opt/mr3-run/work-dir/warehouse/.

bash-4.2$ ls work-dir/warehouse/
pokemon  pokemon1

Running Beeline outside the container as another user

The user may use any client program to connect to HiveServer2. In our example, we use the script included in the MR3 release (hive/run-beeline.sh).

Open env.sh and set environment variables HIVE3_SERVER2_HOST, HIVE3_SERVER2_PORT, SECURE_MODE, and HIVE_SERVER2_AUTHENTICATION. Note that we do not use Kerberos for connecting to HiveServer2.

$ grep -e HIVE3_SERVER2_HOST -e SECURE_MODE -e HIVE_SERVER2_AUTHENTICATION -e HIVE3_SERVER2_PORT env.sh
HIVE3_SERVER2_HOST=your.server.address
HIVE3_SERVER2_PORT=9852
SECURE_MODE=false
HIVE_SERVER2_AUTHENTICATION=NONE
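
The host and port set here end up in a JDBC connection string of the form jdbc:hive2://HOST:PORT/. The sketch below, which assumes plain NAME=value lines without quotes or export in env.sh, shows that mapping; get_var and jdbc_url are hypothetical helpers and are not part of the MR3 scripts, which may append further URL parameters.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last NAME=value line in FILE.
# Usage: get_var NAME FILE
get_var() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Hypothetical helper: build a HiveServer2 JDBC URL from an env.sh-style file.
# Usage: jdbc_url FILE
jdbc_url() {
  host=$(get_var HIVE3_SERVER2_HOST "$1")
  port=$(get_var HIVE3_SERVER2_PORT "$1")
  # Fail if either setting is missing or empty.
  [ -n "$host" ] && [ -n "$port" ] || return 1
  printf 'jdbc:hive2://%s:%s/\n' "$host" "$port"
}
```

With the values shown above, jdbc_url env.sh prints jdbc:hive2://your.server.address:9852/, which any JDBC-capable client can use as a starting point.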

Run Beeline with the --tpcds and --hivesrc3 options.

$ ./hive/run-beeline.sh --tpcds --hivesrc3
...
beeline -u "jdbc:hive2://your.server.address:9852/;;" -n gla -p gla --hiveconf hive.querylog.location=/data1/gla/mr3-run-1.0/hive/run-beeline-result/hive-mr3-5ba3d48-2020-10-28-17-02-06-d0064f30/hive-logs
...
Connecting to jdbc:hive2://your.server.address:9852/;;
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://your.server.address:9852/>

Note that in our example, we connect to HiveServer2 as user gla, not hive.

Now execute a sample query.

0: jdbc:hive2://your.server.address:9852/> use default;
0: jdbc:hive2://your.server.address:9852/> select count(*) from pokemon;
...

Changing configurations for HiveServer2

To change configurations for HiveServer2, start an all-in-one container with /usr/sbin/init as the initial command.

$ sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 6080:6080 -p 8188:8188 -p 9852:9852 mr3project/hivemr3-all:1.8 /usr/sbin/init
c85a7579e5092e4cdfe12b1dfdc5d78178c76f2b92d4d5657cacdf911962f83b

Then update /opt/mr3-run/env.sh and configuration files under the directory /opt/mr3-run/conf as necessary.

$ docker exec -it c85a7579e5092e4cdfe12b1dfdc5d78178c76f2b92d4d5657cacdf911962f83b /bin/bash
[root@c85a7579e509 mr3-run]# vi env.sh
[root@c85a7579e509 mr3-run]# vi conf/core-site.xml
[root@c85a7579e509 mr3-run]# vi conf/mr3-site.xml
[root@c85a7579e509 mr3-run]# vi conf/hive-site.xml
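
Before editing, it may be worth keeping a copy of the original files so that a change can be reverted. This is a minimal sketch and not part of the MR3 release; backup_conf and the destination directory are hypothetical, while /opt/mr3-run, env.sh, and conf are the paths named above.

```shell
#!/bin/sh
# Hypothetical helper: copy env.sh and conf/*.xml from BASE into DEST.
# Usage: backup_conf BASE DEST
backup_conf() {
  base=$1
  dest=$2
  mkdir -p "$dest/conf" || return 1
  cp "$base/env.sh" "$dest/" || return 1
  cp "$base"/conf/*.xml "$dest/conf/" || return 1
}
```

For example, backup_conf /opt/mr3-run /tmp/mr3-conf-backup stores the copies under /tmp/mr3-conf-backup, a path chosen here only for illustration.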

Finally, run the initialization script run-all.sh.

[root@c85a7579e509 mr3-run]# ./run-all.sh
