This page shows how to run Hive 4 on MR3 in a single Docker container using the pre-built all-in-one Docker image mr3project/hivemr3-all:4.0.0, available on Docker Hub.
By following these instructions, the user will learn:
- how to run Hive 4 on MR3 in a single Docker container
- how to create Beeline connections and send queries to HiveServer2
- how to change configurations of HiveServer2
This scenario should take less than 30 minutes to complete.
For questions, please visit the MR3 Google Group or join MR3 Slack.
Starting the all-in-one container
To start an all-in-one container, execute the following command, which specifies mr3project/hivemr3-all:4.0.0 as the Docker image.
$ sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 9852:9852 mr3project/hivemr3-all:4.0.0
fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b
The initialization script installs a MySQL server and downloads a MySQL connector, so initialization may take a while to finish.
The container uses the following port, which the option -p 9852:9852 publishes on the host:
- 9852 for HiveServer2
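Because initialization may take a while, it can help to poll port 9852 until HiveServer2 accepts connections. The sketch below is one way to do this, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available. Docker's userland proxy may accept connections on a published host port before anything listens inside the container, so the check is more reliable when run inside the container against the container's own hostname, which is the host name that appears in the Beeline JDBC URL. The function name wait_for_port is hypothetical.

```shell
#!/bin/sh
# Poll HOST:PORT until a TCP connection succeeds; returns 0 on success,
# 1 after max_tries failed attempts spaced about one second apart.
# Assumes bash (for /dev/tcp) and coreutils timeout are installed.
wait_for_port() {
  host=$1 port=$2 max_tries=${3:-600}
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout guards against hosts that silently drop packets.
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
    tries=$((tries + 1))
  done
  return 1
}
```

For example, after defining the function in a shell inside the container, wait_for_port "$(hostname)" 9852 600 returns 0 once HiveServer2 is listening.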
The user can inspect the log file run.log inside the container to check whether Metastore and HiveServer2 have started properly.
$ docker exec -it fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b /bin/bash
[root@fdf73c08660a mr3-run]# tail -f run.log
...
starting Metastore and HiveServer2
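Instead of watching tail -f, a script can wait for a specific line to appear in run.log. The following is a minimal sketch; the helper name wait_for_log is hypothetical, and the line "starting Metastore and HiveServer2" only shows that the initialization script has reached the step that starts the services, not necessarily that HiveServer2 is already accepting connections.

```shell
#!/bin/sh
# Wait until a line containing PATTERN shows up in LOG_FILE, checking once per second.
# Returns 0 on a match and 1 after max_tries unsuccessful checks.
wait_for_log() {
  log_file=$1 pattern=$2 max_tries=${3:-600}
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    # grep -F treats the pattern as a fixed string, not a regular expression.
    if grep -qF -- "$pattern" "$log_file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    tries=$((tries + 1))
  done
  return 1
}
```

Inside the container, in /opt/mr3-run, a call such as wait_for_log run.log "starting Metastore and HiveServer2" blocks until that line is written.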
Running Beeline inside the container as user hive
Inside the container, download a sample dataset and execute /opt/mr3-run/hive/run-beeline.sh as user hive.
$ docker exec -it fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b /bin/bash
[root@fdf73c08660a mr3-run]# su hive
bash-4.2$ pwd
/opt/mr3-run
bash-4.2$ wget https://github.com/mr3project/mr3-release/releases/download/v1.0/pokemon.csv
bash-4.2$ hive/run-beeline.sh
...
Connecting to jdbc:hive2://fdf73c08660a:9852/;;;
Connected to: Apache Hive (version 4.0.0)
Driver: Hive JDBC (version 4.0.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.0 by Apache Hive
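Since the docker run command publishes port 9852, a Beeline client outside the container should presumably be able to reach the same HiveServer2 through localhost. The sketch below assumes a Beeline client is installed on the host (its path given by BEELINE) and that the user name hive is accepted; -u (JDBC URL), -n (user name) and -e (execute a statement) are standard Beeline options, while the helper names hive_jdbc_url and hive_beeline are hypothetical.

```shell
#!/bin/sh
# Connect a host-side Beeline client to the HiveServer2 port published by the container.
# Assumptions: BEELINE points to a Beeline client on the host; HIVE_HOST and HIVE_PORT
# default to the port published with "docker run -p 9852:9852".
BEELINE=${BEELINE:-beeline}
HIVE_HOST=${HIVE_HOST:-localhost}
HIVE_PORT=${HIVE_PORT:-9852}

# Build the JDBC URL, e.g. jdbc:hive2://localhost:9852/
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/\n' "$HIVE_HOST" "$HIVE_PORT"
}

# Extra arguments (for example -e "select 1;") are passed through to Beeline.
hive_beeline() {
  "$BEELINE" -u "$(hive_jdbc_url)" -n hive "$@"
}
```

With no extra arguments, hive_beeline opens an interactive session; hive_beeline -e 'select avg(HP) from pokemon;' would run a single statement and exit.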
Create a table called pokemon and import the sample dataset.
0: jdbc:hive2://fdf73c08660a:9852/> CREATE TABLE pokemon (Number Int,Name String,Type1 String,Type2 String,Total Int,HP Int,Attack Int,Defense Int,Sp_Atk Int,Sp_Def Int,Speed Int) row format delimited fields terminated BY ',' lines terminated BY '\n' tblproperties("skip.header.line.count"="1");
0: jdbc:hive2://fdf73c08660a:9852/> load data local inpath '/opt/mr3-run/pokemon.csv' INTO table pokemon;
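The table pokemon declares 11 columns with ',' as the field delimiter, so every data row of pokemon.csv should split into 11 fields. As far as we know, Hive's default delimited text format does not reject malformed rows (missing trailing fields become NULL and extra fields are ignored), so a quick check of the file can be useful. The sketch below uses awk and skips the first line, mirroring skip.header.line.count=1; the helper name check_field_count is hypothetical, and quoted fields containing commas are not handled, just as they are not by the table definition.

```shell
#!/bin/sh
# Report data rows whose comma-separated field count differs from the expected
# number of table columns (11 for the pokemon table). The first line is a header.
check_field_count() {
  csv_file=$1 expected=${2:-11}
  awk -F, -v n="$expected" '
    NR > 1 && NF != n { printf "line %d: %d fields\n", NR, NF; bad++ }
    END { exit (bad > 0) }   # non-zero exit status if any row is malformed
  ' "$csv_file"
}
```

For example, check_field_count /opt/mr3-run/pokemon.csv 11 prints nothing and returns 0 when every data row has 11 fields.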
Execute queries.
0: jdbc:hive2://fdf73c08660a:9852/> select avg(HP) from pokemon;
0: jdbc:hive2://fdf73c08660a:9852/> create table pokemon1 as select *, IF(HP>160.0,'strong',IF(HP>140.0,'moderate','weak')) AS power_rate from pokemon;
0: jdbc:hive2://fdf73c08660a:9852/> select COUNT(name), power_rate from pokemon1 group by power_rate;
...
+------+-------------+
| _c0 | power_rate |
+------+-------------+
| 363 | strong |
| 336 | weak |
| 108 | moderate |
+------+-------------+
3 rows selected (1.265 seconds)
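The nested IF assigns 'strong' when HP > 160, 'moderate' when 140 < HP <= 160, and 'weak' otherwise. To cross-check the GROUP BY result outside Hive, the same classification can be recomputed from pokemon.csv with awk. This is a sketch under the assumption that the file is well formed: HP is the 6th field according to CREATE TABLE pokemon, the first line is a header, and an empty HP field counts as 'weak' (as a NULL HP does in Hive, since the IF conditions are then not true). The helper name power_rate_counts is hypothetical.

```shell
#!/bin/sh
# Recompute the power_rate distribution of pokemon1 directly from the CSV file.
# Fields: Number,Name,Type1,Type2,Total,HP,... so HP is $6; line 1 is a header.
power_rate_counts() {
  awk -F, '
    NR > 1 {
      hp = $6 + 0   # force a numeric comparison; an empty field becomes 0
      if (hp > 160.0) rate = "strong"
      else if (hp > 140.0) rate = "moderate"
      else rate = "weak"
      count[rate]++
    }
    END { for (r in count) printf "%d %s\n", count[r], r }
  ' "$1" | sort -k2   # sort by rate name for a stable order
}
```

Running power_rate_counts /opt/mr3-run/pokemon.csv should give the same counts as the query above, listed in the order moderate, strong, weak.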
The data of the tables pokemon and pokemon1 is stored under the warehouse directory /opt/mr3-run/work-dir/warehouse/.
bash-4.2$ ls work-dir/warehouse/
pokemon pokemon1
Changing configurations for HiveServer2
To change the configuration of HiveServer2, start an all-in-one container with /usr/sbin/init as the initial command, so that the configuration can be edited before the initialization script runs.
$ sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 9852:9852 mr3project/hivemr3-all:4.0.0 /usr/sbin/init
c85a7579e5092e4cdfe12b1dfdc5d78178c76f2b92d4d5657cacdf911962f83b
Then update /opt/mr3-run/env.sh and the configuration files under the directory /opt/mr3-run/conf as necessary.
$ docker exec -it c85a7579e5092e4cdfe12b1dfdc5d78178c76f2b92d4d5657cacdf911962f83b /bin/bash
[root@c85a7579e509 mr3-run]# vi env.sh
[root@c85a7579e509 mr3-run]# vi conf/core-site.xml
[root@c85a7579e509 mr3-run]# vi conf/mr3-site.xml
[root@c85a7579e509 mr3-run]# vi conf/hive-site.xml
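Files such as core-site.xml and hive-site.xml (and, we assume, mr3-site.xml) use the Hadoop-style layout of property elements, each with a name and a value. To confirm an edit, a quick text-based lookup can print the current value of a property. This is a sketch, not an XML parser: it assumes that the value element sits on a single line, at or after the line holding the name element, and contains no nested tags. The helper name get_conf_property is hypothetical.

```shell
#!/bin/sh
# Print the <value> of property NAME from a Hadoop-style configuration file.
# Prints nothing if the property is missing or has no <value> element.
get_conf_property() {
  conf_file=$1 name=$2
  awk -v key="$name" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)   # strip <value> and </value>
      exit
    }
    found && /<\/property>/ { found = 0 }          # property ended without a value
  ' "$conf_file"
}
```

For example, get_conf_property conf/hive-site.xml hive.server2.thrift.port prints the configured HiveServer2 port if the file sets that property, and prints nothing otherwise.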
Finally, run the initialization script run-all.sh.
[root@c85a7579e509 mr3-run]# ./run-all.sh
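As an alternative to editing with vi inside the container, the same steps can be scripted from the host: keep edited copies of the XML files in a local directory, copy them into /opt/mr3-run/conf with docker cp, and run run-all.sh with docker exec. This is a sketch under a few assumptions: the container was started with /usr/sbin/init as above, its ID (printed by docker run -d) is passed as the first argument, the local directory name is hypothetical, and docker is invoked through sudo as in the docker run commands on this page. The helper name apply_conf_and_start is hypothetical.

```shell
#!/bin/sh
# Copy locally edited *.xml files into the container's /opt/mr3-run/conf,
# then run the initialization script inside the container.
# DOCKER is left unquoted on purpose so that "sudo docker" splits into two words.
DOCKER=${DOCKER:-sudo docker}

apply_conf_and_start() {
  cid=$1 conf_dir=$2
  for f in "$conf_dir"/*.xml; do
    [ -e "$f" ] || continue   # no *.xml files: the glob is left unexpanded
    $DOCKER cp "$f" "$cid:/opt/mr3-run/conf/$(basename "$f")" || return 1
  done
  # Runs in the foreground, like the interactive ./run-all.sh step above.
  $DOCKER exec "$cid" /bin/bash -c 'cd /opt/mr3-run && ./run-all.sh'
}
```

A possible usage: CID=$(sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 9852:9852 mr3project/hivemr3-all:4.0.0 /usr/sbin/init), then copy the original files out with sudo docker cp "$CID:/opt/mr3-run/conf/hive-site.xml" ./my-conf/, edit them, and call apply_conf_and_start "$CID" ./my-conf. The file /opt/mr3-run/env.sh could be copied in the same way with docker cp.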