This page shows how to run Hive 3 on MR3, Apache Ranger 2.1.0, and Timeline Server
in a single Docker container
using the pre-built all-in-one Docker image mr3project/hivemr3-all:1.2
available on Docker Hub.
By following the instructions, the user will learn:
- how to run Hive on MR3, Apache Ranger 2.1.0, and Timeline Server in a single Docker container
- how to create Beeline connections and send queries to HiveServer2
- how to add a new user in a Ranger policy
- how to change configurations for HiveServer2
For running Beeline outside the container, we assume that a client program to connect to HiveServer2 is already installed. This scenario should take less than 30 minutes to complete.
This page has been tested with MR3 release 1.2.
Starting the all-in-one container
In order to start an all-in-one container, execute the following command in which we specify mr3project/hivemr3-all:1.2
as the name of the Docker image.
$ sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 6080:6080 -p 8188:8188 -p 9852:9852 mr3project/hivemr3-all:1.2
fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b
The initialization script installs MySQL 5.6 and downloads mysql-connector-java-8.0.17.tar.gz
for a MySQL connector.
It creates the following accounts for MySQL, Ranger, and Solr:
- root:passwd for MySQL
- admin:rangeradmin1 for Ranger
- solr:solrRock for Solr (used internally by Ranger)
It uses the following ports for HiveServer2, Ranger, Solr, and Timeline Server:
- 9852 for HiveServer2
- 6080 for Ranger (HTTP)
- 6083 for Solr (HTTP, used internally by Ranger)
- 8188 for Timeline Server (HTTP)
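Once the container is running, a small TCP probe from the host gives a quick indication of whether the published ports accept connections. The following is a minimal sketch, assuming the container runs on the local machine with the port mappings from the docker run command above. The helper names port_is_open and check_services are hypothetical. An open port only shows that something is listening, not that the service has finished initializing. Port 6083 is left out because the docker run command does not publish it.

```python
import socket

# Ports published by the docker run command above (assumption: default mapping).
PUBLISHED_PORTS = {"HiveServer2": 9852, "Ranger": 6080, "Timeline Server": 8188}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host, ports=PUBLISHED_PORTS):
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

For example, check_services("localhost") returns a dictionary keyed by service name, which can be printed or polled until every value is True.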
The user can inspect the log file run.log
inside the container to check if all services have started properly.
$ docker exec -it fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b /bin/bash
[root@fdf73c08660a mr3-run]# tail -f run.log
...
starting Ranger and Solr
...
Server: Apache Ranger
...
starting ATS
1dc82548-40ac-4973-8092-9f4c96b56cac
starting Metastore and HiveServer2
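Instead of watching the log by eye, the startup messages can be checked programmatically on a copy of run.log. The sketch below assumes that the three "starting ..." lines appear verbatim as in the excerpt above; the function name missing_markers is hypothetical. A marker only shows that a startup step was reached, not that the service came up without errors.

```python
# Marker lines copied from the run.log excerpt above; the real log may
# contain other output between them.
STARTUP_MARKERS = [
    "starting Ranger and Solr",
    "starting ATS",
    "starting Metastore and HiveServer2",
]


def missing_markers(log_text, markers=STARTUP_MARKERS):
    """Return the markers that do not yet appear in log_text, in order."""
    return [marker for marker in markers if marker not in log_text]
```

An empty result means that all three startup steps have been logged.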
Running Beeline inside the container as user hive
Inside the container, download a sample dataset and execute /opt/mr3-run/hive/run-beeline.sh as user hive.
$ docker exec -it fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b /bin/bash
[root@fdf73c08660a mr3-run]# su hive
bash-4.2$ pwd
/opt/mr3-run
bash-4.2$ wget https://github.com/mr3project/mr3-release/releases/download/v1.0/pokemon.csv
bash-4.2$ hive/run-beeline.sh
...
Connecting to jdbc:hive2://fdf73c08660a:9852/;;;
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
Create a table called pokemon
and import the sample dataset.
0: jdbc:hive2://fdf73c08660a:9852/> CREATE TABLE pokemon (Number Int,Name String,Type1 String,Type2 String,Total Int,HP Int,Attack Int,Defense Int,Sp_Atk Int,Sp_Def Int,Speed Int) row format delimited fields terminated BY ',' lines terminated BY '\n' tblproperties("skip.header.line.count"="1");
0: jdbc:hive2://fdf73c08660a:9852/> load data local inpath '/opt/mr3-run/pokemon.csv' INTO table pokemon;
Execute queries.
0: jdbc:hive2://fdf73c08660a:9852/> select avg(HP) from pokemon;
0: jdbc:hive2://fdf73c08660a:9852/> create table pokemon1 as select *, IF(HP>160.0,'strong',IF(HP>140.0,'moderate','weak')) AS power_rate from pokemon;
0: jdbc:hive2://fdf73c08660a:9852/> select COUNT(name), power_rate from pokemon1 group by power_rate;
...
+------+-------------+
| _c0 | power_rate |
+------+-------------+
| 363 | strong |
| 336 | weak |
| 108 | moderate |
+------+-------------+
3 rows selected (1.265 seconds)
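The nested IF expression in the CREATE TABLE statement assigns a label to each row based on HP. As a rough illustration of that logic outside Hive, here is a minimal Python sketch; the function names power_rate and count_by_rate are hypothetical, and the treatment of None mirrors Hive's IF, which returns its third argument when the condition is false or NULL.

```python
from collections import Counter


def power_rate(hp):
    """Mirror IF(HP>160.0,'strong',IF(HP>140.0,'moderate','weak'))."""
    # A missing HP (None here, NULL in Hive) fails both comparisons
    # and therefore falls through to 'weak'.
    if hp is not None and hp > 160.0:
        return "strong"
    if hp is not None and hp > 140.0:
        return "moderate"
    return "weak"


def count_by_rate(hp_values):
    """Count rows per label, like GROUP BY power_rate with COUNT."""
    return Counter(power_rate(hp) for hp in hp_values)
```

Both comparisons are strict, so an HP of exactly 160 is labeled moderate and an HP of exactly 140 is labeled weak.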
The tables are stored under the warehouse directory /opt/mr3-run/work-dir/warehouse/.
bash-4.2$ ls work-dir/warehouse/
pokemon pokemon1
Running Beeline outside the container as another user
The user may use any client program to connect to HiveServer2.
In our example, we use the script included in the MR3 release (hive/run-beeline.sh).
Open env.sh and set the environment variables HIVE3_SERVER2_HOST, HIVE3_SERVER2_PORT, SECURE_MODE, and HIVE_SERVER2_AUTHENTICATION.
Note that we do not use Kerberos for connecting to HiveServer2.
$ grep -e HIVE3_SERVER2_HOST -e SECURE_MODE -e HIVE_SERVER2_AUTHENTICATION -e HIVE3_SERVER2_PORT env.sh
HIVE3_SERVER2_HOST=your.server.address
HIVE3_SERVER2_PORT=9852
SECURE_MODE=false
HIVE_SERVER2_AUTHENTICATION=NONE
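The same check as the grep command can be sketched in Python, for example to verify that none of the four variables is left empty. This is only an illustration under the assumption that env.sh sets these variables with plain NAME=value lines; it does not evaluate shell syntax such as variable expansion or inline comments, and the helper names are hypothetical.

```python
import re

REQUIRED = (
    "HIVE3_SERVER2_HOST",
    "HIVE3_SERVER2_PORT",
    "SECURE_MODE",
    "HIVE_SERVER2_AUTHENTICATION",
)

# Matches simple assignments such as NAME=value or export NAME=value.
_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_assignments(text):
    """Collect simple NAME=value lines; a later assignment overrides an earlier one."""
    values = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out assignments
        match = _ASSIGNMENT.match(line)
        if match:
            name, value = match.groups()
            values[name] = value.strip().strip("\"'")
    return values


def missing_settings(text, required=REQUIRED):
    """Return the required variables that are absent or empty."""
    values = parse_assignments(text)
    return [name for name in required if not values.get(name)]
```

Passing the contents of env.sh to missing_settings should yield an empty list when all four variables are set.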
Run Beeline using the --tpcds and --hivesrc3 options.
$ ./hive/run-beeline.sh --tpcds --hivesrc3
...
beeline -u "jdbc:hive2://your.server.address:9852/;;" -n gla -p gla --hiveconf hive.querylog.location=/data1/gla/mr3-run-1.0/hive/run-beeline-result/hive-mr3-5ba3d48-2020-10-28-17-02-06-d0064f30/hive-logs
...
Connecting to jdbc:hive2://your.server.address:9852/;;
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://your.server.address:9852/>
Note that in our example, we connect to HiveServer2 as user gla, not hive.
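For readers who use a different client, the essential part of what run-beeline.sh prints is the Beeline invocation with a JDBC URL, a user name, and a password. The sketch below rebuilds that argument list from its parts; the URL shape and the -u, -n, and -p options are copied from the output above, the --hiveconf option is omitted, and the function names are hypothetical.

```python
import shlex


def jdbc_url(host, port):
    """Build the JDBC URL in the shape shown in the Beeline output above."""
    return f"jdbc:hive2://{host}:{port}/;;"


def beeline_command(host, port, user, password):
    """Return the Beeline argument list for the given connection settings."""
    return ["beeline", "-u", jdbc_url(host, port), "-n", user, "-p", password]


# Print a shell-quoted form of the command for inspection.
print(shlex.join(beeline_command("your.server.address", 9852, "gla", "gla")))
```

The argument list can be passed to subprocess.run on a machine where beeline is on the PATH.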
Now execute a sample query.
0: jdbc:hive2://your.server.address:9852/> use default;
0: jdbc:hive2://your.server.address:9852/> select count(*) from pokemon;
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [gla] does not have [SELECT] privilege on [default/pokemon] (state=42000,code=40000)
The query fails because Ranger is not configured to allow user gla to access the table pokemon.
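When such errors show up in scripts or logs, it can help to extract which user lacks which privilege on which resource. The following is a minimal sketch that parses the message format shown above; it assumes the wording of the error is exactly as in this example, and the function name parse_denial is hypothetical.

```python
import re

# Pattern derived from the error message shown above.
DENIED = re.compile(
    r"user \[(?P<user>[^\]]+)\] does not have \[(?P<privilege>[^\]]+)\] "
    r"privilege on \[(?P<resource>[^\]]+)\]"
)


def parse_denial(message):
    """Return user, privilege, and resource from a permission error, or None."""
    match = DENIED.search(message)
    return match.groupdict() if match else None
```

For the error above, the result identifies user gla, privilege SELECT, and resource default/pokemon, which are the values to use when adjusting the Ranger policy.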
Updating the Ranger policy
In order to update the Ranger policy, log in to Ranger (using ID admin and password rangeradmin1) at http://your.server.address:6080/.
The user can find the Ranger service DOCKER_hive, which has been created by the initialization script.
Open the Settings/Users tab.
Create a new user, e.g., gla.
Then open the Ranger service DOCKER_hive and choose an appropriate policy, e.g., all - database, table, column.
Add user gla to the list Select User.
After a while, HiveServer2 reloads the new policy from Ranger.
Then user gla can run the previous query.
0: jdbc:hive2://your.server.address:9852/> select count(*) from pokemon;
+------+
| _c0 |
+------+
| 807 |
+------+
1 row selected (0.675 seconds)
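Because HiveServer2 picks up the new policy only after some delay, an automated test would retry the query rather than run it once. The following generic polling helper is a sketch of that idea; the name wait_until, the 120-second timeout, and the 5-second interval are assumptions rather than values taken from MR3 or Ranger, and check stands for any user-supplied function that returns True once the query succeeds.

```python
import time


def wait_until(check, timeout=120.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The sleep and clock parameters exist only so that the helper can be exercised without real waiting.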
MR3-UI
Download mr3-ui-1.0.tar.gz and extract it to an MR3-UI directory that can be accessed by a web server (e.g., at your.webserver.address).
$ pwd
/var/www/html
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.0/mr3-ui-1.0.tar.gz
$ gunzip -c mr3-ui-1.0.tar.gz | tar xvf -
$ cd mr3-ui-0.8.5/
Since MR3-UI is a JavaScript application running on the client side, the user may install it on any machine where a web server is ready.
For example, it is okay to install MR3-UI on the local machine.
Then update config/configs.env in the MR3-UI directory to specify the address of the Timeline Server, as illustrated in the following example:
$ vi config/configs.env
hosts: {
  /*
   * Timeline Server Address:
   * By default MR3 UI looks for timeline server at http://localhost:8188, uncomment
   * and change the following value for pointing to a different address.
   */
  timeline: "your.server.address:8188",
}
Then the user can visit the MR3-UI address determined by its installation directory (e.g., http://your.webserver.address/mr3-ui-0.8.5/).
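If MR3-UI shows no data, one thing to rule out is that the Timeline Server is unreachable at the configured address. The sketch below sends a request to the Timeline Server from Python. It assumes that the Timeline Server in the container exposes the REST root /ws/v1/timeline/ of the Hadoop Timeline Server v1 API on port 8188; the function names are hypothetical. Since MR3-UI runs in the browser, the address must also be reachable from the machine where the browser runs.

```python
import urllib.error
import urllib.request


def timeline_root_url(host, port=8188):
    """Build the assumed REST root URL of the Timeline Server."""
    return f"http://{host}:{port}/ws/v1/timeline/"


def timeline_is_up(host, port=8188, timeout=5.0):
    """Return True if the assumed REST root answers with HTTP 200."""
    try:
        with urllib.request.urlopen(timeline_root_url(host, port),
                                    timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A False result indicates a network or port-mapping problem, or a different REST path, rather than a problem with MR3-UI itself.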
Changing configurations for HiveServer2
In order to change configurations for HiveServer2,
start an all-in-one container with the initial command /usr/sbin/init, so that the initialization script does not run automatically.
$ sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 6080:6080 -p 8188:8188 -p 9852:9852 mr3project/hivemr3-all:1.2 /usr/sbin/init
c85a7579e5092e4cdfe12b1dfdc5d78178c76f2b92d4d5657cacdf911962f83b
Then update /opt/mr3-run/env.sh and the configuration files under the directory /opt/mr3-run/conf as necessary.
$ docker exec -it c85a7579e5092e4cdfe12b1dfdc5d78178c76f2b92d4d5657cacdf911962f83b /bin/bash
[root@c85a7579e509 mr3-run]# vi env.sh
[root@c85a7579e509 mr3-run]# vi conf/mr3-site.xml
[root@c85a7579e509 mr3-run]# vi conf/hive-site.xml
Finally, run the initialization script run-all.sh.
[root@c85a7579e509 mr3-run]# ./run-all.sh
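For repeatable setups, the manual edits to files such as conf/hive-site.xml can be scripted before run-all.sh is executed. The following sketch sets a property in a Hadoop-style configuration document, assuming the usual layout of property elements with name and value children inside a configuration root. The function name set_property is hypothetical, and the property name passed to it is up to the user. Note that xml.etree.ElementTree drops comments and the XML declaration when it rewrites a document.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text, name, value):
    """Set or add a <property> entry in a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry with this name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written back to the configuration file; keeping a backup of the original file is advisable because formatting and comments are not preserved.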