This page shows how to run Hive 3 on MR3, Apache Ranger 2.1.0, and Timeline Server in a single Docker container using the pre-built all-in-one Docker image mr3project/hivemr3-all:1.2, which is available on Docker Hub.

By following these instructions, the user will learn:

  1. how to run Hive on MR3, Apache Ranger 2.1.0, and Timeline Server in a single Docker container
  2. how to create Beeline connections and send queries to HiveServer2
  3. how to add a new user in a Ranger policy
  4. how to change configurations for HiveServer2

For running Beeline outside the container, we assume that a client program to connect to HiveServer2 is already installed. This scenario should take less than 30 minutes to complete.

This page has been tested with MR3 release 1.2.

Starting the all-in-one container

In order to start an all-in-one container, execute the following command in which we specify mr3project/hivemr3-all:1.2 as the name of the Docker image.

$ sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 6080:6080 -p 8188:8188 -p 9852:9852 mr3project/hivemr3-all:1.2
fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b
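
The command prints the full ID of the new container. Docker accepts any unique prefix of an ID, and by default the hostname of the container is the first 12 characters of the ID (fdf73c08660a in the prompts shown on this page). The following is a minimal sketch for deriving that short form; the function name short_id is our own and is not part of Docker or MR3.

```shell
# short_id: print the first 12 characters of a container ID.
# (Hypothetical helper for illustration; Docker itself accepts any unique ID prefix.)
short_id() {
  printf '%s\n' "$1" | cut -c1-12
}

# Example with the ID printed above:
short_id fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b
```

The short form can be used wherever the full ID appears in the docker exec commands on this page.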

The initialization script installs MySQL 5.6 and downloads mysql-connector-java-8.0.17.tar.gz for a MySQL connector. It creates the following accounts for MySQL, Ranger, and Solr:

  • root:passwd for MySQL
  • admin:rangeradmin1 for Ranger
  • solr:solrRock for Solr (used internally by Ranger)

It uses the following ports for HiveServer2, Ranger, Solr, and Timeline Server:

  • 9852 for HiveServer2
  • 6080 for Ranger (HTTP)
  • 6083 for Solr (HTTP, used internally by Ranger)
  • 8188 for Timeline Server (HTTP)

The user can inspect the log file run.log inside the container to check if all services have started properly.

$ docker exec -it fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b /bin/bash
[root@fdf73c08660a mr3-run]# tail -f run.log
...
starting Ranger and Solr
...
Server: Apache Ranger
...
starting ATS
1dc82548-40ac-4973-8092-9f4c96b56cac
starting Metastore and HiveServer2
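
Instead of watching the log by hand, the user may check for the stage markers shown above. The following is a minimal sketch, assuming that run.log contains exactly these marker lines; the function name startup_reached is hypothetical. Note that the markers show only that the initialization script has reached each stage, not that the services are healthy.

```shell
# startup_reached: succeed if LOGFILE contains every stage marker
# that appears in the sample run.log output.
# (Hypothetical helper; the markers are taken from the log excerpt above.)
startup_reached() {
  logfile=$1
  for marker in \
      "starting Ranger and Solr" \
      "starting ATS" \
      "starting Metastore and HiveServer2"; do
    # -F: match the marker literally, -q: no output, only an exit status
    grep -qF -- "$marker" "$logfile" || return 1
  done
  return 0
}

# Usage inside the container (from /opt/mr3-run):
#   until startup_reached run.log; do sleep 5; done
```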

Running Beeline inside the container as user hive

Inside the container, download a sample dataset and execute /opt/mr3-run/hive/run-beeline.sh as user hive.

$ docker exec -it fdf73c08660a6d3b846f7d841eaba57d85d2b597a5112adb984ae9477331029b /bin/bash
[root@fdf73c08660a mr3-run]# su hive
bash-4.2$ pwd
/opt/mr3-run
bash-4.2$ wget https://github.com/mr3project/mr3-release/releases/download/v1.0/pokemon.csv

bash-4.2$ hive/run-beeline.sh
...
Connecting to jdbc:hive2://fdf73c08660a:9852/;;;
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive

Create a table called pokemon and import the sample dataset.

0: jdbc:hive2://fdf73c08660a:9852/> CREATE TABLE pokemon (Number Int,Name String,Type1 String,Type2 String,Total Int,HP Int,Attack Int,Defense Int,Sp_Atk Int,Sp_Def Int,Speed Int) row format delimited fields terminated BY ',' lines terminated BY '\n' tblproperties("skip.header.line.count"="1");

0: jdbc:hive2://fdf73c08660a:9852/> load data local inpath '/opt/mr3-run/pokemon.csv' INTO table pokemon;

Execute queries.

0: jdbc:hive2://fdf73c08660a:9852/> select avg(HP) from pokemon;

0: jdbc:hive2://fdf73c08660a:9852/> create table pokemon1 as select *, IF(HP>160.0,'strong',IF(HP>140.0,'moderate','weak')) AS power_rate from pokemon;

0: jdbc:hive2://fdf73c08660a:9852/> select COUNT(name), power_rate from pokemon1 group by power_rate;
...
+------+-------------+
| _c0  | power_rate  |
+------+-------------+
| 363  | strong      |
| 336  | weak        |
| 108  | moderate    |
+------+-------------+
3 rows selected (1.265 seconds)
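
The classification performed by the nested IF can be cross-checked against the raw CSV file without Hive. The following is a minimal sketch, assuming that HP is the sixth comma-separated column (as in the CREATE TABLE statement), that HP values are plain integers, and that no field contains a quoted comma; the function name power_rate_counts is hypothetical.

```shell
# power_rate_counts: classify the rows of a comma-separated file by HP
# (6th column) with the same thresholds as the query, skipping the header.
# (Hypothetical helper for illustration.)
power_rate_counts() {
  awk -F, '
    NR > 1 {
      hp = $6 + 0                        # a non-numeric HP is treated as 0
      if (hp > 160)      rate = "strong"
      else if (hp > 140) rate = "moderate"
      else               rate = "weak"
      count[rate]++
    }
    END {
      # print in a fixed order; a missing category is printed as 0
      printf "strong %d\nmoderate %d\nweak %d\n", count["strong"], count["moderate"], count["weak"]
    }
  ' "$1"
}

# Usage inside the container:
#   power_rate_counts /opt/mr3-run/pokemon.csv
```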

The tables are stored in the warehouse directory /opt/mr3-run/work-dir/warehouse/.

bash-4.2$ ls work-dir/warehouse/
pokemon  pokemon1

Running Beeline outside the container as another user

The user may use any client program to connect to HiveServer2. In our example, we use the script included in the MR3 release (hive/run-beeline.sh).

Open env.sh and set environment variables HIVE3_SERVER2_HOST, HIVE3_SERVER2_PORT, SECURE_MODE, and HIVE_SERVER2_AUTHENTICATION. Note that we do not use Kerberos for connecting to HiveServer2.

$ grep -e HIVE3_SERVER2_HOST -e SECURE_MODE -e HIVE_SERVER2_AUTHENTICATION -e HIVE3_SERVER2_PORT env.sh
HIVE3_SERVER2_HOST=your.server.address
HIVE3_SERVER2_PORT=9852
SECURE_MODE=false
HIVE_SERVER2_AUTHENTICATION=NONE

Run Beeline with the --tpcds and --hivesrc3 options.

$ ./hive/run-beeline.sh --tpcds --hivesrc3
...
beeline -u "jdbc:hive2://your.server.address:9852/;;" -n gla -p gla --hiveconf hive.querylog.location=/data1/gla/mr3-run-1.0/hive/run-beeline-result/hive-mr3-5ba3d48-2020-10-28-17-02-06-d0064f30/hive-logs
...
Connecting to jdbc:hive2://your.server.address:9852/;;
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://your.server.address:9852/>

Note that in our example, we connect to HiveServer2 as user gla, not hive.
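
A client that does not use run-beeline.sh can build the same connection from the values set in env.sh. The following is a minimal sketch, assuming that a beeline executable is available on the PATH; the function name hs2_jdbc_url is hypothetical. The trailing ;; in the URL printed by the script only separates empty option lists, so it is omitted here.

```shell
# hs2_jdbc_url: build the HiveServer2 JDBC URL from a host and a port.
# (Hypothetical helper for illustration.)
hs2_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/\n' "$1" "$2"
}

# Same values as in env.sh above.
HIVE3_SERVER2_HOST=your.server.address
HIVE3_SERVER2_PORT=9852

url=$(hs2_jdbc_url "$HIVE3_SERVER2_HOST" "$HIVE3_SERVER2_PORT")
echo "$url"

# Kerberos is not used, so no principal is appended to the URL.
# A plain connection as user gla would then look like:
#   beeline -u "$url" -n gla -p gla
```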

Now execute a sample query.

0: jdbc:hive2://your.server.address:9852/> use default;
0: jdbc:hive2://your.server.address:9852/> select count(*) from pokemon;
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [gla] does not have [SELECT] privilege on [default/pokemon] (state=42000,code=40000)

The query fails because Ranger is not configured to allow user gla to access the table pokemon.

Updating the Ranger policy

In order to update the Ranger policy, log in to Ranger (using ID admin and password rangeradmin1) at http://your.server.address:6080/. The user can find the Ranger service DOCKER_hive, which has been created by the initialization script.

docker-ranger-1

Open the Settings/Users tab.

docker-ranger-2

Create a new user, e.g., gla.

docker-ranger-3

Then open the Ranger service DOCKER_hive and choose an appropriate policy, e.g., all - database, table, column.

docker-ranger-4

Add user gla to the Select User list.

docker-ranger-5

HiveServer2 polls Ranger for policy changes periodically, so the new policy takes effect after a short delay. Then user gla can run the previous query.

0: jdbc:hive2://your.server.address:9852/> select count(*) from pokemon;
+------+
| _c0  |
+------+
| 807  |
+------+
1 row selected (0.675 seconds)
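
Because the updated policy takes effect only after a delay, a scripted client may have to repeat the query until it succeeds. The following is a minimal sketch of such a retry loop; the function name retry is hypothetical, and the usage line assumes that beeline is on the PATH and exits with a non-zero status when the statement fails.

```shell
# retry: run a command up to MAX times, sleeping DELAY seconds after each
# failed attempt, and succeed as soon as the command succeeds.
# (Hypothetical helper for illustration.)
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then return 0; fi
    if [ "$attempt" -ge "$max" ]; then return 1; fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage: try the query up to 12 times, 10 seconds apart.
#   retry 12 10 beeline -u "jdbc:hive2://your.server.address:9852/" -n gla -p gla \
#     -e 'select count(*) from pokemon;'
```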

MR3-UI

Download mr3-ui-1.0.tar.gz and extract it to an MR3-UI directory that can be accessed by a web server (e.g., at your.webserver.address).

$ pwd
/var/www/html
$ wget https://github.com/mr3project/mr3-release/releases/download/v1.0/mr3-ui-1.0.tar.gz
$ gunzip -c mr3-ui-1.0.tar.gz | tar xvf -
$ cd mr3-ui-0.8.5/

Since MR3-UI is a JavaScript application running on the client side, the user may install it on any machine where a web server is available, including the local machine. Then update config/configs.env in the MR3-UI directory to specify the address of the Timeline Server, as illustrated in the following example:

$ vi config/configs.env

hosts: {
  /*
  * Timeline Server Address:
  * By default MR3 UI looks for timeline server at http://localhost:8188, uncomment
  * and change the following value for pointing to a different address.
  */
  timeline: "your.server.address:8188",
}
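
The same change can be scripted instead of made in an editor. The following is a minimal sketch, assuming that configs.env contains a single timeline: entry (possibly commented out with //) and that the address contains neither | nor &; the function name set_timeline_address is hypothetical.

```shell
# set_timeline_address: rewrite the timeline: entry of FILE to point to ADDR,
# removing a leading // if the entry is commented out.
# (Hypothetical helper for illustration.)
set_timeline_address() {
  file=$1
  addr=$2
  tmp=$(mktemp) || return 1
  status=1
  # Keep the indentation (\1) and replace the rest of the line.
  if sed "s|^\([[:space:]]*\)\(//\)\{0,1\}[[:space:]]*timeline:.*|\1timeline: \"$addr\",|" \
      "$file" > "$tmp"; then
    # Overwrite in place so that the permissions of FILE are preserved.
    cat "$tmp" > "$file" && status=0
  fi
  rm -f "$tmp"
  return "$status"
}

# Usage in the MR3-UI directory:
#   set_timeline_address config/configs.env your.server.address:8188
```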

Then the user can visit the MR3-UI address specified by its installation directory (e.g., http://your.webserver.address/mr3-ui-0.8.5/).

docker-mr3ui

Changing configurations for HiveServer2

In order to change configurations for HiveServer2, start an all-in-one container with an initial command /usr/sbin/init.

$ sudo docker run --privileged -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro -p 6080:6080 -p 8188:8188 -p 9852:9852 mr3project/hivemr3-all:1.2 /usr/sbin/init
c85a7579e5092e4cdfe12b1dfdc5d78178c76f2b92d4d5657cacdf911962f83b

Then update /opt/mr3-run/env.sh and configuration files under the directory /opt/mr3-run/conf as necessary.

$ docker exec -it c85a7579e5092e4cdfe12b1dfdc5d78178c76f2b92d4d5657cacdf911962f83b /bin/bash
[root@c85a7579e509 mr3-run]# vi env.sh
[root@c85a7579e509 mr3-run]# vi conf/mr3-site.xml
[root@c85a7579e509 mr3-run]# vi conf/hive-site.xml
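
Before editing, it can help to keep a pristine copy of each file so that the changes can be reviewed later. The following is a minimal sketch; the function name backup_once and the .orig suffix are our own choices and are not part of the MR3 release.

```shell
# backup_once: copy FILE to FILE.orig unless a backup already exists,
# so that repeated calls never overwrite the pristine copy.
# (Hypothetical helper for illustration.)
backup_once() {
  [ -e "$1.orig" ] || cp -p -- "$1" "$1.orig"
}

# Usage inside the container (from /opt/mr3-run), before editing:
#   for f in env.sh conf/mr3-site.xml conf/hive-site.xml; do backup_once "$f"; done
# and afterwards, to review a change:
#   diff -u conf/hive-site.xml.orig conf/hive-site.xml
```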

Finally, run the initialization script run-all.sh.

[root@c85a7579e509 mr3-run]# ./run-all.sh