The user of Hive on MR3 can use HPL/SQL with the script hive/run-hplsql.sh
included in the MR3 release.
Below we demonstrate the use of HPL/SQL in various scenarios.
We assume that the user executes hive/run-hplsql.sh
on the same node where HiveServer2 runs.
Setting up HPL/SQL
Since HPL/SQL uses user defined functions (UDFs), the configuration key mr3.am.permit.custom.user.class
should be set to true in mr3-site.xml
.
$ vi kubernetes/conf/mr3-site.xml
<property>
<name>mr3.am.permit.custom.user.class</name>
<value>true</value>
</property>
The user can create a new configuration file hplsql-site.xml
in a configuration directory for Hive.
In our example, we configure HPL/SQL so that all commands are sent to HiveServer2.
$ vi kubernetes/conf/hplsql-site.xml
<configuration>
<property>
<name>hplsql.conn.default</name>
<value>hive2conn</value>
</property>
<property>
<name>hplsql.conn.hive2conn</name>
<value>org.apache.hive.jdbc.HiveDriver;jdbc:hive2://indigo1:9832;gitlab-runner;gitlab-runner</value>
</property>
</configuration>
Then start HiveServer2 by executing the script hive/hiveserver2-service.sh
.
Running HPL/SQL commands
We can execute HPL/SQL commands directly with the script run-hplsql.sh
.
$ ./hive/run-hplsql.sh -d a=Hello -d b=HiveMR3 -e "PRINT a || ', ' || b"
...
Hello, HiveMR3
We can send SQL statements to HiveServer2 with the script run-hplsql.sh
.
$ ./hive/run-hplsql.sh --tpcds -e "use tpcds_bin_partitioned_orc_2; select count(*) from call_center,catalog_page;"
...
Open connection: jdbc:hive2://indigo1:9832 (339 ms)
Starting SQL statement
SQL statement executed successfully (263 ms)
Starting query
Query executed successfully (1.42 sec)
93744
The following example does not work because everything after -e
is taken as a query.
$ ./hive/run-hplsql.sh -e "PRINT a || ', ' || b" -d a=Hello -d b=Hivemr3
...
null
Running HPL/SQL scripts which declare UDFs
The following HPL/SQL script declares two UDFs (hello
and set_message
).
The UDFs are called 1) from HPL/SQL commands and 2) from SQL statements sent to HiveServer2.
CREATE FUNCTION hello(text STRING)
RETURNS STRING
BEGIN
RETURN 'Hello, ' || text || '!';
END;
CREATE PROCEDURE set_message(IN name STRING, OUT result STRING)
BEGIN
SET result = 'Hello, ' || name || '!';
END;
PRINT hello('world');
DECLARE str STRING;
CALL set_message('world', str);
PRINT str;
use tpcds_bin_partitioned_orc_2;
SELECT hello(cc_city) FROM call_center;
SELECT hello(cc_city) FROM call_center, store WHERE hello(cc_city) = hello(s_city) limit 10;
Executing the script produces the following result.
$ ./hive/run-hplsql.sh --tpcds --hivesrc3 -f ~/mr3-run/hive/sample.sql
...
Hello, world!
Hello, world!
Open connection: jdbc:hive2://indigo1:9832 (308 ms)
Starting SQL statement
SQL statement executed successfully (261 ms)
Starting pre-SQL statement
Starting pre-SQL statement
Starting pre-SQL statement
Starting pre-SQL statement
Starting pre-SQL statement
Starting query
Query executed successfully (368 ms)
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Fairview!
Starting query
Query executed successfully (1.36 sec)
Hello, Fairview!
Hello, Fairview!
Hello, Fairview!
Hello, Fairview!
Hello, Fairview!
Hello, Fairview!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Using hplsqlrc
The user can create a new file hplsqlrc
in a configuration directory for Hive (e.g., conf/tpcds/hive3/hplsqlrc
).
In our example, the file declares two UDFs.
CREATE FUNCTION hello(text STRING)
RETURNS STRING
BEGIN
RETURN 'Hello, ' || text || '!';
END;
CREATE PROCEDURE set_message(IN name STRING, OUT result STRING)
BEGIN
SET result = 'Hello, ' || name || '!';
END;
We create an HPL/SQL script which calls UDFs declared in the file hplsqlrc
.
PRINT hello('world');
DECLARE str STRING;
CALL set_message('world', str);
PRINT str;
Executing the script produces the following result.
$ ./hive/run-hplsql.sh --tpcds --hivesrc3 -f ~/mr3-run/hive/sample2.sql
...
Hello, world!
Hello, world!
Calling UDFs declared in hplsqlrc
from SQL statements sent to HiveServer2
In order to call UDFs declared in hplsqlrc
from SQL statements sent to HiveServer2,
the user should copy the file hplsqlrc
to the library directory of Hive (e.g., hive/hivejar/apache-hive-3.1.3-bin/lib/
).
(This additional step is specific to the current implementation of MR3.)
$ cat ./hive/hivejar/apache-hive-3.1.3-bin/lib/hplsqlrc
CREATE FUNCTION hello(text STRING)
RETURNS STRING
BEGIN
RETURN 'Hello, ' || text || '!';
END;
CREATE PROCEDURE set_message(IN name STRING, OUT result STRING)
BEGIN
SET result = 'Hello, ' || name || '!';
END;
We create an HPL/SQL script which sends SQL statements calling UDFs declared in hplsqlrc
.
use tpcds_bin_partitioned_orc_2;
SELECT hello(cc_city) FROM call_center;
SELECT hello(cc_city) FROM call_center, store WHERE hello(cc_city) = hello(s_city) limit 10;
Executing the script produces the following result.
$ ./hive/run-hplsql.sh --tpcds --hivesrc3 -f ~/mr3-run/hive/sample3.sql
...
Open connection: jdbc:hive2://indigo1:9832 (343 ms)
Starting SQL statement
SQL statement executed successfully (264 ms)
Starting pre-SQL statement
Starting pre-SQL statement
Starting pre-SQL statement
Starting pre-SQL statement
Starting pre-SQL statement
Starting query
Query executed successfully (377 ms)
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Fairview!
Starting query
Query executed successfully (1.44 sec)
Hello, Fairview!
Hello, Fairview!
Hello, Fairview!
Hello, Fairview!
Hello, Fairview!
Hello, Fairview!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Hello, Midway!
Calling UDFs declared in hplsqlrc
from Beeline
In order to call UDFs declared in hplsqlrc
from Beeline, the user should first register two jar files and the file hplsqlrc
itself, and then create a function hplsql
.
0: jdbc:hive2://indigo1:9832/> add jar /home/gitlab-runner/mr3-run/hive/hivejar/apache-hive-3.1.3-bin/lib/hive-hplsql-3.1.3.jar;
0: jdbc:hive2://indigo1:9832/> add jar /home/gitlab-runner/mr3-run/hive/hivejar/apache-hive-3.1.3-bin/lib/antlr4-runtime-4.5.jar;
0: jdbc:hive2://indigo1:9832/> add file /home/gitlab-runner/mr3-run/conf/tpcds/hive3/hplsqlrc;
0: jdbc:hive2://indigo1:9832/> CREATE TEMPORARY FUNCTION hplsql AS 'org.apache.hive.hplsql.Udf';
Then the user can call UDFs using the following syntax.
0: jdbc:hive2://indigo1:9832/> use tpcds_bin_partitioned_orc_2;
0: jdbc:hive2://indigo1:9832/> SELECT hplsql('hello(:1)', cc_city) FROM call_center;
0: jdbc:hive2://indigo1:9832/> SELECT hplsql('hello(:1)', cc_city) FROM call_center, store where hplsql('hello(:1)', cc_city) = hplsql('hello(:1)', s_city) limit 10;