Since it is agnostic to the types of data sources,
Hive on MR3 can access multiple data sources simultaneously (e.g., by joining tables from two separate Hadoop clusters).
The only restriction is that it must use a single KDC and a single KMS, if it uses them at all.
Below we illustrate how to use a nonsecure HDFS as another remote data source in addition to an existing secure HDFS, as depicted in the following diagram.
We assume that the secure HDFS runs on
red0 and the nonsecure HDFS runs on gold0.
As the first step, we allow Hive to read from both the secure HDFS and the nonsecure HDFS by setting the configuration key ipc.client.fallback-to-simple-auth-allowed to true in the core-site.xml used by Hive:

```xml
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>ipc.client.fallback-to-simple-auth-allowed</name>
  <value>true</value>
</property>
```
Usually it is impersonation issues that prevent access to nonsecure HDFS.
For example, if HIVE_SERVER2_KERBEROS_PRINCIPAL is set to hive/red0@RED in
mr3-run/kubernetes/env.sh, creating an external table from the nonsecure HDFS may generate an error message like the one shown below.
```
2019-07-23 14:33:49,950 INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from 10.1.91.38:57090 for protocol org.apache.hadoop.hdfs.protocol.ClientProtocol is unauthorized for user gitlab-runner (auth:PROXY) via hive/red0@RED (auth:SIMPLE)
2019-07-23 14:33:49,951 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client 10.1.91.38 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: hive/red0@RED is not allowed to impersonate gitlab-runner]
```
Here an ordinary user
gitlab-runner runs Beeline and tries to create an external table from a directory on the HDFS running on gold0.
As indicated by the error message, the NameNode on
gold0 should allow
hive/red0@RED to impersonate gitlab-runner.
This requires two changes in core-site.xml on gold0.
- The configuration key
hadoop.proxyuser.hive.users should be set on
gold0 (where the nonsecure HDFS runs) so that user hive may impersonate gitlab-runner.
- The configuration key
hadoop.security.auth_to_local should be set so that user
hive/red0@RED can be mapped to user
hive, as shown in the following example.
```
RULE:[2:$1@$0](hive@RED)s/.*/hive/
RULE:[1:$1@$0](hive@RED)s/.*/hive/
DEFAULT
```
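Taken together, the two changes might appear in core-site.xml on gold0 as in the following sketch. The value of hadoop.proxyuser.hive.users shown here is an assumption; use * or the actual list of users in your environment.

```xml
<property>
  <name>hadoop.proxyuser.hive.users</name>
  <value>gitlab-runner</value>
</property>
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](hive@RED)s/.*/hive/
    RULE:[1:$1@$0](hive@RED)s/.*/hive/
    DEFAULT
  </value>
</property>
```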
Then the user (or the administrator of
gold0) should restart the NameNode, after which the impersonation issue should disappear.
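The effect of the auth_to_local rules above can be sanity-checked offline. Below is a minimal Python sketch of a simplified rule evaluator; it handles only the rule forms shown here (not Hadoop's full KerberosName implementation, which also supports flags such as /L and /g) and assumes RED is the default realm.

```python
import re

def map_principal(principal, rules, default_realm="RED"):
    """Apply simplified Hadoop-style auth_to_local rules to a Kerberos principal.

    Supports rules of the form RULE:[n:fmt](regex)s/pat/repl/ and DEFAULT only.
    """
    name, realm = principal.split("@")
    components = name.split("/")
    for rule in rules:
        if rule == "DEFAULT":
            # DEFAULT maps principals in the default realm to their first component.
            if realm == default_realm:
                return components[0]
            continue
        m = re.fullmatch(r"RULE:\[(\d+):([^\]]+)\]\(([^)]+)\)s/([^/]+)/([^/]*)/", rule)
        num, fmt, match_re, pat, repl = m.groups()
        if int(num) != len(components):
            continue  # rule applies only to principals with this many components
        built = fmt.replace("$0", realm)
        for i, comp in enumerate(components, start=1):
            built = built.replace(f"${i}", comp)
        if re.fullmatch(match_re, built):
            # sed-style s/pat/repl/ replaces the first match only
            return re.sub(pat, repl, built, count=1)
    raise ValueError(f"no auth_to_local rule matched {principal}")

rules = [
    "RULE:[2:$1@$0](hive@RED)s/.*/hive/",
    "RULE:[1:$1@$0](hive@RED)s/.*/hive/",
    "DEFAULT",
]
print(map_principal("hive/red0@RED", rules))       # hive
print(map_principal("gitlab-runner@RED", rules))   # gitlab-runner
```

Here hive/red0@RED matches the first rule (two components) and is rewritten to hive, while gitlab-runner@RED falls through to DEFAULT.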
Note that the impersonation issue arises because of accessing nonsecure HDFS and has nothing to do with the value of the configuration key hive.server2.enable.doAs.
That is, even if
hive.server2.enable.doAs is set to false, the user may still see the impersonation issue.
Now the user with proper permission can create an external table from the nonsecure HDFS.
```
0: jdbc:hive2://10.1.91.41:9852/> create external table call_center_gold(
. . . . . . . . . . . . . . . . > cc_call_center_sk bigint
...
. . . . . . . . . . . . . . . . > , cc_tax_percentage double
. . . . . . . . . . . . . . . . > )
. . . . . . . . . . . . . . . . > stored as orc
. . . . . . . . . . . . . . . . > location 'hdfs://gold0:8020/tmp/hivemr3/warehouse/tpcds_bin_partitioned_orc_2.db/call_center';
...
INFO  : OK
No rows affected (0.256 seconds)
```