This page explains the additional steps for using Ranger for authorization in Hive on MR3. Using Ranger for authorization has the following prerequisite:
- A database server for Ranger is running. It may be the same database server used for Metastore.
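For example, a database server for Ranger can be started quickly as a MySQL container. This is only an illustrative sketch; the image tag and root password are placeholders, and any database supported by Ranger works:
# run on the database host (orange0 in the examples below); illustrative values
$ docker run -d --name ranger-mysql -p 3306:3306 \
    -e MYSQL_ROOT_PASSWORD=passwd \
    mysql:8.0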
To run Ranger, we need to check or update the following files:
├── helm
│   └── ranger
│       └── values.yaml
├── conf
│   ├── hive-site.xml
│   ├── ranger-hive-audit.xml
│   └── ranger-hive-security.xml
├── ranger-key
│   ├── install.properties
│   └── solr.in.sh
└── ranger-conf
    ├── core-site.xml
    ├── solr-security.json
    └── ranger-admin-site.xml.append
The file helm/ranger/values.yaml defines the default values for the Helm chart. Typically the user creates another YAML file to override some of these default values. In our example, we create a new YAML file values-ranger.yaml.
Basic settings
Open values-ranger.yaml
and set the following fields.
$ vi values-ranger.yaml
docker:
  image: mr3project/ranger:2.4.0
ranger:
  externalIp: 192.168.10.1
hostAliases:
- ip: "192.168.10.100"
  hostnames:
  - "orange0"
- ip: "192.168.10.1"
  hostnames:
  - "orange1"
- docker/image specifies the full name of the Docker image including a tag. We use the pre-built Docker image mr3project/ranger:2.4.0.
- ranger/externalIp specifies the host for the Service exposing Ranger to the outside of the Kubernetes cluster. The user should specify an IP address with a valid host name.
- hostAliases lists aliases for hosts that may not be found in the default DNS. Usually it suffices to include three hosts: 1) the host running MySQL for Ranger outside the Kubernetes cluster; 2) the host running HiveServer2 inside the Kubernetes cluster; 3) the host running Ranger inside the Kubernetes cluster. In our example, orange0 is the host running MySQL for Ranger, and orange1 is the host name assigned to both HiveServer2 and Ranger. (A quick way to verify the aliases is shown below.)
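After the Ranger Pod starts (see Running Ranger below), one way to confirm that the aliases are in effect is to inspect /etc/hosts inside the Pod. The Pod name below is the one from the example later on this page; substitute the actual Pod name:
$ kubectl exec -n hivemr3 hivemr3-ranger-856fc4dff-6rrvg -- cat /etc/hosts | grep orange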
PersistentVolume for Ranger
We need a PersistentVolume for storing data for Ranger.
The user should update values-ranger.yaml
to use a desired type of PersistentVolume.
In our example, we create a PersistentVolume using NFS.
The PersistentVolume should be writable by the user nobody (corresponding to the root user).
Open values-ranger.yaml
and set the following fields.
$ vi values-ranger.yaml
workDir:
  isNfs: true
  nfs:
    server: "192.168.10.1"
    path: "/home/nfs/hivemr3"
  volumeSize: 10Gi
  volumeClaimSize: 10Gi
  storageClassName: ""
  volumeStr:
- workDir/isNfs specifies whether the PersistentVolume uses NFS or not.
- workDir/nfs/server and workDir/nfs/path specify the address of the NFS server and the path exported by the NFS server (when workDir/isNfs is set to true). A sanity check for the export is shown after this list.
- workDir/volumeSize and workDir/volumeClaimSize specify the size of the PersistentVolume and the PersistentVolumeClaim.
- workDir/storageClassName specifies the StorageClass of the PersistentVolume.
- workDir/volumeStr specifies the PersistentVolume to use when workDir/isNfs is set to false. For example, volumeStr: "hostPath:\n  path: /work/nfs/mr3-run-work-dir" creates a hostPath PersistentVolume.
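Before installing the Helm chart, it is worth checking that the export is visible and writable by user nobody. A minimal sketch, assuming the commands run on the NFS server itself with the example values above:
$ showmount -e 192.168.10.1
# create and remove a test file as user nobody
$ sudo -u nobody touch /home/nfs/hivemr3/.rw-test
$ sudo -u nobody rm /home/nfs/hivemr3/.rw-test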
ranger-key/install.properties
$ vi ranger-key/install.properties
DB_FLAVOR=MYSQL
SQL_CONNECTOR_JAR=/opt/mr3-run/lib/mysql-connector-java-8.0.28.jar
db_root_user=root
db_root_password=passwd
db_host=192.168.10.100
db_password=password
audit_solr_urls=http://orange1:6083/solr/ranger_audits
audit_solr_user=
audit_solr_password=
policymgr_external_url=http://orange1:6080
policymgr_http_enabled=true
- DB_FLAVOR and SQL_CONNECTOR_JAR should match the database connector jar file. When using a MySQL server, Ranger automatically downloads a MySQL connector from https://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-8.0.28.tar.gz. The user should check the compatibility between the server and the connector. For example, a MySQL server created with the Docker image 5.7.37-0ubuntu0.18.04.1 is not fully compatible.
- db_root_user and db_root_password should be set to the ID and password of the root user of MySQL for Ranger.
- db_host should be set to the IP address or the host name of MySQL for Ranger (or of any database supported by Ranger). A quick connectivity check is shown after this list.
- db_password specifies a password for the user rangeradmin.
- audit_solr_urls specifies the address for the configuration key ranger.audit.solr.urls.
- Set to empty or remove the two variables audit_solr_user and audit_solr_password, which are related to authentication for auditing.
- policymgr_external_url should be set to the Ranger admin URL.
- policymgr_http_enabled should be set to true.
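Before installing the chart, the user may wish to verify that the database specified by db_host accepts the root credentials. A minimal sketch, assuming a MySQL client is available on a host that can reach 192.168.10.100:
$ mysql -h 192.168.10.100 -u root -ppasswd -e "SELECT VERSION();"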
ranger-key/solr.in.sh
$ vi ranger-key/solr.in.sh
SOLR_SSL_ENABLED=false
SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:solrRocks"
- Set SOLR_SSL_ENABLED to false because we do not use SSL for Solr.
- Set SOLR_AUTH_TYPE and SOLR_AUTHENTICATION_OPTS as shown above because we do not use Kerberos for Solr.
ranger-conf/core-site.xml
Set the configuration key hadoop.security.authentication
to simple
to disable Kerberos authentication.
$ vi ranger-conf/core-site.xml
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>
ranger-conf/solr-security.json
Set the configuration for authentication and authorization in Solr as follows:
$ vi ranger-conf/solr-security.json
{
  "authentication": {
    "blockUnknown": false,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin"
  }
}
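The credentials entry is a salted SHA-256 hash for the user solr. Once Ranger's Solr instance is running, basic authentication can be tested with curl; this sketch assumes the audit Solr instance listens on port 6083 as in audit_solr_urls, and uses the password from SOLR_AUTHENTICATION_OPTS in ranger-key/solr.in.sh:
$ curl -u solr:solrRocks "http://orange1:6083/solr/admin/info/system?wt=json"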
conf/hive-site.xml
The following configuration keys should be set to use Ranger for authorization in HiveServer2:
$ vi conf/hive-site.xml
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory</value>
</property>
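Once HiveServer2 is running with these settings, one way to confirm that the Ranger authorizer is active is to read back the configuration key from a Beeline client. A sketch, assuming Beeline is installed on the client, HiveServer2 listens at orange1:9852 as elsewhere on this page, and the configuration whitelist permits reading the key:
$ beeline -u "jdbc:hive2://orange1:9852/" -n hive \
    -e "SET hive.security.authorization.manager;"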
conf/ranger-hive-audit.xml
The configuration key xasecure.audit.destination.solr.urls
should use the host name assigned to Ranger.
$ vi conf/ranger-hive-audit.xml
<property>
  <name>xasecure.audit.destination.solr.urls</name>
  <value>http://orange1:6083/solr/ranger_audits</value>
</property>
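After a few queries have been executed, audit records should appear in the ranger_audits collection. A hedged check with curl, using the Solr credentials configured above:
$ curl -u solr:solrRocks "http://orange1:6083/solr/ranger_audits/select?q=*:*&rows=1"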
conf/ranger-hive-security.xml
The configuration key ranger.plugin.hive.service.name
should use the Ranger service for HiveServer2.
The configuration key ranger.plugin.hive.policy.rest.url
should use the host name assigned to Ranger.
$ vi conf/ranger-hive-security.xml
<property>
  <name>ranger.plugin.hive.service.name</name>
  <value>ORANGE_hive</value>
</property>
<property>
  <name>ranger.plugin.hive.policy.rest.url</name>
  <value>http://orange1:6080</value>
</property>
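To verify that the URL in ranger.plugin.hive.policy.rest.url is reachable once Ranger is running (see below), the user can list the registered services through Ranger's public REST API, using the admin credentials introduced in the next section:
$ curl -u admin:rangeradmin1 "http://orange1:6080/service/public/v2/api/service"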
Running Ranger
Assuming that a new YAML file values-ranger.yaml overrides the default values in helm/ranger/values.yaml, the user can run Ranger with namespace hivemr3 as follows:
$ ln -s $(pwd)/ranger-conf/ helm/ranger/conf
$ ln -s $(pwd)/ranger-key/ helm/ranger/key
$ helm install --namespace hivemr3 helm/ranger -f values-ranger.yaml
Here the first two commands create symbolic links so that Helm can access
the directories ranger-conf
and ranger-key
directly.
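The user can check that the Ranger Pod reaches the Running state:
$ kubectl get pods -n hivemr3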
Then the user can start Metastore and HiveServer2.
Creating a Ranger service
After running Ranger, the user can check if Ranger has started properly.
$ kubectl logs -n hivemr3 hivemr3-ranger-856fc4dff-6rrvg
...
Installation of Ranger PolicyManager Web Application is completed.
Starting Apache Ranger Admin Service
Apache Ranger Admin Service with pid 1643 has started.
Before executing queries,
the user should create a new Ranger service ORANGE_hive
(if it is not available yet).
The user can access Ranger Admin UI at http://orange1:6080
(specified by policymgr_external_url
in ranger-key/install.properties
).
Log in to the Ranger Admin UI with user admin and password rangeradmin1.
Create a Ranger service ORANGE_hive
.
In Config Properties
,
fill the JDBC URL field with:
jdbc:hive2://orange1:9852/
policy.download.auth.users should be set to the user hive, or the owner of HiveServer2.
Then Ranger can inspect metadata (such as databases, tables, users) managed by HiveServer2
while HiveServer2 can retrieve its Ranger service profile.
While creating the Ranger service, clicking the Test Connection button fails because HiveServer2 is not yet aware of the service. After the service has been created, the button should work.
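Alternatively, the Ranger service can be created through Ranger's public REST API instead of the Admin UI. The following is only a minimal sketch using the field values above; the exact set of config properties may vary across Ranger versions:
$ curl -u admin:rangeradmin1 -X POST \
    -H "Content-Type: application/json" \
    -d '{
      "name": "ORANGE_hive",
      "type": "hive",
      "configs": {
        "username": "hive",
        "jdbc.driverClassName": "org.apache.hive.jdbc.HiveDriver",
        "jdbc.url": "jdbc:hive2://orange1:9852/",
        "policy.download.auth.users": "hive"
      }
    }' \
    "http://orange1:6080/service/public/v2/api/service"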
After creating the Ranger service,
HiveServer2 successfully downloads the policy for ORANGE_hive
.
$ kubectl logs -n hivemr3 hivemr3-hiveserver2-b6889c9d-6nl8z | grep ORANGE_hive
...
2023-02-08T08:13:10,435 INFO [PolicyRefresher(serviceName=ORANGE_hive)-24] policyengine.RangerPolicyRepository: This policy engine contains 8 policy evaluators
As the last step before executing queries, new users should be added to the Ranger policy. For example, we can add a new user superset to allow access from Superset.
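After adding the user, enforcement can be verified from a Beeline client: a statement permitted by the policy succeeds, while anything else fails with a Ranger authorization error. A sketch, assuming Beeline is available on the client machine:
$ beeline -u "jdbc:hive2://orange1:9852/" -n superset -e "SHOW DATABASES;"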