By running Ranger, we complete a Kubernetes cluster which supports data security and is thus ready for production use:

[Figure: overview of the Kubernetes cluster with Ranger]

We can also run Ranger as a separate application (independently of Hive on MR3). For such a use case, download an MR3 release containing the executable scripts.

$ git clone https://github.com/mr3project/mr3-run-k8s.git
$ cd mr3-run-k8s/

Set the environment variable HIVE_MYSQL_DRIVER in env.sh to a MySQL connector jar file compatible with the MySQL database for Ranger.

$ vi env.sh

HIVE_MYSQL_DRIVER=/usr/share/java/mysql-connector-java.jar

Building a Docker image for Ranger

To build a Docker image for Ranger, collect all necessary files in the directory kubernetes/ranger by executing the script build-k8s-ranger.sh.

$ ./build-k8s-ranger.sh
downloading Solr at /tmp/solr-7.7.2.tgz
...
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36:18 min
[INFO] Finished at: 2020-10-04T20:14:21+09:00
[INFO] ------------------------------------------------------------------------

build-k8s-ranger.sh downloads a binary distribution of Apache Solr (version 7.7.2 by default) and clones a git repository of Ranger (version 2.1.0 by default) under the directory /tmp/ranger. Then it compiles Ranger and populates two directories: kubernetes/ranger for building a Docker image for Ranger and kubernetes/hive for installing the Hive plugin for Ranger.

For building Ranger 2.1.0, the user should use Maven 3.6.2 and Python 3 with the requests library. If build-k8s-ranger.sh fails, check the variable SOLR_DOWNLOAD_URL (which specifies the address of the binary distribution of Apache Solr) in the function build_k8s_ranger of common-build-setup.sh. The user can also change the version of Ranger and the directory $RANGER_HOME where Ranger is built. If build-k8s-ranger.sh fails at the last step when building a Ranger distribution, manually execute the Maven command in the directory $RANGER_HOME.

function build_k8s_ranger {
...
  TMP_DIR=${RANGER_TMP_DIR:-/tmp}
...
  SOLR_DOWNLOAD_URL=https://archive.apache.org/dist/lucene/solr/7.7.2/solr-7.7.2.tgz
...
  RANGER_HOME=$TMP_DIR/ranger
...
  RANGER_GIT_REV=5ec9fbd0b78595084dc847f7fdc9da0506f6c482
  RANGER_VERSION=2.1.0
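When build-k8s-ranger.sh fails at the download step, it can help to probe SOLR_DOWNLOAD_URL before editing common-build-setup.sh. The Python sketch below assumes that other Solr versions follow the same archive layout as the 7.7.2 URL above; that is an assumption (newer Solr releases may be published under a different path), and the helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Layout copied from the default SOLR_DOWNLOAD_URL for Solr 7.7.2.
# ASSUMPTION: other releases use the same path; verify before relying on it.
ARCHIVE_PATTERN = "https://archive.apache.org/dist/lucene/solr/{v}/solr-{v}.tgz"


def solr_download_url(version: str) -> str:
    """Build a candidate SOLR_DOWNLOAD_URL for the given Solr version."""
    return ARCHIVE_PATTERN.format(v=version)


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send an HTTP HEAD request and report whether it succeeded."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors (e.g. 404), DNS failures, timeouts, bad URLs.
        return False
```

For example, url_is_reachable(solr_download_url("7.7.2")) should return True on a node with Internet access; a False result suggests updating SOLR_DOWNLOAD_URL before rerunning the build.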

As with building the Docker image for running Hive on MR3, the user should set the environment variable DOCKER_RANGER_IMG in kubernetes/env.sh:

DOCKER_RANGER_IMG=10.1.91.17:5000/ranger:latest
  • DOCKER_RANGER_IMG specifies the full name of the Docker image for running Ranger, including a tag. The name may also include the address of a running Docker server.
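Because DOCKER_RANGER_IMG packs a server address, a repository and a tag into one string, it can help to see how such a name decomposes. The following is a simplified, hypothetical helper (not part of the MR3 scripts) that follows Docker's usual naming rules as we understand them; as in the docker pull output further down, a missing tag falls back to latest. Digest references (name@sha256:...) are not handled.

```python
from typing import Optional, Tuple


def split_image_ref(ref: str) -> Tuple[Optional[str], str, str]:
    """Split an image name into (registry, repository, tag)."""
    name, tag = ref, "latest"  # Docker's default tag when none is given
    colon = ref.rfind(":")
    # A ':' after the last '/' introduces a tag; an earlier ':' belongs to
    # the registry address, as in 10.1.91.17:5000.
    if colon > ref.rfind("/"):
        name, tag = ref[:colon], ref[colon + 1:]
    registry: Optional[str] = None
    first, sep, rest = name.partition("/")
    # The first path component is treated as a registry host when it looks
    # like one: it contains '.' or ':', or it is 'localhost'.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, name = first, rest
    return registry, name, tag
```

With the sample value, split_image_ref("10.1.91.17:5000/ranger:latest") separates the server 10.1.91.17:5000 from the repository ranger and the tag latest.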

The last step is to build a Docker image from the Dockerfile in the directory kubernetes/ranger by executing kubernetes/build-ranger.sh. The script builds a Docker image containing everything needed for running Ranger and pushes it to the Docker server specified in kubernetes/env.sh. If successful, the user can pull the Docker image on another node:

$ docker pull 10.1.91.17:5000/ranger
Using default tag: latest
Trying to pull repository 10.1.91.17:5000/ranger ... 
latest: Pulling from 10.1.91.17:5000/ranger
...
Status: Downloaded newer image for 10.1.91.17:5000/ranger:latest

Note that after building the Docker image for Ranger, the user should rebuild the Docker image for running Hive on MR3 because HiveServer2 needs the Hive plugin for Ranger.

The user can run Ranger with or without Helm. To run Ranger with Helm, see Running Apache Ranger. Below we describe how to run Ranger without Helm.

Configuring the Pod for Ranger

The following files specify how to configure Kubernetes objects for Ranger:

└── kubernetes
    ├── env.sh
    ├── ranger-key
    └── yaml
        ├── namespace.yaml
        ├── ranger-service.yaml
        ├── ranger.yaml
        ├── workdir-pvc-ranger.yaml
        └── workdir-pv-ranger.yaml

We assume that Ranger belongs to the same namespace as HiveServer2, and reuse namespace.yaml. Ranger uses workdir-pvc-ranger.yaml and workdir-pv-ranger.yaml, which can be configured similarly to workdir-pvc.yaml and workdir-pv.yaml. The PersistentVolume should be writable by the user with UID 1000 (which is specified in kubernetes/ranger/Dockerfile).
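Before starting the Pod, it may be worth checking whether the directory backing the PersistentVolume is writable by UID 1000. The sketch below inspects only the POSIX mode bits of a directory (write plus search permission); it ignores ACLs, root's override and NFS-side UID squashing, and the function name is hypothetical.

```python
import os
import stat
from typing import Iterable

RANGER_UID = 1000  # UID specified in kubernetes/ranger/Dockerfile, per the text


def writable_by(path: str, uid: int, gids: Iterable[int] = ()) -> bool:
    """Return True if mode bits grant write+search on a directory to uid/gids.

    Simplified: ACLs, root privileges and NFS squashing are not modeled.
    """
    st = os.stat(path)
    if st.st_uid == uid:
        need = stat.S_IWUSR | stat.S_IXUSR   # owner bits apply
    elif st.st_gid in set(gids):
        need = stat.S_IWGRP | stat.S_IXGRP   # group bits apply
    else:
        need = stat.S_IWOTH | stat.S_IXOTH   # "other" bits apply
    return (st.st_mode & need) == need
```

Usage would look like writable_by("/path/to/ranger/workdir", RANGER_UID), where the path is a placeholder for wherever the PersistentVolume is exported on the storage side.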

The user should set the following environment variable in kubernetes/env.sh.

CREATE_RANGER_SECRET=true
  • CREATE_RANGER_SECRET specifies whether or not to create a Secret from the files (such as keytab files) in the directory kubernetes/ranger-key. It should usually be set to true whether or not Kerberos is used for authentication, because the directory also contains kubernetes/ranger-key/install.properties.

ranger-service.yaml

This file creates a Service for exposing Ranger to the outside of the Kubernetes cluster. The user should specify an IP address with a valid host name and three port numbers for Ranger so that both the administrator user from the outside and HiveServer2 from the inside can connect to it using the host name. Usually there is no need to change the three targetPort fields which specify port numbers internal to the Ranger Pod.

  ports:
  - name: ranger-admin-http
    protocol: TCP
    port: 6080
    targetPort: 6080
  - name: ranger-admin-https
    protocol: TCP
    port: 6182
    targetPort: 6182
  - name: solr
    protocol: TCP
    port: 6083
    targetPort: 6083
  externalIPs:
  - 10.1.91.41

The sample file in the MR3 release uses 10.1.91.41:6080 as the HTTP address and 10.1.91.41:6182 as the HTTPS address of Ranger. Another address 10.1.91.41:6083 is reserved for the internal communication between Ranger and Solr. The user should make sure that the IP address exists and is not already taken.
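Later sections distinguish between the two port fields: ranger.externalurl and ranger.audit.solr.urls in ranger-admin-site.xml use targetPort, while HiveServer2's ranger.plugin.hive.policy.rest.url uses port. In the sample they coincide, but if port is ever changed, a small helper like the following makes the choice explicit. It is a hypothetical sketch; the ports section is transcribed by hand rather than parsed from YAML.

```python
# Ports section of ranger-service.yaml, transcribed from the sample above.
SERVICE_PORTS = [
    {"name": "ranger-admin-http", "port": 6080, "targetPort": 6080},
    {"name": "ranger-admin-https", "port": 6182, "targetPort": 6182},
    {"name": "solr", "port": 6083, "targetPort": 6083},
]


def endpoint(name: str, host: str, via_service: bool) -> str:
    """Return 'host:port' for a named entry of the ports section.

    via_service=True  -> 'port', for clients connecting through the Service
                         (e.g. ranger.plugin.hive.policy.rest.url).
    via_service=False -> 'targetPort', the port inside the Ranger Pod
                         (e.g. ranger.externalurl, ranger.audit.solr.urls).
    """
    for entry in SERVICE_PORTS:
        if entry["name"] == name:
            key = "port" if via_service else "targetPort"
            return f"{host}:{entry[key]}"
    raise KeyError(f"no port named {name!r} in ranger-service.yaml")
```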

ranger.yaml

This file creates a Pod for running Ranger. Internally the Pod runs two containers (for Ranger itself and for Solr) in parallel. The user should update the spec/hostAliases field and the spec/containers section.

  • The spec/hostAliases field lists aliases for hosts that may not be found in the default DNS. Usually it suffices to include three hosts: 1) the host running MySQL for Ranger outside the Kubernetes cluster; 2) the host running HiveServer2 inside the Kubernetes cluster; 3) the host running Ranger inside the Kubernetes cluster. In the sample file in the MR3 release, indigo0 is the host running MySQL for Ranger and indigo20 is the host name assigned to HiveServer2 and Ranger.
      hostAliases:
      - ip: "10.1.91.17"
        hostnames:
        - "indigo0"
      - ip: "10.1.91.41"
        hostnames:
        - "indigo20"
    
  • The image field in the spec/containers section should match the Docker image specified by DOCKER_RANGER_IMG in kubernetes/env.sh.
  • The resources/requests and resources/limits fields specify the resources to be allocated to the Ranger container and the Solr container.
  • The ports/containerPort fields should match the port numbers specified in the targetPort fields in ranger-service.yaml.
spec:
  containers:
  - image: 10.1.91.17:5000/ranger
    name: solr
    resources:
      requests:
        cpu: 1
        memory: 4Gi
      limits:
        cpu: 1
        memory: 4Gi
    ports:
    - containerPort: 6083
      protocol: TCP

  - image: 10.1.91.17:5000/ranger
    name: ranger
    resources:
      requests:
        cpu: 1
        memory: 4Gi
      limits:
        cpu: 1
        memory: 4Gi
    ports:
    - containerPort: 6080
      protocol: TCP
    - containerPort: 6182
      protocol: TCP
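The two consistency rules above (containerPort versus targetPort, and image versus DOCKER_RANGER_IMG) can be checked mechanically. The sketch below works on hand-transcribed data rather than parsing the YAML files, and its function names are hypothetical. It treats an image without a tag as carrying the default tag latest, so 10.1.91.17:5000/ranger and 10.1.91.17:5000/ranger:latest compare as equal.

```python
from typing import Dict, List, Set

# Hand-transcribed from ranger-service.yaml, ranger.yaml and env.sh above.
SERVICE_TARGET_PORTS = {6080, 6182, 6083}
CONTAINERS: List[Dict] = [
    {"name": "solr", "image": "10.1.91.17:5000/ranger",
     "ports": [{"containerPort": 6083}]},
    {"name": "ranger", "image": "10.1.91.17:5000/ranger",
     "ports": [{"containerPort": 6080}, {"containerPort": 6182}]},
]
DOCKER_RANGER_IMG = "10.1.91.17:5000/ranger:latest"


def with_default_tag(image: str) -> str:
    """Append ':latest' when an image name carries no explicit tag."""
    return image if image.rfind(":") > image.rfind("/") else image + ":latest"


def unexposed_target_ports(target_ports: Set[int], containers: List[Dict]) -> Set[int]:
    """Return the Service targetPorts that no container declares."""
    declared = {p["containerPort"] for c in containers for p in c.get("ports", [])}
    return target_ports - declared


def mismatched_images(containers: List[Dict], expected: str) -> List[str]:
    """Return names of containers whose image differs from DOCKER_RANGER_IMG."""
    want = with_default_tag(expected)
    return [c["name"] for c in containers if with_default_tag(c["image"]) != want]
```

Both checks return empty results for the sample files; a non-empty result points at the field to fix.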

Configuring Ranger

The following files specify how to configure Ranger:

└── kubernetes
    ├── env.sh
    ├── ranger-conf
    │   ├── core-site.xml
    │   ├── krb5.conf
    │   ├── ranger-admin-site.xml
    │   ├── ranger-log4j.properties
    │   ├── solr-core.properties
    │   ├── solr-elevate.xml
    │   ├── solr.in.sh
    │   ├── solr-log4j2.xml
    │   ├── solr-managed-schema
    │   ├── solr-security.json
    │   ├── solr-solrconfig.xml
    │   └── solr-solr.xml
    └── ranger-key
        └── install.properties

Because detailed documentation on configuring Ranger is scarce, the user is advised to start Ranger with minimal changes to the configuration files in the MR3 release (which should work in a typical Kubernetes cluster). After getting Ranger up and running, the user can incrementally adjust the configuration to suit particular needs. Otherwise the user might have to debug the configuration by reading the source code of Ranger.

We recommend that the user create three Kerberos keytab files (with service principals) with the following names. In our example, we assume that indigo20 is the host name assigned to Ranger and that RED is the Kerberos realm. The user should copy the keytab files into the directory kubernetes/ranger-key.

  • rangeradmin.keytab with service principal rangeradmin/indigo20@RED
  • spnego.service.keytab with service principal HTTP/indigo20@RED
  • rangerlookup.keytab with service principal rangerlookup/indigo20@RED
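All three principals follow the pattern service/host@REALM, so the expected contents of each keytab can be derived from the host name and realm. The following hypothetical sketch produces that mapping; the actual principals stored in a keytab can then be compared against it with MIT Kerberos' klist -kt, for example klist -kt kubernetes/ranger-key/rangeradmin.keytab.

```python
from typing import Dict

# Keytab file name -> service part of the principal, as recommended above.
KEYTAB_SERVICES = {
    "rangeradmin.keytab": "rangeradmin",
    "spnego.service.keytab": "HTTP",
    "rangerlookup.keytab": "rangerlookup",
}


def principal(service: str, host: str, realm: str) -> str:
    """Compose a Kerberos service principal of the form service/host@REALM."""
    return f"{service}/{host}@{realm}"


def expected_principals(host: str, realm: str) -> Dict[str, str]:
    """Map each keytab file name to the principal it should contain."""
    return {keytab: principal(svc, host, realm) for keytab, svc in KEYTAB_SERVICES.items()}
```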

Below we describe those sections that are specific to each Kubernetes cluster. To enable SSL, see Enabling SSL. For more details, we refer the reader to the documentation on Ranger.

install.properties

  • SQL_CONNECTOR_JAR should match the MySQL connector jar file from HIVE_MYSQL_DRIVER in env.sh.

    SQL_CONNECTOR_JAR=/opt/mr3-run/lib/mysql-connector-java.jar
    

    If the Docker image for Ranger does not contain a MySQL connector or a different MySQL connector should be used, the user can copy a MySQL connector jar file to a subdirectory of the PersistentVolume and set SQL_CONNECTOR_JAR to point to the file (e.g., SQL_CONNECTOR_JAR=/opt/mr3-run/ranger/work-dir/lib/mysql-connector-java-8.0.12.jar). In this way, Ranger can use the custom MySQL connector provided by the user. If the Docker image for Ranger already contains a MySQL connector, the PersistentVolume is not used.

  • db_root_user and db_root_password should be set to the ID and password of the root user of MySQL for Ranger.

    db_root_user=root
    db_root_password=passwd
    
  • db_host should be set to the IP address or the host name of MySQL for Ranger.

    db_host=indigo0
    
  • RANGER_ADMIN_LOG_DIR specifies the directory for logging. By default, Ranger uses a local directory mounted with an emptyDir volume.

    RANGER_ADMIN_LOG_DIR=/opt/mr3-run/ranger/work-local-dir/log/ranger-admin
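A quick pre-flight check can catch a variable that was left empty. The sketch below assumes that install.properties consists of simple KEY=VALUE lines and '#' comments; it does not handle quoting, line continuations or variable expansion, and it checks only the variables discussed above (the real file contains many more). The names are hypothetical.

```python
from typing import Dict, List

# Variables discussed above; the real install.properties has many more.
REQUIRED_KEYS = ["SQL_CONNECTOR_JAR", "db_root_user", "db_root_password",
                 "db_host", "RANGER_ADMIN_LOG_DIR"]


def parse_install_properties(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_or_empty(props: Dict[str, str]) -> List[str]:
    """Return the required variables that are absent or set to an empty value."""
    return [k for k in REQUIRED_KEYS if not props.get(k)]


SAMPLE = """\
# excerpt of kubernetes/ranger-key/install.properties
SQL_CONNECTOR_JAR=/opt/mr3-run/lib/mysql-connector-java.jar
db_root_user=root
db_root_password=passwd
db_host=indigo0
RANGER_ADMIN_LOG_DIR=/opt/mr3-run/ranger/work-local-dir/log/ranger-admin
"""
```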
    

ranger-admin-site.xml

  • The configuration key ranger.jpa.jdbc.url should point to MySQL for Ranger, using its IP address or host name (indigo0 in the example).

      <property>
        <name>ranger.jpa.jdbc.url</name>
        <value>jdbc:log4jdbc:mysql://indigo0/ranger</value>
      </property>
    
  • The configuration keys ranger.externalurl and ranger.audit.solr.urls should use the host name assigned to Ranger. The port numbers should match the fields targetPort, not port, in yaml/ranger-service.yaml.

      <property>
        <name>ranger.externalurl</name>
        <value>http://indigo20:6080</value>
      </property>
      <property>
        <name>ranger.audit.solr.urls</name>
        <value>http://indigo20:6083/solr/ranger_audits</value>
      </property>
    
  • The configuration key ranger.admin.kerberos.cookie.domain should be set to the host running Ranger inside the Kubernetes cluster.

      <property>
        <name>ranger.admin.kerberos.cookie.domain</name>
        <value>indigo20</value>
      </property>
    
  • The configuration keys ranger.admin.kerberos.principal, ranger.spnego.kerberos.principal, and ranger.lookup.kerberos.principal should use service principals whose keytab files are in the directory kubernetes/ranger-key.

      <property>
        <name>ranger.admin.kerberos.principal</name>
        <value>rangeradmin/indigo20@RED</value>
      </property>
      <property>
        <name>ranger.admin.kerberos.keytab</name>
        <value>/opt/mr3-run/ranger/key/rangeradmin.keytab</value>
      </property>
    
      <property>
        <name>ranger.spnego.kerberos.principal</name>
        <value>HTTP/indigo20@RED</value>
      </property>
      <property>
        <name>ranger.spnego.kerberos.keytab</name>
        <value>/opt/mr3-run/ranger/key/spnego.service.keytab</value>
      </property>
    
      <property>
        <name>ranger.lookup.kerberos.principal</name>
        <value>rangerlookup/indigo20@RED</value>
      </property>
      <property>
        <name>ranger.lookup.kerberos.keytab</name>
        <value>/opt/mr3-run/ranger/key/rangerlookup.keytab</value>
      </property>
    
  • The configuration keys xasecure.audit.jaas.Client.option.keyTab and xasecure.audit.jaas.Client.option.principal should match the service principal specified by ranger.admin.kerberos.principal.

      <property>
        <name>xasecure.audit.jaas.Client.option.keyTab</name>
        <value>/opt/mr3-run/ranger/key/rangeradmin.keytab</value>
      </property>
    
      <property>
        <name>xasecure.audit.jaas.Client.option.principal</name>
        <value>rangeradmin/indigo20@RED</value>
      </property>
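Since several keys in ranger-admin-site.xml must repeat the same principal or keytab, a mismatch is easy to introduce when copying the file between clusters. The sketch below reads Hadoop-style property files with Python's standard xml.etree.ElementTree and compares key pairs. Pairing xasecure.audit.jaas.Client.option.keyTab with ranger.admin.kerberos.keytab is our reading of the rule above, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def load_site_xml(xml_text: str) -> Dict[str, str]:
    """Read <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name", ""): p.findtext("value", "") for p in root.iter("property")}


# Key pairs that should carry identical values (both absent counts as equal).
MUST_MATCH = [
    ("ranger.admin.kerberos.principal", "xasecure.audit.jaas.Client.option.principal"),
    ("ranger.admin.kerberos.keytab", "xasecure.audit.jaas.Client.option.keyTab"),
]


def mismatches(conf: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the key pairs whose values differ."""
    return [(a, b) for a, b in MUST_MATCH if conf.get(a) != conf.get(b)]


# A small input with a deliberately inconsistent audit principal.
SAMPLE = """<configuration>
  <property><name>ranger.admin.kerberos.principal</name><value>rangeradmin/indigo20@RED</value></property>
  <property><name>ranger.admin.kerberos.keytab</name><value>/opt/mr3-run/ranger/key/rangeradmin.keytab</value></property>
  <property><name>xasecure.audit.jaas.Client.option.principal</name><value>rangeradmin/orange1@PL</value></property>
  <property><name>xasecure.audit.jaas.Client.option.keyTab</name><value>/opt/mr3-run/ranger/key/rangeradmin.keytab</value></property>
</configuration>"""
```

On SAMPLE, mismatches(load_site_xml(SAMPLE)) reports only the principal pair.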
    

solr.in.sh

The environment variable SOLR_AUTHENTICATION_OPTS should use the host name assigned to Ranger, together with the service principal and keytab specified by the configuration keys ranger.spnego.kerberos.principal and ranger.spnego.kerberos.keytab in ranger-admin-site.xml.

SOLR_AUTHENTICATION_OPTS="\
...
-Dsolr.kerberos.cookie.domain=indigo20 \
-Dsolr.kerberos.principal=HTTP/indigo20@RED \
-Dsolr.kerberos.keytab=/opt/mr3-run/ranger/key/spnego.service.keytab"
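The three -D options above can be cross-checked against the SPNEGO settings in ranger-admin-site.xml. The following hypothetical sketch extracts -Dkey=value tokens from the option string and compares only the three options shown above (the part elided as ... is not checked). Inside double quotes, bash removes backslash-newline pairs, and the parser mimics that so the text can be pasted as written.

```python
import shlex
from typing import Dict, List


def parse_system_properties(opts: str) -> Dict[str, str]:
    """Collect -Dkey=value tokens from a JVM options string."""
    # Bash drops backslash-newline pairs inside double quotes; do the same.
    tokens = shlex.split(opts.replace("\\\n", ""))
    props: Dict[str, str] = {}
    for token in tokens:
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            props[key] = value
    return props


def check_solr_kerberos(opts: str, host: str, spnego_principal: str,
                        spnego_keytab: str) -> List[str]:
    """Return the solr.kerberos.* options that disagree with the given values."""
    props = parse_system_properties(opts)
    expected = {
        "solr.kerberos.cookie.domain": host,
        "solr.kerberos.principal": spnego_principal,
        "solr.kerberos.keytab": spnego_keytab,
    }
    return [k for k, v in expected.items() if props.get(k) != v]
```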

solr-security.json

The user-role section should specify service principals for HiveServer2 and Ranger.

    "user-role": {
      "hive/red0@RED": "updater",
      "rangeradmin/indigo20@RED": "reader"
    }
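The user-role map sits inside the authorization section of solr-security.json, which is where Solr's RuleBasedAuthorizationPlugin reads it. Because strict JSON rejects trailing commas and similar slips, parsing the file is a cheap sanity check. The sketch below assumes the layout shown in SAMPLE; the function name is hypothetical.

```python
import json
from typing import List


def check_user_roles(security_json: str, hive_principal: str,
                     ranger_principal: str) -> List[str]:
    """Report problems with the user-role map of solr-security.json."""
    conf = json.loads(security_json)  # raises ValueError on invalid JSON
    roles = conf.get("authorization", {}).get("user-role", {})
    problems = []
    if roles.get(hive_principal) != "updater":
        problems.append(f"{hive_principal} should have role 'updater'")
    if roles.get(ranger_principal) != "reader":
        problems.append(f"{ranger_principal} should have role 'reader'")
    return problems


SAMPLE = """{
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": {
      "hive/red0@RED": "updater",
      "rangeradmin/indigo20@RED": "reader"
    }
  }
}"""
```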

krb5.conf

This file should contain the Kerberos configuration. Usually it suffices to use a copy of kubernetes/conf/krb5.conf. The user may have to comment out the variable renew_lifetime if the error KrbApErrException: Message stream modified occurs (as explained in Accessing Secure HDFS).

Running Ranger

In order to run Ranger, the user can execute the script kubernetes/run-ranger.sh:

$ kubernetes/run-ranger.sh 
namespace/hivemr3 created
persistentvolume/workdir-pv-ranger created
persistentvolumeclaim/workdir-pvc-ranger created
configmap/hivemr3-ranger-conf-configmap created
secret/hivemr3-ranger-secret created
replicationcontroller/hivemr3-ranger created
service/ranger created

After kubernetes/run-ranger.sh is executed, a Ranger Pod consisting of two containers starts shortly:

$ kubectl get -n hivemr3 pods
NAME                   READY   STATUS    RESTARTS   AGE
hivemr3-ranger-2q9wv   2/2     Running   0          33s

The user can check the log of the Ranger Pod to see if Ranger has successfully started:

$ kubectl logs -n hivemr3 hivemr3-ranger-2q9wv ranger
2020-10-09 07:19:37,935   --------- Running Ranger PolicyManager Web Application Install Script --------- 
2020-10-09 07:19:37,942  [I] uname=Linux
2020-10-09 07:19:37,947  [I] hostname=hivemr3-ranger-2q9wv
2020-10-09 07:19:37,961  [I] DB_FLAVOR=MYSQL
...
Installation of Ranger PolicyManager Web Application is completed.
Starting Apache Ranger Admin Service
Apache Ranger Admin Service with pid 1661 has started.

Then the user can connect to the Ranger webpage at the address specified by the configuration key ranger.externalurl. The default ID/password is admin/rangeradmin1 (where the password is given in kubernetes/ranger-key/install.properties):

rangerAdmin_password=rangeradmin1

After starting Ranger, connect to the Ranger webpage and create a Ranger service whose name matches the configuration key ranger.plugin.hive.service.name in kubernetes/conf/ranger-hive-security.xml. Then fill in the JDBC URL (e.g., jdbc:hive2://indigo20:9852/;principal=hive/red0@RED;) and set policy.download.auth.users to the user running HiveServer2 (e.g., hive). In this way, Ranger can inspect metadata (such as databases, tables, and users) managed by HiveServer2, while HiveServer2 can retrieve its Ranger service profile.

[Figure: configuring the Ranger service for HiveServer2]

Reconfiguring and running HiveServer2 after starting Ranger

After starting Ranger, HiveServer2 should be reconfigured so as to communicate with Ranger for checking data security. If the Ranger plugin is missing from the directory kubernetes/hive/hive/apache-hive/lib, the user should rebuild the Docker image for running Hive on MR3 before restarting HiveServer2. (The MR3 release already includes the Ranger plugin.)

kubernetes/yaml/hive.yaml

The spec/hostAliases field should include the host running Ranger inside the Kubernetes cluster.

kubernetes/conf/hive-site.xml

The following configuration keys should be set:

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory</value>
</property>

<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>

The user may still set hive.server2.enable.doAs to true, because enabling impersonation is orthogonal to using Ranger.

kubernetes/conf/ranger-hive-security.xml

The configuration key ranger.plugin.hive.policy.rest.url should use the host name assigned to Ranger. Note that the port number should match the field port, not targetPort, in yaml/ranger-service.yaml because HiveServer2 connects via the Service that exposes Ranger to the outside of the Kubernetes cluster.

  <property>
    <name>ranger.plugin.hive.policy.rest.url</name>
    <value>http://indigo20:6080</value>
  </property>

The configuration key ranger.plugin.hive.service.name should use the Ranger service for HiveServer2:

  <property>
    <name>ranger.plugin.hive.service.name</name>
    <value>INDIGO_hive</value>
  </property>

kubernetes/conf/ranger-hive-audit.xml

The configuration key xasecure.audit.destination.solr.urls should use the host name assigned to Ranger.

  <property>
    <name>xasecure.audit.destination.solr.urls</name>
    <value>http://indigo20:6083/solr/ranger_audits</value>
  </property>
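Both HiveServer2-side keys above should point at the host name assigned to Ranger, the same host as in ranger.externalurl. The ports may legitimately differ (port versus targetPort), so the hypothetical sketch below compares host names only, using the standard urllib.parse.urlsplit. It assumes a single URL per key.

```python
from typing import Dict, List
from urllib.parse import urlsplit


def ranger_hostname_mismatches(ranger_admin: Dict[str, str],
                               hive_side: Dict[str, str]) -> List[str]:
    """Return HiveServer2-side keys whose URL host differs from ranger.externalurl."""
    expected = urlsplit(ranger_admin["ranger.externalurl"]).hostname
    return [key for key, url in hive_side.items() if urlsplit(url).hostname != expected]


# Values copied from ranger-admin-site.xml, ranger-hive-security.xml and
# ranger-hive-audit.xml on this page.
RANGER_ADMIN = {"ranger.externalurl": "http://indigo20:6080"}
HIVE_SIDE = {
    "ranger.plugin.hive.policy.rest.url": "http://indigo20:6080",
    "xasecure.audit.destination.solr.urls": "http://indigo20:6083/solr/ranger_audits",
}
```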

Running Ranger without Kerberos

In order to run Ranger without Kerberos, the user should take the following additional steps.

kubernetes/ranger-conf/core-site.xml

Set the configuration key hadoop.security.authentication to simple to disable Kerberos authentication.

  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>

kubernetes/ranger-key/install.properties

Set the variable audit_solr_urls to the same address as the configuration key ranger.audit.solr.urls in ranger-admin-site.xml.

audit_solr_urls=http://indigo20:6083/solr/ranger_audits

Remove (do not just set to empty) two variables related to authentication for auditing:

  • audit_solr_user
  • audit_solr_password
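The distinction between removing a variable and setting it to an empty value is easy to miss when editing by hand. The hypothetical sketch below deletes whole KEY=VALUE lines and sets audit_solr_urls, assuming the simple one-assignment-per-line layout of install.properties; commented-out lines are left alone.

```python
from typing import Iterable


def drop_variables(text: str, names: Iterable[str]) -> str:
    """Delete KEY=VALUE lines for the given keys (instead of blanking them)."""
    drop = set(names)
    kept = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() in drop:
            continue  # remove the whole line
        kept.append(line)
    return "\n".join(kept) + "\n"


def set_variable(text: str, name: str, value: str) -> str:
    """Set KEY=VALUE, replacing an existing line or appending a new one."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        key, sep, _ = line.partition("=")
        if sep and key.strip() == name:
            lines[i] = f"{name}={value}"
            break
    else:
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"
```

For example, set_variable(drop_variables(text, ["audit_solr_user", "audit_solr_password"]), "audit_solr_urls", "http://indigo20:6083/solr/ranger_audits") applies both steps of this section to the contents of install.properties.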

kubernetes/ranger-conf/ranger-admin-site.xml

Remove configuration keys related to Kerberos:

  • xasecure.audit.jaas.Client.loginModuleName
  • xasecure.audit.jaas.Client.loginModuleControlFlag
  • xasecure.audit.jaas.Client.option.useTicketCache
  • xasecure.audit.jaas.Client.option.useKeyTab
  • xasecure.audit.jaas.Client.option.keyTab
  • xasecure.audit.jaas.Client.option.storeKey
  • xasecure.audit.jaas.Client.option.principal

kubernetes/ranger-conf/solr.in.sh

Set the following two variables:

SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:solrRocks"

kubernetes/ranger-conf/solr-security.json

Set the configuration for authentication and authorization in Solr as follows:

{
  "authentication": {
    "blockUnknown": false,
    "class": "solr.BasicAuthPlugin",
    "credentials":{
      "solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin"
  }
}

Since authentication/blockUnknown is set to false, Solr accepts audit requests without credentials. (Ranger does not use the credentials which correspond to user solr and password solrRocks.)
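If Solr should ever enforce credentials of its own choosing, a new entry for the credentials map is needed. The sketch below generates one under an assumption: that Solr's BasicAuthPlugin stores Base64(SHA-256(SHA-256(salt + password))) followed by a space and the Base64 salt, as we understand its SHA-256 authentication provider. Verify this against the Solr version in use, or set the user through Solr's authentication API (the set-user command) instead. The helper names are hypothetical.

```python
import base64
import hashlib
import os
from typing import Optional


def solr_credential(password: str, salt: Optional[bytes] = None) -> str:
    """Build a 'base64(hash) base64(salt)' value for the credentials map.

    ASSUMED scheme: SHA-256 over salt+password, then SHA-256 once more.
    """
    if salt is None:
        salt = os.urandom(32)  # 32 random bytes -> 44 Base64 characters
    once = hashlib.sha256(salt + password.encode("utf-8")).digest()
    twice = hashlib.sha256(once).digest()
    return f"{base64.b64encode(twice).decode('ascii')} {base64.b64encode(salt).decode('ascii')}"


def credential_matches(password: str, credential: str) -> bool:
    """Recompute the hash with the stored salt and compare."""
    _, salt_b64 = credential.split(" ", 1)
    return solr_credential(password, base64.b64decode(salt_b64)) == credential
```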

Troubleshooting

1. HiveServer2 throws NullPointerException when trying to download Ranger policies.

2020-10-08T12:23:08,872 ERROR [Thread-6] util.PolicyRefresher: PolicyRefresher(serviceName=ORANGE_hive): failed to refresh policies. Will continue to use last known version of policies (-1)
com.sun.jersey.api.client.ClientHandlerException: java.lang.RuntimeException: java.lang.NullPointerException
...
Caused by: java.lang.NullPointerException

The error disappears after setting policy.download.auth.users to include the user of HiveServer2 in the Config Properties panel.

[Figure: the Config Properties panel]

2. Test Connection fails in the Config Properties panel.

Check if the jdbc.url field is set properly. Examples:

  • jdbc:hive2://indigo20:9852/ when neither Kerberos nor SSL is used.
  • jdbc:hive2://indigo20:9852/;principal=hive/indigo20@RED; when Kerberos is used.
  • jdbc:hive2://indigo20:9852/;principal=hive/indigo20@RED;ssl=true;sslTrustStore=/opt/mr3-run/ranger/key/hivemr3-ssl-certificate.jks; when both Kerberos and SSL are used.
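The three examples above differ only in the session parameters appended after the path. The hypothetical helper below assembles a jdbc.url value in the same shape, adding principal= for Kerberos and ssl=true plus sslTrustStore= for SSL, with each parameter terminated by a semicolon as in the examples.

```python
from typing import List, Optional


def hive_jdbc_url(host: str, port: int, principal: Optional[str] = None,
                  truststore: Optional[str] = None) -> str:
    """Build a jdbc:hive2 URL in the form used in the examples above."""
    params: List[str] = []
    if principal:                 # Kerberos: HiveServer2's service principal
        params.append(f"principal={principal}")
    if truststore:                # SSL: enable it and point at the trust store
        params += ["ssl=true", f"sslTrustStore={truststore}"]
    url = f"jdbc:hive2://{host}:{port}/"
    if params:
        url += ";" + ";".join(params) + ";"
    return url
```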