Running as a Non-root User

From MR3 release 1.4, Hive on MR3 runs all containers as a non-root user hive with UID 1000. In order to use a different UID, the user should rebuild a Docker image.

Hive on MR3 can run all containers as a non-root user, which comes with the following pros and cons.

(+) Running as a non-root user provides a strong layer of security, especially when users are permitted to call user defined functions (UDFs).
(-) On Amazon EKS, we cannot use PersistentVolumes specified by the configuration key mr3.k8s.worker.local.dir.persistentvolumes.
(-) On Amazon EKS, we cannot use IAM roles for ServiceAccounts (on Kubernetes 1.18 and earlier).

Below we explain how to run all containers as a non-root user. It involves three steps: 1) creating a Docker image for a non-root user; 2) updating kubernetes/env.sh; 3) updating the ownership or permission of hostPath volumes. In our example, we create a non-root user hive with UID 1000 and assume that a user with UID 1000 already exists on every host node where ContainerWorker Pods are to be run.

Creating a Docker image for a non-root user

Extend kubernetes/hive/Dockerfile by adding a new user hive with UID 1000, and build a new Docker image. Now all containers run as the non-root user hive.

$ vi kubernetes/hive/Dockerfile

ARG UID=1000
RUN adduser --no-create-home --disabled-login --gecos "" --uid $UID hive && \
    chown hive /opt/mr3-run/scratch-dir && \
    chown hive /opt/mr3-run/work-dir && \
    chown hive /opt/mr3-run/work-local-dir && \
    chown hive /opt/mr3-run/hive && \
    chown hive /opt/mr3-run/hive/tmp && \
    chown hive /opt/mr3-run/hive/run-result

USER hive

Updating kubernetes/env.sh

Set the environment variable DOCKER_USER to hive. If Kerberos-based authentication is enabled, the service principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL should match the user in DOCKER_USER (see Configuring HiveServer2 for more details).

$ vi kubernetes/env.sh

DOCKER_USER=hive

HIVE_SERVER2_KERBEROS_PRINCIPAL=hive/gold7@PL

Updating the ownership or permission of hostPath volumes

In order to write intermediate data, a ContainerWorker process needs write permission on the hostPath volumes mounted inside the ContainerWorker Pod. Hence the administrator should set in advance, on every host node, the ownership or permission of the directories that are mapped to hostPath volumes (specified by the configuration key mr3.k8s.pod.worker.hostpaths). This can be done either manually or automatically (e.g., by exploiting the preBootstrapCommands field when creating an AWS EKS cluster with eksctl).
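As a minimal sketch of the manual route, the following shell function reports whether a directory is already owned by the expected UID and, if not, prints the chown command an administrator might run. The function name check_hostpath_owner is hypothetical and not part of MR3, and the use of stat -c assumes GNU coreutils on the host node.

```shell
#!/bin/sh
# Hypothetical helper (not part of MR3): report whether DIR is owned by UID.
# Assumes GNU coreutils 'stat', which is standard on Linux host nodes.
check_hostpath_owner() {
  dir="$1"
  uid="$2"
  # Numeric owner of the directory; return 2 if it cannot be read.
  owner=$(stat -c '%u' "$dir") || return 2
  if [ "$owner" = "$uid" ]; then
    echo "ok: $dir is owned by UID $uid"
    return 0
  fi
  # Only print the suggested fix; the helper never modifies the directory.
  echo "fix: sudo chown $uid:$uid $dir"
  return 1
}

# Example call on a host node (the path is an illustrative assumption):
# check_hostpath_owner /data1/k8s 1000
```

The helper only reads metadata, so it is safe to run on a node where ContainerWorker Pods are already using the directory.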

For those cases in which this requirement is hard to meet (e.g., when creating a Kubernetes cluster with kops on Amazon AWS), Hive on MR3 allows the user to update the ownership or permission at the time of creating ContainerWorker Pods. Specifically, a ContainerWorker Pod interprets a non-empty string specified by the configuration key mr3.k8s.pod.worker.init.container.command (in kubernetes/conf/mr3-site.xml) as a shell command and executes it in a privileged init container called init-command. Since the init container first mounts hostPath volumes specified by the configuration key mr3.k8s.pod.worker.hostpaths and then executes the shell command as a root user, we can update the ownership or permission of those directories for hostPath volumes. For example, if mr3.k8s.pod.worker.hostpaths is set to /data1/k8s, setting mr3.k8s.pod.worker.init.container.command to chown 1000:1000 /data1/k8s/ updates the ownership of the directory /data1/k8s/ when a ContainerWorker Pod starts.

In order to use init containers, the user should specify the Docker image for init containers with the configuration key mr3.k8s.pod.worker.init.container.image. Usually a small Docker image (such as busybox) suffices as long as it contains /bin/sh.

$ vi kubernetes/conf/mr3-site.xml

<property>
  <name>mr3.k8s.pod.worker.init.container.image</name>
  <value>busybox</value>
</property>

If mr3.k8s.pod.worker.init.container.command is set to empty, no init container is created.

Since multiple ContainerWorker Pods can start on the same host node, the shell command should not make destructive updates to hostPath volumes. For example, the user should not use a shell command that deletes all the files in the directories for hostPath volumes, which may already be in use by other ContainerWorker Pods.

The user can concatenate multiple shell commands with ; (e.g., chown 1000:1000 /data1/k8s/; ls -alt /data1/k8s/). Since the name of the privileged init container is always init-command, the user can check the output of the shell command, e.g., by executing kubectl logs -n hivemr3 mr3worker-d9a6-15 -c init-command.

Using ports below 1024 for HiveServer2

If the HiveServer2 container runs as a non-root user, it cannot open privileged Thrift ports below 1024 because the kernel parameter net.ipv4.ip_unprivileged_port_start is set to 1024 by default. In order to open Thrift ports below 1024, the user can add a securityContext field so that the HiveServer2 container starts with the NET_BIND_SERVICE capability.

$ vi kubernetes/yaml/hive.yaml

spec:
  template:
    spec:
      containers:
      - image: 10.1.91.17:5000/hive3:latest
        securityContext:
          capabilities:
            add:
            - NET_BIND_SERVICE

Unfortunately this approach does not work because of a bug in Kubernetes (https://github.com/kubernetes/kubernetes/issues/56374). For example, we find that the permitted and effective capability sets (CapPrm and CapEff) of the HiveServer2 process are both cleared.

hive@hivemr3-hiveserver2-clsdp:/opt/mr3-run/hive$ cat /proc/23/status | grep Cap
CapInh: 0000003fffffffff
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
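To make the masks above easier to read, here is a small sketch that tests a single capability bit in a hexadecimal mask taken from /proc/<pid>/status. The helper name has_cap is hypothetical. The only outside fact it relies on is that CAP_NET_BIND_SERVICE is capability number 10 in the Linux capability list.

```shell
#!/bin/sh
# Hypothetical helper: succeed if capability number $2 is set in the
# hexadecimal mask $1 (as printed in /proc/<pid>/status).
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# CAP_NET_BIND_SERVICE is capability number 10.
# The two masks below are the values that appear in the output above.
for mask in 0000003fffffffff 0000000000000000; do
  if has_cap "$mask" 10; then
    echo "$mask: NET_BIND_SERVICE set"
  else
    echo "$mask: NET_BIND_SERVICE not set"
  fi
done
```

The bit is set in the inheritable and bounding masks (0000003fffffffff) but not in the permitted or effective masks (0000000000000000), which is why the process still cannot bind to a port below 1024.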

This approach does not work even with suitable PodSecurityPolicy, ClusterRole, and ClusterRoleBinding resources.

As a workaround, we can exploit an init container that executes the sysctl command to set the kernel parameter net.ipv4.ip_unprivileged_port_start manually. In the following example, we update kubernetes/yaml/hive.yaml to create an init container init-command which executes the command sysctl net.ipv4.ip_unprivileged_port_start=0 as a root user.

$ vi kubernetes/yaml/hive.yaml

spec:
  template:
    spec:
      initContainers:
      - name: init-command
        image: 10.1.91.17:5000/hive3:latest
        args:
        - sysctl
        - net.ipv4.ip_unprivileged_port_start=0
        securityContext:
          privileged: true
          runAsUser: 0

Then the user can set the environment variable HIVE_SERVER2_PORT or HIVE_SERVER2_HTTP_PORT to a port below 1024 in kubernetes/env.sh, and update kubernetes/yaml/hive.yaml and kubernetes/yaml/hiveserver2-service.yaml accordingly.
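One way to check whether a chosen port is usable by a non-root process is to compare it with the current value of the kernel parameter, as in the sketch below. The helper name can_bind_unprivileged is hypothetical, and port 443 is only an example value. The script falls back to the default of 1024 when the /proc entry cannot be read.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a non-root process may bind port $1 when
# net.ipv4.ip_unprivileged_port_start is $2 (ports >= that value are allowed).
can_bind_unprivileged() {
  [ "$1" -ge "$2" ]
}

# Read the current value (Linux only); assume the default 1024 if unavailable.
start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)

port=443  # example value for HIVE_SERVER2_PORT; an assumption for illustration
if can_bind_unprivileged "$port" "$start"; then
  echo "port $port can be opened by a non-root user (start=$start)"
else
  echo "port $port requires NET_BIND_SERVICE or a lower start value (start=$start)"
fi
```

Running this inside the HiveServer2 container (for example with kubectl exec) shows the value that the HiveServer2 process actually sees.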

An alternative approach is to open Thrift ports above 1024 but forward traffic from a privileged port by updating kubernetes/yaml/hiveserver2-service.yaml. For example, when HiveServer2 opens a Thrift port 10001, we can accept Beeline connections at port 443 as follows.

$ vi kubernetes/yaml/hiveserver2-service.yaml

spec:
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 443
    targetPort: 10001
