From MR3 release 1.4, Hive on MR3 runs all containers as a non-root user hive with UID 1000.
In order to use a different UID, the user should rebuild the Docker image.
Running all containers as a non-root user comes with the following pros and cons.
- (+) Running as a non-root user provides a strong layer of security, especially when users are permitted to call user defined functions (UDFs).
- (-) On Amazon EKS, we cannot use PersistentVolumes specified by the configuration key mr3.k8s.worker.local.dir.persistentvolumes.
- (-) On Amazon EKS, we cannot use IAM roles for ServiceAccounts (on Kubernetes 1.18 and earlier).
Below we explain how to run all containers as a non-root user.
It involves three steps: 1) creating a Docker image for a non-root user; 2) updating kubernetes/env.sh; 3) updating the ownership or permission of hostPath volumes.
In our example, we create a non-root user hive with UID 1000, and assume that a user with UID 1000 already exists on every host node where ContainerWorker Pods are to be run.
Creating a Docker image for a non-root user
Extend kubernetes/hive/Dockerfile by adding a new user hive with UID 1000, and build a new Docker image.
Now all containers run as the non-root user hive.
$ vi kubernetes/hive/Dockerfile
ARG UID=1000
RUN adduser --no-create-home --disabled-login --gecos "" --uid $UID hive && \
chown hive /opt/mr3-run/scratch-dir && \
chown hive /opt/mr3-run/work-dir && \
chown hive /opt/mr3-run/work-local-dir && \
chown hive /opt/mr3-run/hive && \
chown hive /opt/mr3-run/hive/tmp && \
chown hive /opt/mr3-run/hive/run-result
USER hive
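For reference, a build sequence might look like the following sketch. The registry address 10.1.91.17:5000 and the image name hive3 are taken from the YAML examples below; adjust both to your environment. A different UID can be supplied with --build-arg.

```shell
# Build the image from kubernetes/hive/Dockerfile.
# UID defaults to 1000 in the Dockerfile; override it here if necessary.
docker build -t 10.1.91.17:5000/hive3:latest --build-arg UID=1000 kubernetes/hive

# Push the image so that Kubernetes nodes can pull it from the private registry.
docker push 10.1.91.17:5000/hive3:latest
```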
Updating kubernetes/env.sh
Set the environment variable DOCKER_USER to hive.
If Kerberos-based authentication is enabled, the service principal name in HIVE_SERVER2_KERBEROS_PRINCIPAL should match the user in DOCKER_USER (see Configuring HiveServer2 for more details).
$ vi kubernetes/env.sh
DOCKER_USER=hive
HIVE_SERVER2_KERBEROS_PRINCIPAL=hive/gold7@PL
Updating the ownership or permission of hostPath volumes
In order to be able to write intermediate data, a ContainerWorker process should have write permission on the hostPath volumes mounted inside the ContainerWorker Pod.
Hence the administrator should set in advance the ownership or permission of those directories on every host node that are to be mapped to hostPath volumes (specified by the configuration key mr3.k8s.pod.worker.hostpaths), either manually or automatically (e.g., by exploiting the preBootstrapCommands field when creating an Amazon EKS cluster with eksctl).
For those cases in which this requirement is hard to meet (e.g., when creating a Kubernetes cluster with kops on Amazon AWS), Hive on MR3 allows the user to update the ownership or permission at the time of creating ContainerWorker Pods.
Specifically, a ContainerWorker Pod interprets a non-empty string specified by the configuration key mr3.k8s.pod.worker.init.container.command (in kubernetes/conf/mr3-site.xml) as a shell command and executes it in a privileged init container called init-command.
Since the init container first mounts the hostPath volumes specified by the configuration key mr3.k8s.pod.worker.hostpaths and then executes the shell command as a root user, we can update the ownership or permission of those directories for hostPath volumes.
For example, if mr3.k8s.pod.worker.hostpaths is set to /data1/k8s, setting mr3.k8s.pod.worker.init.container.command to chown 1000:1000 /data1/k8s/ updates the ownership of the directory /data1/k8s/ when a ContainerWorker Pod starts.
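To illustrate, the two properties in kubernetes/conf/mr3-site.xml might read as follows. The directory /data1/k8s and the ownership 1000:1000 are just the example values from above; substitute the values for your cluster.

```xml
<property>
  <name>mr3.k8s.pod.worker.hostpaths</name>
  <value>/data1/k8s</value>
</property>
<property>
  <name>mr3.k8s.pod.worker.init.container.command</name>
  <value>chown 1000:1000 /data1/k8s/</value>
</property>
```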
In order to use init containers, the user should specify the Docker image for init containers with the configuration key mr3.k8s.pod.worker.init.container.image.
Usually a small Docker image (such as busybox) works okay as long as it contains the command /bin/sh.
$ vi kubernetes/conf/mr3-site.xml
<property>
<name>mr3.k8s.pod.worker.init.container.image</name>
<value>busybox</value>
</property>
If mr3.k8s.pod.worker.init.container.command is set to an empty string, no init container is created.
Since multiple ContainerWorker Pods can start on the same host node, the shell command should not make destructive updates to hostPath volumes. For example, the user should not use a shell command that deletes all files in the directories for hostPath volumes, because those directories may already be in use by other ContainerWorker Pods.
The user can concatenate multiple shell commands with ; (e.g., chown 1000:1000 /data1/k8s/; ls -alt /data1/k8s/).
Since the name of the privileged init container is always init-command, the user can check the output of the shell command, e.g., by executing kubectl logs -n hivemr3 mr3worker-d9a6-15 -c init-command.
Using ports below 1024 for HiveServer2
If the HiveServer2 container runs as a non-root user, it cannot open privileged Thrift ports below 1024 because the kernel parameter net.ipv4.ip_unprivileged_port_start is set to 1024 by default.
In order to open Thrift ports below 1024, the user can add a securityContext field so that the HiveServer2 container starts with the NET_BIND_SERVICE capability.
$ vi kubernetes/yaml/hive.yaml

spec:
  template:
    spec:
      containers:
      - image: 10.1.91.17:5000/hive3:latest
        securityContext:
          capabilities:
            add:
            - NET_BIND_SERVICE
Unfortunately this approach does not work because of a bug in Kubernetes (https://github.com/kubernetes/kubernetes/issues/56374). For example, we find that the permitted and effective capability sets of the HiveServer2 process are cleared.
hive@hivemr3-hiveserver2-clsdp:/opt/mr3-run/hive$ cat /proc/23/status | grep Cap
CapInh: 0000003fffffffff
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
This approach does not work even with suitable PodSecurityPolicy, ClusterRole, and ClusterRoleBinding resources.
As a workaround, we can exploit an init container that executes the sysctl command to set the kernel parameter net.ipv4.ip_unprivileged_port_start manually.
In the following example, we update kubernetes/yaml/hive.yaml to create an init container init-command which executes the command sysctl net.ipv4.ip_unprivileged_port_start=0 as a root user.
$ vi kubernetes/yaml/hive.yaml

spec:
  template:
    spec:
      initContainers:
      - name: init-command
        image: 10.1.91.17:5000/hive3:latest
        args:
        - sysctl
        - net.ipv4.ip_unprivileged_port_start=0
        securityContext:
          privileged: true
          runAsUser: 0
Then the user can set the environment variable HIVE_SERVER2_PORT or HIVE_SERVER2_HTTP_PORT to a new value in kubernetes/env.sh, and update kubernetes/yaml/hive.yaml and kubernetes/yaml/hiveserver2-service.yaml accordingly.
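For example, to accept Binary-mode Beeline connections directly at port 443, the change in kubernetes/env.sh might look like the following sketch (the previous value of HIVE_SERVER2_PORT depends on your installation, and the port fields in the two YAML files must be updated to match):

```shell
$ vi kubernetes/env.sh

HIVE_SERVER2_PORT=443
```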
An alternative approach is to open Thrift ports above 1024 and forward traffic from a privileged port by updating kubernetes/yaml/hiveserver2-service.yaml.
For example, when HiveServer2 opens a Thrift port 10001, we can accept Beeline connections at port 443 as follows.
$ vi kubernetes/yaml/hiveserver2-service.yaml

spec:
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 443
    targetPort: 10001
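With this Service in place, Beeline connects to port 443 of the LoadBalancer while HiveServer2 still listens on 10001 inside the Pod. A connection might look like the following sketch; the placeholder address, the user name, and any authentication or SSL options depend entirely on your setup.

```shell
# <LoadBalancer-address> is a placeholder for the external address of the
# hiveserver2 Service; replace it and add authentication options as needed.
beeline -u "jdbc:hive2://<LoadBalancer-address>:443/" -n hive
```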