Setting env.sh
Below we provide details about the environment variables in env.sh.
Common to all Pods
vi env.sh
MR3_NAMESPACE=hivemr3
MR3_SERVICE_ACCOUNT=hive-service-account
CONF_DIR_CONFIGMAP=hivemr3-conf-configmap
MASTER_SERVICE_ACCOUNT=master-service-account
WORKER_SERVICE_ACCOUNT=worker-service-account
CREATE_KEYTAB_SECRET=true
KEYTAB_SECRET=hivemr3-keytab-secret
CREATE_WORKER_SECRET=true
WORKER_SECRET=hivemr3-worker-secret
General
- MR3_NAMESPACE specifies the namespace for all Kubernetes objects.
- MR3_SERVICE_ACCOUNT specifies the ServiceAccount for Hive on MR3.
- CONF_DIR_CONFIGMAP specifies the name of the ConfigMap to be built from files in the directory kubernetes/conf.
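To make the ConfigMap concrete, the sketch below prints the kubectl command that would build such a ConfigMap from a directory. It is a minimal sketch, assuming that what the MR3 scripts do is roughly equivalent to `kubectl create configmap --from-file`; the helper `print_configmap_command` is hypothetical and only prints, so nothing is sent to the cluster.

```bash
#!/bin/bash
# Hypothetical helper (not part of Hive on MR3): prints a kubectl command that
# builds a ConfigMap named $CONF_DIR_CONFIGMAP in namespace $MR3_NAMESPACE,
# with one key per file in the given directory. Nothing is executed.
print_configmap_command() {
  local conf_dir="${1:-kubernetes/conf}"
  echo "kubectl create configmap $CONF_DIR_CONFIGMAP --from-file=$conf_dir -n $MR3_NAMESPACE"
}

# Values taken from the env.sh example above.
MR3_NAMESPACE=hivemr3
CONF_DIR_CONFIGMAP=hivemr3-conf-configmap
print_configmap_command
# kubectl create configmap hivemr3-conf-configmap --from-file=kubernetes/conf -n hivemr3
```

Printing rather than executing lets the command be reviewed before anything touches the cluster.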
ServiceAccount
- MASTER_SERVICE_ACCOUNT specifies the ServiceAccount for MR3 DAGAppMaster.
- WORKER_SERVICE_ACCOUNT specifies the ServiceAccount for MR3 ContainerWorkers.
Secret
- CREATE_KEYTAB_SECRET specifies whether or not to create a Secret from files in the directory kubernetes/key. It should be set to true if Kerberos is used for authentication.
- KEYTAB_SECRET specifies the name of the Secret to be built when CREATE_KEYTAB_SECRET is set to true.
- CREATE_WORKER_SECRET specifies whether or not to create a Secret for MR3 ContainerWorkers.
- WORKER_SECRET specifies the name of the Secret to be built when CREATE_WORKER_SECRET is set to true.
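Similarly, a keytab Secret built from kubernetes/key can be sketched with `kubectl create secret generic`. This assumes the MR3 scripts do something roughly equivalent; `print_keytab_secret_command` is a hypothetical helper that prints the command only when CREATE_KEYTAB_SECRET is true. The worker Secret is left out because its contents are not described here.

```bash
#!/bin/bash
# Hypothetical helper (not part of Hive on MR3): prints the kubectl command that
# would build the keytab Secret from the given directory, but only when
# CREATE_KEYTAB_SECRET is true. Prints nothing otherwise; nothing is executed.
print_keytab_secret_command() {
  local key_dir="${1:-kubernetes/key}"
  if [ "$CREATE_KEYTAB_SECRET" = "true" ]; then
    echo "kubectl create secret generic $KEYTAB_SECRET --from-file=$key_dir -n $MR3_NAMESPACE"
  fi
}

# Values taken from the env.sh example above.
MR3_NAMESPACE=hivemr3
CREATE_KEYTAB_SECRET=true
KEYTAB_SECRET=hivemr3-keytab-secret
print_keytab_secret_command
# kubectl create secret generic hivemr3-keytab-secret --from-file=kubernetes/key -n hivemr3
```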
For Metastore
vi env.sh
HIVE_DATABASE_HOST=red0
HIVE_METASTORE_HOST=hivemr3-metastore-0.metastore.hivemr3.svc.cluster.local
HIVE_METASTORE_PORT=9850
HIVE_DATABASE_NAME=hivemr3
HIVE_WAREHOUSE_DIR=/opt/mr3-run/work-dir/warehouse/
METASTORE_SECURE_MODE=true
HIVE_METASTORE_KERBEROS_PRINCIPAL=hive/indigo20@RED
HIVE_METASTORE_KERBEROS_KEYTAB=$KEYTAB_MOUNT_DIR/hive.service.keytab
General
- HIVE_DATABASE_HOST specifies the host where the database server for Metastore is running.
- HIVE_METASTORE_HOST and HIVE_METASTORE_PORT specify the address of Metastore itself. To create a Metastore Pod, set HIVE_METASTORE_HOST to hivemr3-metastore-0.metastore.hivemr3.svc.cluster.local. Here hivemr3-metastore-0 is the unique name of the Pod that will be running Metastore, and hivemr3 is the namespace. To use an existing Metastore running as an external component (without creating a new Metastore Pod), set HIVE_METASTORE_HOST to its host (e.g., red0).
- HIVE_DATABASE_NAME specifies the database name for Metastore in the MySQL server.
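Since the Metastore address embeds the namespace, it can be assembled from its parts rather than typed by hand. A minimal sketch follows. It assumes the usual StatefulSet DNS pattern `<pod>.<service>.<namespace>.svc.cluster.local`, that `metastore` in the example value is the name of the governing Service, and the default cluster domain `cluster.local`; `metastore_host` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: builds the in-cluster DNS name of the Metastore Pod.
# Format assumed: <pod-name>.<service-name>.<namespace>.svc.cluster.local
metastore_host() {
  local pod="${1:-hivemr3-metastore-0}"
  local service="${2:-metastore}"
  local namespace="${3:-$MR3_NAMESPACE}"
  echo "${pod}.${service}.${namespace}.svc.cluster.local"
}

MR3_NAMESPACE=hivemr3
HIVE_METASTORE_HOST=$(metastore_host)
echo "$HIVE_METASTORE_HOST"
# hivemr3-metastore-0.metastore.hivemr3.svc.cluster.local
```

With this approach, changing MR3_NAMESPACE keeps HIVE_METASTORE_HOST consistent without a second edit.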
Warehouse
- HIVE_WAREHOUSE_DIR specifies the path to the Hive warehouse. Since MR3 is agnostic to the type of data sources, it is important to specify the full path to the warehouse, including the file system. If no file system is given, MR3 assumes the local file system because the configuration key fs.defaultFS is set to file:/// in conf/core-site.xml.
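The rule for a path without a file system can be sketched as follows. `effective_warehouse_uri` is a hypothetical helper, not part of MR3 or Hadoop; it only mimics the interpretation described above (a bare absolute path means the local file system when fs.defaultFS is file:///), and it rejects relative paths as a choice of this sketch.

```bash
#!/bin/bash
# Hypothetical helper: shows how HIVE_WAREHOUSE_DIR is interpreted when
# fs.defaultFS is file:/// (as in conf/core-site.xml).
#   - a value that already names a file system (hdfs://, s3a://, ...) is kept;
#   - a bare absolute path refers to the local file system of each Pod.
# Relative paths are rejected by this sketch.
effective_warehouse_uri() {
  local dir="$1"
  case "$dir" in
    *://*) echo "$dir" ;;
    /*)    echo "file://$dir" ;;
    *)     echo "not a URI or absolute path: $dir" >&2; return 1 ;;
  esac
}

effective_warehouse_uri /opt/mr3-run/work-dir/warehouse/
# file:///opt/mr3-run/work-dir/warehouse/
effective_warehouse_uri hdfs://red0:8020/user/hive/warehouse
# hdfs://red0:8020/user/hive/warehouse
```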
Since the data warehouse is shared by all the components of Hive on MR3, its path should be globally valid in every Pod. For example, HIVE_WAREHOUSE_DIR=hdfs://red0:8020/tmp/hive is okay because it points to a globally valid location (directory /tmp/hive on HDFS running on red0). If the path is not globally valid, the user may not be able to create new databases or tables. For example, if HIVE_WAREHOUSE_DIR is set to /foo/bar and Metastore has no write permission on it inside its Pod, the user cannot create new databases or tables. If Metastore happens to have write permission on /foo/bar, the user can create new databases and tables.
Below are a few examples of the path. For running Hive on MR3 in a Kubernetes cluster, the user should use either hdfs or s3a for the file system.
- /opt/mr3-run/work-dir/warehouse/: A local directory inside the HiveServer2 Pod is used for the Hive warehouse. Since the local directory is not visible to the outside, this works only if all the components (HiveServer2, DAGAppMaster, and ContainerWorkers) run in the same Pod.
- hdfs://red0:8020/user/hive/warehouse: An HDFS directory with NameNode on red0 is used for the Hive warehouse.
- s3a://mr3-bucket/warehouse: An S3 bucket is used for the Hive warehouse.
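A pre-flight check for the rule above (hdfs or s3a when running in Kubernetes) might look like this. `check_warehouse_dir` is hypothetical; it inspects only the scheme prefix and does not verify that the location exists or is writable.

```bash
#!/bin/bash
# Hypothetical pre-flight check: in a Kubernetes cluster the warehouse should be
# on a file system that every Pod can reach, i.e. hdfs or s3a.
check_warehouse_dir() {
  case "$1" in
    hdfs://*|s3a://*)
      return 0 ;;
    *)
      echo "warning: $1 is not on hdfs or s3a; other Pods may not see it" >&2
      return 1 ;;
  esac
}

# The three example paths from the list above.
for dir in /opt/mr3-run/work-dir/warehouse/ \
           hdfs://red0:8020/user/hive/warehouse \
           s3a://mr3-bucket/warehouse; do
  if check_warehouse_dir "$dir"; then echo "ok: $dir"; fi
done
```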
When initializing the schema, Metastore reads the environment variable HIVE_WAREHOUSE_DIR in env.sh and stores the path to the data warehouse in the MySQL database. Once the path to the data warehouse is registered in Metastore, the user can update it only by directly accessing the MySQL database. Hence setting HIVE_WAREHOUSE_DIR to a new path and restarting HiveServer2 has no effect.
Security
- If Metastore uses Kerberos authentication and runs in a secure mode, METASTORE_SECURE_MODE should be set to true. If HiveServer2 uses Kerberos authentication, METASTORE_SECURE_MODE should also be set to true.
- HIVE_METASTORE_KERBEROS_PRINCIPAL specifies the service principal, and HIVE_METASTORE_KERBEROS_KEYTAB specifies the name of the service keytab file, which the user should copy to the directory kubernetes/key.
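A pre-flight check for the Metastore keytab can be sketched as follows. `check_metastore_keytab` is hypothetical; it assumes that HIVE_METASTORE_KERBEROS_KEYTAB points into the mount directory inside the Pod (as `$KEYTAB_MOUNT_DIR/...` in the example suggests), so only the file name is looked up in kubernetes/key.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of Hive on MR3). When
# METASTORE_SECURE_MODE is true, the keytab named by
# HIVE_METASTORE_KERBEROS_KEYTAB must already be in the local key directory.
# Assumption: the path in env.sh refers to the mount point inside the Pod,
# so only its file name is compared against the local directory.
check_metastore_keytab() {
  local key_dir="${1:-kubernetes/key}"
  [ "$METASTORE_SECURE_MODE" = "true" ] || return 0
  local keytab_name
  keytab_name=$(basename "$HIVE_METASTORE_KERBEROS_KEYTAB")
  if [ ! -f "$key_dir/$keytab_name" ]; then
    echo "missing $key_dir/$keytab_name" >&2
    return 1
  fi
}

METASTORE_SECURE_MODE=true
HIVE_METASTORE_KERBEROS_KEYTAB=$KEYTAB_MOUNT_DIR/hive.service.keytab
if check_metastore_keytab; then
  echo "keytab found"
else
  echo "copy the keytab to kubernetes/key first"
fi
```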
If HIVE_DATABASE_HOST and HIVE_METASTORE_HOST use hosts unknown to the default DNS, the user should add their aliases in the field spec.template.spec.hostAliases of yaml/metastore.yaml and yaml/hive.yaml. The following example adds host names red0 and indigo20, which are unknown to the default DNS.
vi yaml/hive.yaml
spec:
  template:
    spec:
      hostAliases:
      - ip: "10.1.91.4"
        hostnames:
        - "red0"
      - ip: "10.1.91.41"
        hostnames:
        - "indigo20"
For HiveServer2
vi env.sh
HIVE_SERVER2_HOST=$HOSTNAME
HIVE_SERVER2_PORT=9852
HIVE_SERVER2_HTTP_PORT=10001
HIVE_SERVER2_HEAPSIZE=32768
HIVE_SERVER2_AUTHENTICATION=KERBEROS
HIVE_SERVER2_KERBEROS_PRINCIPAL=hive/indigo20@RED
HIVE_SERVER2_KERBEROS_KEYTAB=$KEYTAB_MOUNT_DIR/hive.service.keytab
TOKEN_RENEWAL_HIVE_ENABLED=false
General
- HIVE_SERVER2_PORT and HIVE_SERVER2_HTTP_PORT should match the port numbers specified in yaml/hiveserver2-service.yaml.
- HIVE_SERVER2_HEAPSIZE specifies the heap size (in MB) for HiveServer2. If DAGAppMaster runs in LocalThread mode, this heap size should be no larger than the memory allocated to the Pod running HiveServer2 (specified in yaml/hive.yaml). If DAGAppMaster runs in LocalProcess mode, the sum of this heap size and the heap size of DAGAppMaster (specified by mr3.am.resource.memory.mb in conf/mr3-site.xml) should be no larger than the memory allocated to the Pod.
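The memory constraints above amount to simple arithmetic, sketched below. `check_hiveserver2_memory` is hypothetical, and the Pod memory and DAGAppMaster sizes in the usage lines (40960 MB and 16384 MB) are made-up illustration values; the sketch takes them as arguments instead of parsing yaml/hive.yaml or conf/mr3-site.xml.

```bash
#!/bin/bash
# Hypothetical sizing check for the HiveServer2 Pod (all values in MB).
#   LocalThread:  HIVE_SERVER2_HEAPSIZE                             <= Pod memory
#   LocalProcess: HIVE_SERVER2_HEAPSIZE + mr3.am.resource.memory.mb <= Pod memory
# Arguments: mode, Pod memory, DAGAppMaster memory (LocalProcess only).
check_hiveserver2_memory() {
  local mode="$1" pod_memory_mb="$2" am_memory_mb="${3:-0}"
  local required=$HIVE_SERVER2_HEAPSIZE
  case "$mode" in
    LocalThread) ;;
    LocalProcess) required=$((HIVE_SERVER2_HEAPSIZE + am_memory_mb)) ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  if [ "$required" -gt "$pod_memory_mb" ]; then
    echo "need ${required} MB but the Pod has ${pod_memory_mb} MB" >&2
    return 1
  fi
}

HIVE_SERVER2_HEAPSIZE=32768   # from the env.sh example above
# 32768 <= 40960, so this prints "LocalThread: fits".
if check_hiveserver2_memory LocalThread 40960; then echo "LocalThread: fits"; fi
# 32768 + 16384 = 49152 > 40960, so this prints "LocalProcess: too large".
if check_hiveserver2_memory LocalProcess 40960 16384; then
  echo "LocalProcess: fits"
else
  echo "LocalProcess: too large"
fi
```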
Security
- If HiveServer2 uses Kerberos authentication with HIVE_SERVER2_AUTHENTICATION set to KERBEROS, HIVE_SERVER2_KERBEROS_PRINCIPAL and HIVE_SERVER2_KERBEROS_KEYTAB should specify the service principal and the service keytab file (for hive.server2.authentication.kerberos.principal and hive.server2.authentication.kerberos.keytab in hive-site.xml), respectively. Note that the service name of this principal may be different from the service name in HIVE_METASTORE_KERBEROS_PRINCIPAL, and the service keytab file may be different from the file in HIVE_METASTORE_KERBEROS_KEYTAB.
- TOKEN_RENEWAL_HIVE_ENABLED should be set to true in order to automatically renew Hive tokens.
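The HiveServer2 security rules on this page can be collected into one consistency check. `check_hiveserver2_kerberos` is a hypothetical helper; it checks only that the env.sh variables are set consistently, not that the principal or keytab is actually valid for the KDC.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of Hive on MR3) that encodes the
# rules stated on this page for HiveServer2 with Kerberos:
#   - METASTORE_SECURE_MODE should also be true;
#   - the service principal and keytab must be set;
#   - CREATE_KEYTAB_SECRET should be true so that the keytab Secret is built.
check_hiveserver2_kerberos() {
  local status=0
  [ "$HIVE_SERVER2_AUTHENTICATION" = "KERBEROS" ] || return 0
  if [ "$METASTORE_SECURE_MODE" != "true" ]; then
    echo "METASTORE_SECURE_MODE should be true" >&2
    status=1
  fi
  if [ -z "$HIVE_SERVER2_KERBEROS_PRINCIPAL" ] || [ -z "$HIVE_SERVER2_KERBEROS_KEYTAB" ]; then
    echo "HIVE_SERVER2_KERBEROS_PRINCIPAL and HIVE_SERVER2_KERBEROS_KEYTAB must be set" >&2
    status=1
  fi
  if [ "$CREATE_KEYTAB_SECRET" != "true" ]; then
    echo "CREATE_KEYTAB_SECRET should be true when Kerberos is used" >&2
    status=1
  fi
  return $status
}

# Values from the env.sh examples above.
HIVE_SERVER2_AUTHENTICATION=KERBEROS
METASTORE_SECURE_MODE=true
HIVE_SERVER2_KERBEROS_PRINCIPAL=hive/indigo20@RED
HIVE_SERVER2_KERBEROS_KEYTAB=$KEYTAB_MOUNT_DIR/hive.service.keytab
CREATE_KEYTAB_SECRET=true
if check_hiveserver2_kerberos; then echo "HiveServer2 Kerberos settings look consistent"; fi
```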