We run Metastore as a new Pod, so the administrator user needs to provide a MySQL database for Metastore. The following steps are unnecessary if we use an existing Metastore running as an external component.
Configuring the Metastore Pod
There are several YAML files that the user should update manually in accordance with kubernetes/env.sh as well as Kubernetes cluster settings.
metastore-service.yaml
This file creates a governing Service required for the StatefulSet for Metastore.
The user should use the same port number specified by the environment variable HIVE_METASTORE_PORT in kubernetes/env.sh.
$ vi kubernetes/yaml/metastore-service.yaml
ports:
- name: tcp
  port: 9850
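The two port settings can be compared with a small script. The following is a minimal sketch, assuming that kubernetes/env.sh assigns HIVE_METASTORE_PORT on a single line of the form HIVE_METASTORE_PORT=9850 and that metastore-service.yaml contains a single port: entry. The function names are hypothetical and not part of the MR3 scripts.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the MR3 distribution).

# Print the number assigned to HIVE_METASTORE_PORT in the given env.sh.
# Assumes an unquoted, single-line assignment, optionally prefixed by "export".
env_metastore_port() {
  sed -n \
    -e 's/^[[:space:]]*//' \
    -e 's/^export[[:space:]][[:space:]]*//' \
    -e 's/^HIVE_METASTORE_PORT=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print the first "port:" value found in the given Service manifest.
service_port() {
  sed -n 's/^[[:space:]-]*port:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only if both files name the same port.
check_metastore_port() {
  env_port=$(env_metastore_port "$1")
  svc_port=$(service_port "$2")
  if [ -n "$env_port" ] && [ "$env_port" = "$svc_port" ]; then
    echo "OK: port $env_port"
  else
    echo "MISMATCH: env.sh=$env_port service=$svc_port" >&2
    return 1
  fi
}
```

A typical call would be check_metastore_port kubernetes/env.sh kubernetes/yaml/metastore-service.yaml.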
metastore-role.yaml
This file creates a Role for a Metastore Pod. The name of the Role resource (metastore-role) is read in run-metastore.sh, so there is no need to update this file.
metastore.yaml
This file creates a Pod for running Metastore. The user should update several sections in this file according to Kubernetes cluster settings.
In the spec/template/spec/containers section:
- The image field should match the Docker image specified by DOCKER_HIVE_IMG in kubernetes/env.sh.
- The resources/requests and resources/limits fields specify the resources to be allocated to a Metastore Pod.
- The ports/containerPort field should match the port number specified in metastore-service.yaml.
$ vi kubernetes/yaml/metastore.yaml
spec:
  template:
    spec:
      containers:
      - image: 10.1.91.17:5000/hive3:latest
        resources:
          requests:
            cpu: 2
            memory: 16Gi
          limits:
            cpu: 2
            memory: 16Gi
        ports:
        - containerPort: 9850
          protocol: TCP
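The image field can be checked against DOCKER_HIVE_IMG in the same spirit. This is only a sketch: it assumes the manifest names a single container image on a line of the form image: NAME, and that the caller passes the expected image explicitly (for example, the value of DOCKER_HIVE_IMG copied from kubernetes/env.sh). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the MR3 distribution).

# Print the first container image named in the given manifest.
container_image() {
  sed -n 's/^[[:space:]-]*image:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only if the manifest uses the expected image.
#   $1 = expected image, $2 = path to the manifest
image_matches() {
  actual=$(container_image "$2")
  if [ "$actual" = "$1" ]; then
    echo "OK: image $actual"
  else
    echo "MISMATCH: expected=$1 manifest=$actual" >&2
    return 1
  fi
}
```

For example: image_matches 10.1.91.17:5000/hive3:latest kubernetes/yaml/metastore.yaml.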
In the spec/template/spec/volumes section:
- The configMap/name field under conf-k8s-volume should match the name specified by CONF_DIR_CONFIGMAP in kubernetes/env.sh.
- The secret/secretName field under key-k8s-volume should match the name specified by KEYTAB_SECRET in kubernetes/env.sh.
$ vi kubernetes/yaml/metastore.yaml
spec:
  template:
    spec:
      volumes:
      - name: conf-k8s-volume
        configMap:
          name: hivemr3-conf-configmap
      - name: key-k8s-volume
        secret:
          secretName: hivemr3-keytab-secret
In the spec/template/spec/hostAliases section:
HIVE_DATABASE_HOST in kubernetes/env.sh (not in env.sh) already specifies the host where the database for Metastore is running. If it uses a host unknown to the default DNS, the user should add its alias. The following example adds host names red0 and indigo20 that are unknown to the default DNS.
$ vi kubernetes/yaml/metastore.yaml
spec:
  template:
    spec:
      hostAliases:
      - ip: "10.1.91.4"
        hostnames:
        - "red0"
      - ip: "10.1.91.41"
        hostnames:
        - "indigo20"
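When several hosts need aliases, the entries can be generated rather than typed by hand. The sketch below prints one hostAliases entry in the same layout as the example above. The function name host_alias_entry is hypothetical, and the output still has to be indented and pasted under hostAliases in metastore.yaml manually.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MR3 distribution).

# Print one hostAliases entry for an IP address and one or more host names.
#   $1 = IP address, remaining arguments = host names for that address
host_alias_entry() {
  ip=$1
  shift
  printf '%s\n' "- ip: \"$ip\"" "  hostnames:"
  for h in "$@"; do
    printf '  - "%s"\n' "$h"
  done
}
```

For example, host_alias_entry 10.1.91.4 red0 prints the first entry of the example above.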
Using Kerberos-based authentication
In order to use Kerberos-based authentication, the configuration key hadoop.security.authentication should be set to kerberos in kubernetes/conf/core-site.xml.
$ vi kubernetes/conf/core-site.xml
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
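To confirm the setting without opening an editor, the value can be read back from the file. This is a rough sketch rather than a real XML parser: it assumes that the name and value elements sit on separate lines, as in the example above, and it skips only single-line comments. The function name xml_property is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MR3 distribution).

# Print the <value> that follows <name>KEY</name> in a Hadoop-style XML file.
#   $1 = path to the XML file, $2 = configuration key
xml_property() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<!--/ { next }            # ignore a commented-out line
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

For example, xml_property kubernetes/conf/core-site.xml hadoop.security.authentication should print kerberos after the change above.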
Connecting Metastore to a database server
When using MySQL as a backing database, Metastore automatically downloads a MySQL connector from https://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-8.0.28.tar.gz.
If a custom database connector should be used,
the user should manually place a connector jar file inside the Metastore Pod
using either a PersistentVolume or a hostPath volume.
For details, see Troubleshooting.
Next, update kubernetes/conf/hive-site.xml as necessary to configure Metastore. In particular, check the following configuration keys relevant to security:
- hive.metastore.pre.event.listeners or metastore.pre.event.listeners
- hive.security.metastore.authenticator.manager
- hive.security.metastore.authorization.manager
- hive.security.metastore.authorization.auth.reads
- hive.server2.enable.doAs
Make sure that the configuration key javax.jdo.option.ConnectionDriverName matches the MySQL connector jar file by setting it to either com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver.
$ vi kubernetes/conf/hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <!-- <value>com.mysql.jdbc.Driver</value> -->
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
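Which of the two class names applies depends on the connector version: Connector/J 5.x provides com.mysql.jdbc.Driver, while Connector/J 8.x provides com.mysql.cj.jdbc.Driver. The sketch below encodes that rule, assuming the jar file keeps its original mysql-connector-java-VERSION name. The function name is hypothetical, and other naming schemes are reported as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MR3 distribution).

# Print the JDBC driver class for a MySQL Connector/J jar file name.
# Assumes the jar is named mysql-connector-java-<version>...
driver_for_connector() {
  case "$(basename "$1")" in
    mysql-connector-java-5.*) echo "com.mysql.jdbc.Driver" ;;
    mysql-connector-java-8.*) echo "com.mysql.cj.jdbc.Driver" ;;
    *) echo "unknown connector: $1" >&2; return 1 ;;
  esac
}
```

For the connector downloaded by default (version 8.0.28), this yields com.mysql.cj.jdbc.Driver, which matches the example above.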
Initializing schema
In order to initialize schema when starting Metastore, update kubernetes/yaml/metastore.yaml as follows:
args: ["start", "--init-schema"]
After starting Metastore by executing kubernetes/run-metastore.sh, the user may want to restore kubernetes/yaml/metastore.yaml so as not to inadvertently initialize schema again:
args: ["start"]
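Switching between the two forms of the args line can be scripted so that the restore step is not forgotten. The following is a minimal sketch, assuming metastore.yaml contains exactly one args line that begins with "start". The function name set_metastore_args is hypothetical. The function prints the rewritten manifest to standard output instead of editing the file in place, so the caller decides when to overwrite the original.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MR3 distribution).

# Rewrite the args line of a manifest and print the result to stdout.
#   $1 = path to the manifest
#   $2 = "init" to add --init-schema, "normal" to remove it
set_metastore_args() {
  case "$2" in
    init)   new='args: ["start", "--init-schema"]' ;;
    normal) new='args: ["start"]' ;;
    *) echo "usage: set_metastore_args FILE init|normal" >&2; return 2 ;;
  esac
  # Only the text from "args:" to the closing bracket is replaced,
  # so the indentation of the line is preserved.
  sed "s/args: \[\"start\".*]/$new/" "$1"
}
```

One possible workflow is to write the output of set_metastore_args kubernetes/yaml/metastore.yaml init to a temporary file, move it over the original, run kubernetes/run-metastore.sh, and then repeat with normal.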
When initializing schema, Metastore reads the environment variable HIVE_WAREHOUSE_DIR in mr3-run/kubernetes/env.sh and stores the path to the data warehouse in the MySQL database. Once the path to the data warehouse is registered in Metastore, the user can update it only by directly accessing the MySQL database. Hence setting HIVE_WAREHOUSE_DIR to a new path and running HiveServer2 has no effect.
Since the data warehouse is shared by all the components of Hive on MR3, its path should be globally valid in every Pod. For example, HIVE_WAREHOUSE_DIR=hdfs://red0:8020/tmp/hive is okay because it points to a globally valid location (namely, directory /tmp/hive on HDFS running on red0). If the path is not globally valid, whether the user can create new databases or tables depends on whether Metastore can create new directories under the path. For example, if we set HIVE_WAREHOUSE_DIR to /foo/bar where Metastore has no write permission inside its Pod, the user cannot create new databases or tables. If Metastore happens to have write permission on /foo/bar, the user can create new databases and tables.
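Because the path cannot easily be changed after schema initialization, it may be worth checking its form beforehand. The sketch below is only a heuristic: it distinguishes a path with a URI scheme such as hdfs:// from a plain local path, and it cannot tell whether the location is actually reachable or writable from every Pod. A local path may still be valid if the same volume is mounted at that path in every Pod. The function name warehouse_dir_kind is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MR3 distribution).

# Classify a warehouse path by its form (a heuristic, not a guarantee).
# Prints "uri:<scheme>", "local", or "unknown".
warehouse_dir_kind() {
  case "$1" in
    *://*) echo "uri:${1%%://*}" ;;   # e.g. hdfs://red0:8020/tmp/hive
    /*)    echo "local" ;;            # e.g. /foo/bar inside the Pod
    *)     echo "unknown" ;;
  esac
}
```

For the two paths discussed above, the function prints uri:hdfs for hdfs://red0:8020/tmp/hive and local for /foo/bar.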
Running Metastore
In order to run Metastore, the user can execute the script kubernetes/run-metastore.sh:
$ kubernetes/run-metastore.sh
...
service/metastore created
Executing the script kubernetes/run-metastore.sh starts a Metastore Pod (with the unique name hivemr3-metastore-0) in a moment:
$ kubectl get -n hivemr3 pods
NAME                  READY   STATUS    RESTARTS   AGE
hivemr3-metastore-0   1/1     Running   0          37s
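Instead of polling kubectl get by hand, a script can block until the Pod reports Ready. The following sketch wraps the standard kubectl wait subcommand. The namespace hivemr3 and the Pod name hivemr3-metastore-0 are taken from the output above, while the function name and the default timeout of 120 seconds are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MR3 distribution).

# Block until the Metastore Pod is Ready or the timeout expires.
#   $1 = namespace (default: hivemr3), $2 = timeout in seconds (default: 120)
wait_for_metastore() {
  ns=${1:-hivemr3}
  timeout=${2:-120}
  kubectl -n "$ns" wait --for=condition=Ready \
    pod/hivemr3-metastore-0 --timeout="${timeout}s"
}
```

The function returns the exit status of kubectl wait, so a calling script can stop if the Pod does not become Ready in time.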
Stopping Metastore
In order to stop Metastore, delete the StatefulSet for Metastore:
$ kubectl -n hivemr3 delete statefulset hivemr3-metastore