The previous installation of Hive on MR3 on Kubernetes assumes that Metastore, along with its MySQL database, runs as an external component. Alternatively, we can run Metastore as a Pod so that the administrator needs to provide only a MySQL database for Metastore. The resulting Kubernetes cluster is depicted in the following diagram:

hive.k8s.metastore

Configuring the Metastore Pod

The user should manually update several YAML files so that they agree with kubernetes/env.sh and with the Kubernetes cluster settings.

metastore-service.yaml

This file creates a governing Service required for the StatefulSet for Metastore. The user should use the same port number specified by the environment variable HIVE_METASTORE_PORT in kubernetes/env.sh.

  ports:
  - name: tcp
    port: 9850
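The consistency requirement above can be checked mechanically. The following is a minimal sketch, not part of the MR3 distribution: the function name check_metastore_port is hypothetical, and it assumes that env.sh contains a plain line of the form HIVE_METASTORE_PORT=<number> and that the Service file contains a single "port: <number>" line.

```shell
# Sketch: compare HIVE_METASTORE_PORT in env.sh with the port in the Service file.
# Assumptions: env.sh has a line "HIVE_METASTORE_PORT=<number>" starting in column 1,
# and the YAML file has one "port: <number>" line (targetPort lines are ignored).
check_metastore_port() {
  env_file=$1
  service_yaml=$2
  # Read both files as text; env.sh is not sourced, so nothing in it is executed.
  env_port=$(sed -n 's/^HIVE_METASTORE_PORT=\([0-9][0-9]*\).*/\1/p' "$env_file" | head -n 1)
  yaml_port=$(sed -n 's/^[[:space:]-]*port:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$service_yaml" | head -n 1)
  if [ -n "$env_port" ] && [ "$env_port" = "$yaml_port" ]; then
    echo "OK: port $env_port"
  else
    echo "MISMATCH: env.sh=${env_port:-unset} yaml=${yaml_port:-unset}" >&2
    return 1
  fi
}
```

A possible invocation, assuming the Service file lives next to metastore.yaml, is check_metastore_port kubernetes/env.sh kubernetes/yaml/metastore-service.yaml.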

metastore-role.yaml

This file creates a Role for a Metastore Pod. The name of the Role resource (metastore-role) is read in run-metastore.sh, so there is no need to update this file.

metastore.yaml

This file creates a Pod for running Metastore. The user should update several sections in this file according to Kubernetes cluster settings.

In the spec/template/spec/containers section:

  • The image field should match the Docker image specified by DOCKER_HIVE_IMG in kubernetes/env.sh.

  • The resources/requests and resources/limits fields specify the resources to be allocated to a Metastore Pod.

  • The ports/containerPort field should match the port number specified in metastore-service.yaml.

    spec:
      template:
        spec:
          containers:
          - image: 10.1.91.17:5000/hive5:latest
            resources:
              requests:
                cpu: 2
                memory: 16Gi
              limits:
                cpu: 2
                memory: 16Gi
            ports:
            - containerPort: 9850
              protocol: TCP
    

In the spec/template/spec/volumes section:

  • The configMap/name field under conf-k8s-volume should match the name specified by CONF_DIR_CONFIGMAP in kubernetes/env.sh.

  • The secret/secretName field under key-k8s-volume should match the name specified by KEYTAB_SECRET in kubernetes/env.sh.

    spec:
      template:
        spec:
          volumes:
          - name: conf-k8s-volume
            configMap:
              name: hivemr3-conf-configmap
          - name: key-k8s-volume
            secret:
              secretName: hivemr3-keytab-secret
    

In the spec/template/spec/hostAliases section:

  • HIVE_DATABASE_HOST in kubernetes/env.sh (not in env.sh) already specifies the host where the database for Metastore is running. If it uses a host unknown to the default DNS, the user should add its alias. The following example adds a host name red0 that is unknown to the default DNS.

    spec:
      template:
        spec:
          hostAliases:
          - ip: "10.1.91.4"
            hostnames:
            - "red0"
    

Connecting Metastore to an existing MySQL database

To use an existing MySQL database, we assume that the MySQL connector specified by HIVE_MYSQL_DRIVER in env.sh (not in kubernetes/env.sh) was compatible with the MySQL database when the Docker image was created. If HIVE_MYSQL_DRIVER is not set in env.sh, the user should manually place a MySQL connector jar file inside the Metastore Pod using either a PersistentVolume or a hostPath volume. For details, see Troubleshooting.

Then update HIVE_METASTORE_HOST in kubernetes/env.sh:

HIVE_METASTORE_HOST=hivemr3-metastore-0.metastore.hivemr3.svc.cluster.local

Here hivemr3-metastore-0 is the unique name of the Pod that will be running Metastore, and hivemr3 is the namespace.
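The host name follows the usual Kubernetes pattern for a Pod of a StatefulSet behind a governing Service: <pod>.<service>.<namespace>.svc.<cluster domain>. As a minimal sketch, the value can be assembled as follows. The function name metastore_fqdn is hypothetical, and the default cluster domain cluster.local is an assumption that holds only if the cluster has not been configured with a different domain.

```shell
# Sketch: build the stable DNS name of a StatefulSet Pod.
# Arguments: pod name, governing Service name, namespace, optional cluster domain.
metastore_fqdn() {
  pod=$1
  service=$2
  namespace=$3
  cluster_domain=${4:-cluster.local}   # assumption: default cluster domain
  printf '%s.%s.%s.svc.%s\n' "$pod" "$service" "$namespace" "$cluster_domain"
}
```

For the names used in this section, metastore_fqdn hivemr3-metastore-0 metastore hivemr3 prints the value shown above for HIVE_METASTORE_HOST.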

Next update kubernetes/conf/hive-site.xml as necessary to configure Metastore. In particular, check the following configuration keys relevant to security:

  • hive.metastore.pre.event.listeners or metastore.pre.event.listeners
  • hive.security.metastore.authenticator.manager
  • hive.security.metastore.authorization.manager
  • hive.security.metastore.authorization.auth.reads
  • hive.server2.enable.doAs

Make sure that the configuration key javax.jdo.option.ConnectionDriverName matches the MySQL connector jar file by setting it to either com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver.

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <!-- <value>com.mysql.jdbc.Driver</value> -->
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
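Which of the two class names applies depends on the connector version: MySQL Connector/J 5.x provides com.mysql.jdbc.Driver, while 8.x provides com.mysql.cj.jdbc.Driver. The following sketch guesses the class name from the jar file name. The function name jdbc_driver_class is hypothetical, and the sketch assumes that the jar keeps its upstream name (mysql-connector-java-<version>.jar or mysql-connector-j-<version>.jar); a renamed jar is reported as unknown.

```shell
# Sketch: guess the JDBC driver class from a MySQL connector jar file name.
# Assumption: the jar is named mysql-connector-<suffix>-<major>.<rest>.jar.
jdbc_driver_class() {
  jar=$(basename "$1")
  # Extract the major version number that follows the connector name.
  major=$(printf '%s\n' "$jar" |
    sed -n 's/^mysql-connector-[a-z]*-\([0-9][0-9]*\)\..*\.jar$/\1/p')
  case "$major" in
    '')    echo "unknown"; return 1 ;;          # file name not recognized
    [1-5]) echo "com.mysql.jdbc.Driver" ;;      # Connector/J 5.x and earlier
    *)     echo "com.mysql.cj.jdbc.Driver" ;;   # Connector/J 6.x and later
  esac
}
```

For example, jdbc_driver_class mysql-connector-java-8.0.28.jar prints com.mysql.cj.jdbc.Driver, which is the value used in the property above.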

In order to run Metastore, the user can execute the script kubernetes/run-metastore.sh:

$ kubernetes/run-metastore.sh 
...
service/metastore created

Executing the script kubernetes/run-metastore.sh starts a Metastore Pod (of the unique name hivemr3-metastore-0) in a moment:

$ kubectl get -n hivemr3 pods
NAME                  READY   STATUS    RESTARTS   AGE
hivemr3-metastore-0   1/1     Running   0          37s
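Instead of polling kubectl get by hand, a script can block until the Pod reports the Ready condition before starting HiveServer2. The following is a minimal sketch using kubectl wait. The function name wait_for_metastore and the 120-second default timeout are assumptions, not part of the MR3 scripts.

```shell
# Sketch: block until the Metastore Pod is Ready, or fail after a timeout.
# Arguments (all optional): namespace, Pod name, timeout.
wait_for_metastore() {
  namespace=${1:-hivemr3}
  pod=${2:-hivemr3-metastore-0}
  timeout=${3:-120s}                 # assumption: 120 seconds is long enough
  kubectl wait -n "$namespace" --for=condition=Ready "pod/$pod" --timeout="$timeout"
}
```

A possible use is wait_for_metastore && kubernetes/run-hive.sh, which starts HiveServer2 only after Metastore is ready.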

Now the user can run HiveServer2 by executing the script kubernetes/run-hive.sh.

Initializing schema

In order to initialize schema when starting Metastore, update kubernetes/yaml/metastore.yaml as follows:

        args: ["start", "--init-schema"]

After starting Metastore by executing kubernetes/run-metastore.sh, the user may want to restore kubernetes/yaml/metastore.yaml so as not to inadvertently initialize schema again:

        args: ["start"]
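Because forgetting to restore the file would initialize the schema again on the next start, the switch can be scripted. The following is a minimal sketch that rewrites the args line in place. The function name set_init_schema is hypothetical, and the sketch assumes that the file contains an args line beginning with args: ["start" exactly as shown above.

```shell
# Sketch: turn the --init-schema argument on or off in a YAML file.
# Arguments: path to the YAML file, then "on" or "off".
set_init_schema() {
  yaml_file=$1
  mode=$2
  case "$mode" in
    on)  new='args: ["start", "--init-schema"]' ;;
    off) new='args: ["start"]' ;;
    *)   echo "usage: set_init_schema FILE on|off" >&2; return 2 ;;
  esac
  tmp=$(mktemp) || return 1
  # Rewrite through a temporary file, then copy back to keep file permissions.
  sed "s|args: \[\"start\".*\]|$new|" "$yaml_file" > "$tmp" && cat "$tmp" > "$yaml_file"
  status=$?
  rm -f "$tmp"
  return $status
}
```

A typical sequence would be set_init_schema kubernetes/yaml/metastore.yaml on, then kubernetes/run-metastore.sh, then set_init_schema kubernetes/yaml/metastore.yaml off.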

When initializing schema, Metastore reads the environment variable HIVE_WAREHOUSE_DIR in mr3-run/kubernetes/env.sh and stores the path to the data warehouse in the MySQL database. Once the path to the data warehouse is registered in Metastore, the user can update it only by directly accessing the MySQL database. Hence setting HIVE_WAREHOUSE_DIR to a new path and running HiveServer2 has no effect.

Since the data warehouse is shared by all the components of Hive on MR3, its path should be globally valid in every Pod. For example, HIVE_WAREHOUSE_DIR=hdfs://red0:8020/tmp/hive is okay because it points to a globally valid location (namely, directory /tmp/hive on HDFS running on red0). If the path is not globally valid, whether the user can create new databases or tables depends on whether Metastore can create new directories under the path. For example, if we set HIVE_WAREHOUSE_DIR to /foo/bar where Metastore has no write permission inside its Pod, the user cannot create new databases or tables. If Metastore happens to have write permission on /foo/bar, the user can create new databases and tables.
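Since the path cannot easily be changed after schema initialization, it may be worth checking its form beforehand. The following sketch only distinguishes a URI with a scheme (such as hdfs://) from a local path. The function name classify_warehouse_dir is hypothetical, and the classification is a heuristic: a local path can still be globally valid if every Pod mounts the same shared volume at that path.

```shell
# Sketch: classify a HIVE_WAREHOUSE_DIR value before initializing the schema.
# Prints "uri" for values with a scheme and "local-path" for absolute paths.
classify_warehouse_dir() {
  case "$1" in
    *://*) echo "uri" ;;          # e.g. hdfs://red0:8020/tmp/hive
    /*)    echo "local-path" ;;   # valid only if shared and writable in every Pod
    *)     echo "unrecognized"; return 1 ;;
  esac
}
```

For the two examples in the text, classify_warehouse_dir hdfs://red0:8020/tmp/hive prints uri and classify_warehouse_dir /foo/bar prints local-path.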

Stopping Metastore

In order to stop Metastore, delete the StatefulSet for Metastore:

$ kubectl -n hivemr3 delete statefulset hivemr3-metastore