We run Metastore as a new Pod, so the administrator must provide a MySQL database for Metastore. The following steps are unnecessary if we use an existing Metastore running as an external component.

Configuring the Metastore Pod

There are several YAML files that the user should update manually in accordance with kubernetes/env.sh as well as Kubernetes cluster settings.

metastore-service.yaml

This file creates a governing Service required for the StatefulSet for Metastore. The user should use the same port number specified by the environment variable HIVE_METASTORE_PORT in kubernetes/env.sh.

$ vi kubernetes/yaml/metastore-service.yaml

  ports:
  - name: tcp
    port: 9850
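
A quick consistency check can catch a mismatch between the two files before deployment. The following is a minimal sketch, not part of the MR3 scripts: the helper names service_port, env_port, and check_metastore_port are hypothetical, and it assumes that kubernetes/env.sh contains a plain assignment of the form HIVE_METASTORE_PORT=9850 at the start of a line.

```shell
# Hypothetical helpers (not part of the MR3 distribution).

# Print the first "port:" value found in a Service manifest.
service_port() {
  awk '$1 == "port:" { print $2; exit }' "$1"
}

# Print the value of HIVE_METASTORE_PORT without sourcing env.sh.
# Assumes a line of the form HIVE_METASTORE_PORT=<number>.
env_port() {
  sed -n 's/^HIVE_METASTORE_PORT=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Return 0 if both files agree on the port, 1 otherwise.
check_metastore_port() {
  svc_port=$(service_port "$1")
  cfg_port=$(env_port "$2")
  if [ -n "$svc_port" ] && [ "$svc_port" = "$cfg_port" ]; then
    echo "OK: port $svc_port"
  else
    echo "MISMATCH: service=$svc_port env.sh=$cfg_port" >&2
    return 1
  fi
}
```

For example, check_metastore_port kubernetes/yaml/metastore-service.yaml kubernetes/env.sh can be run from the directory that contains kubernetes/.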

metastore-role.yaml

This file creates a Role for a Metastore Pod. The name of the Role resource (metastore-role) is read in run-metastore.sh, so there is no need to update this file.

metastore.yaml

This file creates a Pod for running Metastore. The user should update several sections in this file according to Kubernetes cluster settings.

In the spec/template/spec/containers section:

  • The image field should match the Docker image specified by DOCKER_HIVE_IMG in kubernetes/env.sh.

  • The resources/requests and resources/limits fields specify the resources to be allocated to a Metastore Pod.

  • The ports/containerPort field should match the port number specified in metastore-service.yaml.

    $ vi kubernetes/yaml/metastore.yaml
    
    spec:
      template:
        spec:
          containers:
          - image: 10.1.91.17:5000/hive3:latest
            resources:
              requests:
                cpu: 2
                memory: 16Gi
              limits:
                cpu: 2
                memory: 16Gi
            ports:
            - containerPort: 9850
              protocol: TCP
    

In the spec/template/spec/volumes section:

  • The configMap/name field under conf-k8s-volume should match the name specified by CONF_DIR_CONFIGMAP in kubernetes/env.sh.

  • The secret/secretName field under key-k8s-volume should match the name specified by KEYTAB_SECRET in kubernetes/env.sh.

    $ vi kubernetes/yaml/metastore.yaml
    
    spec:
      template:
        spec:
          volumes:
          - name: conf-k8s-volume
            configMap:
              name: hivemr3-conf-configmap
          - name: key-k8s-volume
            secret:
              secretName: hivemr3-keytab-secret
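
To compare these names against CONF_DIR_CONFIGMAP and KEYTAB_SECRET, it can help to print them from the manifest. This is a minimal sketch with hypothetical helper names (configmap_name, secret_name); it assumes the layout shown above, with name: on the line right after configMap:, and it reports only the first match in the file.

```shell
# Hypothetical helpers (not part of the MR3 distribution).

# Print the ConfigMap name: the "name:" line right after "configMap:".
configmap_name() {
  awk '$1 == "configMap:" { getline; if ($1 == "name:") print $2; exit }' "$1"
}

# Print the Secret name given by the first "secretName:" field.
secret_name() {
  awk '$1 == "secretName:" { print $2; exit }' "$1"
}
```

For example, configmap_name kubernetes/yaml/metastore.yaml should print the value assigned to CONF_DIR_CONFIGMAP in kubernetes/env.sh.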
    

In the spec/template/spec/hostAliases section:

  • HIVE_DATABASE_HOST in kubernetes/env.sh (not in env.sh) already specifies the host where the database for Metastore is running. If it uses a host unknown to the default DNS, the user should add its alias. The following example adds host names red0 and indigo20 that are unknown to the default DNS.

    $ vi kubernetes/yaml/metastore.yaml
    
    spec:
      template:
        spec:
          hostAliases:
          - ip: "10.1.91.4"
            hostnames:
            - "red0"
          - ip: "10.1.91.41"
            hostnames:
            - "indigo20"
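
When several hosts need aliases, the entries can be generated rather than typed by hand. The sketch below uses a hypothetical helper, host_alias_entry, which prints one entry indented to sit under the hostAliases key at the depth used in the example above; the output still has to be pasted into metastore.yaml manually.

```shell
# Hypothetical helper (not part of the MR3 distribution): print one
# hostAliases entry for an IP address and one or more host names.
host_alias_entry() {
  ip=$1
  shift
  printf '      - ip: "%s"\n' "$ip"
  printf '        hostnames:\n'
  for h in "$@"; do
    printf '        - "%s"\n' "$h"
  done
}
```

For example, host_alias_entry 10.1.91.4 red0 prints the first entry of the example above.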
    

Using Kerberos-based authentication

In order to use Kerberos-based authentication, the configuration key hadoop.security.authentication should be set to kerberos in kubernetes/conf/core-site.xml.

$ vi kubernetes/conf/core-site.xml

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
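
To confirm the setting without opening the file, the value of a property can be printed from the command line. This is a minimal sketch: xml_property_value is a hypothetical helper, and it assumes that each name and value element sits on its own line, as in the property shown above. It is not a general XML parser.

```shell
# Hypothetical helper (not part of the MR3 distribution): print the
# <value> of a named property in a Hadoop-style XML configuration file.
# Lines containing an XML comment marker are skipped.
xml_property_value() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    /<\/property>/ { found = 0 }
    found && /<value>/ && !/<!--/ {
      sub(/^.*<value>/, "")
      sub(/<\/value>.*$/, "")
      print
      exit
    }
  ' "$1"
}
```

For example, xml_property_value kubernetes/conf/core-site.xml hadoop.security.authentication should print kerberos once the property is set as above.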

Connecting Metastore to a database server

When using MySQL as a backing database, Metastore automatically downloads a MySQL connector from https://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-8.0.28.tar.gz. If a custom database connector should be used, the user should manually place a connector jar file inside the Metastore Pod using either a PersistentVolume or a hostPath volume. For details, see Troubleshooting.

Next, update kubernetes/conf/hive-site.xml as necessary to configure Metastore. In particular, check the following configuration keys relevant to security:

  • hive.metastore.pre.event.listeners or metastore.pre.event.listeners
  • hive.security.metastore.authenticator.manager
  • hive.security.metastore.authorization.manager
  • hive.security.metastore.authorization.auth.reads
  • hive.server2.enable.doAs

Make sure that the configuration key javax.jdo.option.ConnectionDriverName matches the MySQL connector jar file by setting it to either com.mysql.jdbc.Driver (the driver class in Connector/J 5.x) or com.mysql.cj.jdbc.Driver (the driver class in Connector/J 8.x).

$ vi kubernetes/conf/hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <!-- <value>com.mysql.jdbc.Driver</value> -->
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
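
The choice between the two class names follows from the connector version. The sketch below uses a hypothetical helper, mysql_driver_class, which guesses the class name from the jar file name; it assumes that the file name embeds the version as in mysql-connector-java-8.0.28.jar and covers only the 5.x and 8.x series.

```shell
# Hypothetical helper (not part of the MR3 distribution): suggest a JDBC
# driver class name from the file name of a MySQL Connector/J jar.
mysql_driver_class() {
  case $(basename "$1") in
    mysql-connector-*-5.*) echo "com.mysql.jdbc.Driver" ;;
    mysql-connector-*-8.*) echo "com.mysql.cj.jdbc.Driver" ;;
    *) echo "unknown connector: $1" >&2; return 1 ;;
  esac
}
```

For example, mysql_driver_class mysql-connector-java-8.0.28.jar prints com.mysql.cj.jdbc.Driver, which is the value used in the hive-site.xml snippet above.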

Initializing schema

In order to initialize schema when starting Metastore, update kubernetes/yaml/metastore.yaml as follows:

        args: ["start", "--init-schema"]

After starting Metastore by executing kubernetes/run-metastore.sh, the user may want to restore kubernetes/yaml/metastore.yaml so as not to inadvertently initialize schema again:

        args: ["start"]
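
Switching between the two forms can be scripted so that the restore step is not forgotten. This is a minimal sketch with hypothetical helper names (with_init_schema, without_init_schema); it assumes the args line is written exactly as shown above. Each helper prints the modified manifest and leaves the input file untouched.

```shell
# Hypothetical helpers (not part of the MR3 distribution).

# Print the manifest with --init-schema added to the args line.
with_init_schema() {
  sed 's/args: \["start"]/args: ["start", "--init-schema"]/' "$1"
}

# Print the manifest with --init-schema removed from the args line.
without_init_schema() {
  sed 's/args: \["start", "--init-schema"]/args: ["start"]/' "$1"
}
```

For example, without_init_schema kubernetes/yaml/metastore.yaml | grep args shows the args line as it would read after restoring the file.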

When initializing schema, Metastore reads the environment variable HIVE_WAREHOUSE_DIR in mr3-run/kubernetes/env.sh and stores the path to the data warehouse in the MySQL database. Once the path to the data warehouse is registered in Metastore, the user can update it only by directly accessing the MySQL database. Hence setting HIVE_WAREHOUSE_DIR to a new path and running HiveServer2 has no effect.

Since the data warehouse is shared by all the components of Hive on MR3, its path should be globally valid in every Pod. For example, HIVE_WAREHOUSE_DIR=hdfs://red0:8020/tmp/hive is valid because it points to a globally valid location (namely, directory /tmp/hive on HDFS running on red0). If the path is not globally valid, whether the user can create new databases or tables depends on whether Metastore can create new directories under the path. For example, if we set HIVE_WAREHOUSE_DIR to /foo/bar where Metastore has no write permission inside its Pod, the user cannot create new databases or tables. If Metastore happens to have write permission on /foo/bar, the user can create new databases and tables.
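
A simple sanity check before initializing schema is to see whether the value carries a URI scheme such as hdfs://. The sketch below uses a hypothetical helper, has_uri_scheme. It is only a heuristic: a plain path can still be valid in every Pod if it is backed by a shared mount, and a URI can still point to a location that Metastore cannot write to.

```shell
# Hypothetical helper (not part of the MR3 distribution): return 0 if
# the argument starts with a URI scheme (for example hdfs://), and 1 if
# it looks like a plain local path.
has_uri_scheme() {
  case $1 in
    [A-Za-z]*://*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, has_uri_scheme hdfs://red0:8020/tmp/hive succeeds, while has_uri_scheme /foo/bar fails and signals that the path deserves a closer look.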

Running Metastore

In order to run Metastore, the user can execute the script kubernetes/run-metastore.sh:

$ kubernetes/run-metastore.sh 
...
service/metastore created

Executing the script kubernetes/run-metastore.sh starts a Metastore Pod (with the unique name hivemr3-metastore-0) shortly afterwards:

$ kubectl get -n hivemr3 pods
NAME                  READY   STATUS    RESTARTS   AGE
hivemr3-metastore-0   1/1     Running   0          37s
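
In a script, the status of the Pod can be extracted from this listing. This is a minimal sketch: pod_status is a hypothetical helper that reads output in the column layout shown above from standard input and prints the STATUS column for the named Pod.

```shell
# Hypothetical helper (not part of the MR3 distribution): read
# "kubectl get pods" output on stdin and print the STATUS column
# for the Pod whose NAME matches the first argument.
pod_status() {
  awk -v pod="$1" '$1 == pod { print $3; exit }'
}
```

For example, kubectl get -n hivemr3 pods | pod_status hivemr3-metastore-0 prints Running for the listing above.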

Stopping Metastore

In order to stop Metastore, delete the StatefulSet for Metastore:

$ kubectl -n hivemr3 delete statefulset hivemr3-metastore